Compare commits

376 Commits

Author SHA1 Message Date
dirkf
ed1ad29633 [extractor/bigo] Update API URL 2022-08-28 01:47:16 +01:00
dirkf
9a0a381954 [bigo] Supply missing media extension 2022-08-28 01:40:08 +01:00
dirkf
4f10e1e7c7 [test] Identify download errors better 2022-08-08 14:21:09 +01:00
dirkf
e679476ea0 [aes] Linter 2022-08-08 08:26:44 +01:00
dirkf
28194bd327 [extractor/minds]
Linter
2022-08-08 08:17:22 +01:00
dirkf
efa723edc6 [test] Identify testcase errors better 2022-08-08 08:09:55 +01:00
Wes
adb5294177 [aenetworks] Update _THEPLATFORM_KEY and _THEPLATFORM_SECRET (#29749)
Fixes ytdl-org/youtube-dl#29300
2022-07-30 02:10:00 +01:00
Kyraminol Endyeran
5f5c127ece [VVVVID] Support video/dash types (#31060)
Resolves #31030.
2022-07-12 00:35:40 +01:00
dirkf
090acd58c1 [options] Improve be35e53 (--match-/reject-title parameter value)
Resolves #31064.
2022-07-03 20:05:21 +01:00
dirkf
a03b9775d5 [Mediaset] Support player version number in URL pattern
Ref: https://github.com/yt-dlp/yt-dlp/issues/4141
2022-06-26 14:24:06 +01:00
dirkf
8a158a936c [NHK] Use new API URL 2022-06-15 18:28:19 +01:00
dirkf
11665dd236 [test] Fix linter for 3aa94d7945 2022-06-15 18:28:19 +01:00
dirkf
cc179df346 [XHamster] Support xhday.com alias, extract uploader_id
* support xhday.com alias for xhamster.com (resolves #31023)
  Authored by: dirkf
* extract `uploader_id`:
  from 908b56eaf7
  (PR https://github.com/yt-dlp/yt-dlp/pull/844)
  Authored by: octotherp
2022-06-12 14:10:38 +01:00
pukkandan
0700fde640 [utils, etc] Kill child processes when yt-dl is killed
* derived from PR #26592, closes #26592

Authored by: Unrud
2022-06-10 19:57:46 +01:00
dirkf
811c480f7b [YouTube] Support JSON3 subtitle format
* subtitle tests updated to match
2022-06-09 15:25:23 +01:00
dirkf
3aa94d7945 [test] Fix workable subtitle tests (except YT) and mark others as skip, broken
* broken tests need to be fixed when fixing the respective IE
2022-06-08 23:11:33 +01:00
dirkf
ef044be34b [test] Skip not _WORKING IE in subtitle tests; use unittest.skipTest throughout 2022-06-08 15:52:21 +01:00
dirkf
530f4582d0 [HRFernsehen] Back-port new extractor from yt-dlp
Closes #26445, where this was originally proposed.
2022-06-06 19:29:48 +01:00
pukkandan
1baa0f5f66 [utils] Escape URL while sanitizing
Closes #31008, #yt-dlp/263

While this fixes the issue in question, it does not try to address the root-cause of the problem
Refer: 915f911e365736227e134ad654601443dbfd7ccb, f5fa042c82300218a2d07b95dd6b9c0756745db3
2022-06-06 16:03:04 +01:00
LewdyCoder
9aa8e5340f [Readme] Clarified extractor naming (#29799)
* Exported usable extractors must be named `xxxxIE`

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-05-30 02:50:50 +01:00
dirkf
04fd3289d3 [YouPorn] Improve upload_date extraction
See https://github.com/yt-dlp/yt-dlp/issues/2701#issuecomment-1034341883
2022-05-28 13:54:32 +01:00
dirkf
52c3751df7 [utils] Enable ALPN in HTTPS to satisfy broken servers
See https://github.com/yt-dlp/yt-dlp/issues/3878
2022-05-28 13:52:51 +01:00
dirkf
187a48aee2 [YouTube] Handle player c5a4daa1 with indirect n-function definition
* resolves #30976
2022-05-24 15:43:56 +01:00
Jacob Chapman
be35e5343a Update options.py 2022-05-20 05:25:54 +01:00
dirkf
c3deca86ae [wat.tv] Add version pver to metadata API call
Resolves #30959.
2022-05-19 17:41:48 +00:00
dirkf
c7965b9fc2 [NHK] Support alphabetic characters in 7-char NhkVod IDs (#29682) 2022-05-09 18:54:41 +01:00
dirkf
e988fa4523 [doc] Clarify test naming 2022-04-29 16:56:00 +01:00
dirkf
e27d8d819f [streamcz] Remove empty '{}'.format() for Py2.6
Use `'-join()'` here, or `{0}`, ..., in general.
2022-04-29 13:36:02 +01:00
Árni Dagur
ebc627847c [KTH] Add new extractor for KTH play (#30885)
* Implement extractor for KTH play
* Make KTH Play url regex more relaxed
2022-04-28 10:18:10 +01:00
dirkf
a0068bd6be [Youtube] Fix "n" descrambling for player fae06c11
Resolves #30856.
2022-04-15 16:07:09 +01:00
dirkf
b764dbe773 Disable blank issues 2022-04-10 05:49:09 +01:00
nixxo
871645a4a4 [RAI] Fix extraction of http formats
From https://github.com/yt-dlp/yt-dlp/pull/3272
Closes https://github.com/yt-dlp/yt-dlp/issues/3270
Authored by: nixxo
2022-04-05 15:21:59 +01:00
nixxo
1f50a07771 [RAI] Extend formats with direct http mp4 link (PR #27990)
* initial support for creating direct mp4 link
* improved regexes and info extraction
* added "connection: close" to request headers
* updated to https://github.com/yt-dlp/yt-dlp/pull/208
2022-04-05 15:21:59 +01:00
nixxo
9e5ca66f16 [RAI] Added checks for DRM protected content (PR #27657)
reviewed by pukkandan (https://github.com/yt-dlp/yt-dlp/pull/150)
2022-04-05 15:21:59 +01:00
lihan7
17d295a1ec [extractor/bilibili] Fix path "/audio/auxxxxx" download return 403 2022-04-01 00:46:34 +01:00
dirkf
49c5293014 Ignore --external-downloader-args if --external-downloader was rejected
... and generate warning
2022-03-25 14:47:26 +00:00
df
6508688e88 Make default upload_/release_date a compat_str
Ensures download tests pass in Python 2 as well as 3; also
add YoutubeDL tests for timestamp -> upload_date etc.
2022-02-26 10:29:42 +00:00
dirkf
4194d253c0 Avoid skipping ID when unlisted_hash is numeric
Pattern needed a non-greedy match; also replaced a redundant test with one for this, issue 29690
2022-02-26 10:29:42 +00:00
dirkf
f8e543c906 [Alsace20TV] Add new extractors Alsace20TVIE, Alsace20TVEmbedIE 2022-02-24 18:43:47 +00:00
dirkf
c4d1738316 [CPAC] Add extractor for Canadian Parliament
CPACIE: single episode
CPACPlaylistIE: playlists and searches
2022-02-24 18:27:57 +00:00
dirkf
1f13ccfd7f Fixed groups() call on potentially empty regex search object (#30676)
* Fixed groups() call on potentially empty regex search object.
- https://github.com/ytdl-org/youtube-dl/issues/30521

* minimising lines changed

Co-authored-by: yayorbitgum <50963144+yayorbitgum@users.noreply.github.com>
2022-02-24 18:26:58 +00:00
marieell
923292ba64 [aliexpress] Fix test case 2022-02-24 13:44:52 +00:00
Lesmiscore (Naoya Ozaki)
782bfd26db [bigo] add support for bigo.tv (#30635)
* [bigo] add support for bigo.tv

* [bigo] prepend "Bigo says"

* title fallback

* add error for invalid json data
2022-02-24 13:34:32 +00:00
Vladimir Stavrinov
3472227074 [rutv] fix vbr for empty string value (#30623)
* [rutv] use str_to_int() (thx dirkf)
2022-02-14 17:54:31 +00:00
Petr Vaněk
bf23bc0489 add missing __future__ import unicode_literals 2022-02-14 07:07:05 +00:00
Petr Vaněk
85bf26c1d0 resolve problem with unpacking operator for <py3.5 2022-02-14 07:07:05 +00:00
Petr Vaněk
d8adca1b66 [streamcz] test fixes and one additional test 2022-02-14 07:07:05 +00:00
Petr Vaněk
d02064218b do not use f-strings 2022-02-14 07:07:05 +00:00
Petr Vaněk
b1297308fb avoid traverse_obj function 2022-02-14 07:07:05 +00:00
Petr Vaněk
8088ce036a revert: use _match_valid_url function 2022-02-14 07:07:05 +00:00
Petr Vaněk
29f7bfc4d7 [streamcz] cherry-pick from yt-dlp
Cherry-picked-from: 7d449fff5346 ("[streamcz] Fix extractor (#1616)")
2022-02-14 07:07:05 +00:00
dirkf
74f8cc48af [extractor/videa] Back-port from yt-dlp PRs 463+1028
Authored by: nyuszika7h
2022-02-11 12:43:26 +00:00
kikuyan
8ff961d10f [extractor/videa] fix extraction in Py2
Fixes #30416
2022-02-11 12:43:26 +00:00
dirkf
266b6ef185 [BBC] Also allow PID with leading 'l' (live?) 2022-02-09 21:21:59 +00:00
dirkf
825d3426c5 [Nuvid] Use site JSON for video details (#29332)
Back-port yt-dlp PR 1022 onto PR #17890 and update

Video details aren't in the original HTML now but populated by async JS

Co-authored by: u-spec-png
Co-authored by: vidaritos
2022-02-09 02:40:34 +00:00
dirkf
47b0c8697a [ARD] Back-port subtitle extraction from yt-dlp PR 2409
Authored by: fstirlitz
Fixes #30543
Closes #17766 (thanks ngdio)
2022-02-07 13:47:38 +00:00
Seonghyeon Cho
734dfbb4e3 Remove redundant assigning format_id 2022-02-05 03:04:35 +00:00
df
ddc080a562 Add ArteTVCategoryIE to support category playlists 2022-02-05 03:02:56 +00:00
Abdullah Ibn Fulan
16a3fe2ba6 Updated Album URL regex
Mistakenly forgot to edit a line in last commit.

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-02-05 02:53:23 +00:00
Abdullah Ibn Fulan
c820a284a2 [extractor/audiomack] Updated URL regex, corrected invalid testcases, fixed bug
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-02-05 02:53:23 +00:00
dirkf
58babe9af7 Support __INITIAL_DATA__ with stringified JSON
Add test and fix test for bbcthreeConfig
2022-02-05 02:51:46 +00:00
df
6d4932f023 Try for timestamp, description from window.__INITIAL_DATA__ pages 2022-02-05 02:51:46 +00:00
dirkf
92d73ef393 [niconico] Implement heartbeat for download 2022-02-05 02:47:21 +00:00
dirkf
91278f4b6b [niconico] Back-port extractor from yt-dlp
Add Nico search extractors, fix extraction
2022-02-05 02:47:21 +00:00
dirkf
73e1ab6125 [test:download] Only extract enough videos for playlist_mincount 2022-02-05 02:47:21 +00:00
dirkf
584715a803 [applepodcasts] Extract default thumbnail image 2022-02-05 02:32:45 +00:00
dirkf
e00b0eab1e [applepodcasts] Improve format extraction
Set acodec and vcodec, etc, to avoid breaking, eg, bestaudio
2022-02-05 02:32:45 +00:00
dirkf
005339d637 [applepodcasts] Support new AMP-ish page structure 2022-02-05 02:32:45 +00:00
Chris Rose
23ad6402a6 xvideos: Fix for #30271 2022-02-05 02:24:51 +00:00
dirkf
9642344965 Fix tests for working IEs; disable obsolete WDRMobile 2022-02-05 02:22:45 +00:00
dirkf
568c7005d5 Fix WDRMaus; extend URL matching for other Maus pages; improve ID extraction 2022-02-05 02:22:45 +00:00
dirkf
5cb4833f40 Update URPlayIE extractor for Next.js page format, with subtitles 2022-02-05 02:16:53 +00:00
dirkf
5197336de6 Support more deeply nested ptmd_path with test, update tests 2022-02-05 02:14:35 +00:00
dirkf
01824d275b Additional tweaks: allow any .ndr.de, simplify quote match 2022-02-05 02:12:44 +00:00
dirkf
39a98b09a2 Fix NDR, NJoy tests 2022-02-05 02:12:44 +00:00
dirkf
f0a05a55c2 NJoy: improve extraction of NDR id, description, etc with current page formats 2022-02-05 02:12:44 +00:00
dirkf
4186e81777 NDR: improve extraction of NDR id, description, etc with current page formats 2022-02-05 02:12:44 +00:00
dirkf
b494824286 Support Tele5 pages with Discovery Networks format instead of JWPlatform 2022-02-05 02:08:11 +00:00
dirkf
8248133e5e Back-port yt-dlp Viki extractor
From https://github.com/yt-dlp/yt-dlp/pull/2540
2022-02-04 15:49:12 +00:00
dirkf
27dbf6f0ab Return the item itself if playlist has one entry
Removes playlist spam from log
2022-02-04 14:28:50 +00:00
dirkf
61d791726f Find TV2DK Kaltura ID in Nuxt.js page format 2022-02-04 14:28:50 +00:00
pukkandan
0c0876f790 [youtube:search] Add tests 2022-02-04 11:09:18 +00:00
dirkf
7a497f1405 Rework 2c2c2bd with an actual Mix page and realistic playlist size
From 2c2c2bd348 (commitcomment-65953545)
2022-02-04 04:09:23 +00:00
dirkf
5add3f4373 Merge branch 'pukkandan-yt-searchurl' into yt-dl-master
Closes #27749
2022-02-04 03:50:32 +00:00
pukkandan
78ce962f4f [youtube] Support channel search
Code from cd684175ad
2022-02-03 01:02:58 +00:00
dirkf
41f0043983 Avoid crashing if n-sig decode fails 2022-02-02 14:25:03 +00:00
dirkf
34c06b16f5 Support Youtube Shorts URL format 2022-02-01 14:40:20 +00:00
dirkf
1e677567cd [YouTube] Fix n-sig for player e06dea74 (#30582)
From yt-dl commit 48416bc
2022-02-01 14:39:03 +00:00
df
af9e72507e Implement n-param descrambling using JSInterp
Fixes #29326, closes #29790, closes #30004, closes #30024, closes #30052,
closes #30088, closes #30097, closes #30102, closes #30109, closes #30119,
closes #30125, closes #30128, closes #30162, closes #30173, closes #30186,
closes #30192, closes #30221, closes #30239, closes #30539, closes #30552.
2022-01-31 00:19:58 +00:00
dirkf
6ca7b77696 Refactor JSInterpreter._separate
yt-dlp/yt-dlp/@06dfe0a, improve _MATCHING_PARENS
2022-01-30 00:05:54 +00:00
dirkf
9d142109f4 Back-port test_youtube_signature.py from yt-dlp and fix JSInterp accordingly 2022-01-30 00:05:54 +00:00
dirkf
1ca673bd98 Fix splice to handle float
Needed for new youtube js player f1ca6900
Add 57dbe8077f (diff-729b57caa8d006426f6a8960c061f519a8b6658682284015e069745af52ffb07)
2022-01-30 00:05:54 +00:00
df
e1eae16b56 Handle default in switch better
Add a1fc7ca074
Thanks coletdjnz
2022-01-30 00:05:54 +00:00
df
96f87aaa3b Back-port JS interpreter upgrade from yt-dlp PR #1437 2022-01-30 00:05:54 +00:00
df
5f5de51a49 Add compat_map/filter and use the former 2022-01-30 00:05:36 +00:00
df
39ca35e765 Fix test_youtube_flat_playlist_extraction 2022-01-29 20:00:21 +00:00
df
d76d59d99d Remove obsolete non-working test_youtube_toptracks 2022-01-29 20:00:21 +00:00
df
2c2c2bd348 Fix test_youtube_mix 2022-01-29 20:00:21 +00:00
df
46e0a729b2 Remove obsolete test_youtube_course 2022-01-29 20:00:21 +00:00
df
57044eaceb Fix test_youtube_playlist_noplaylist 2022-01-29 20:00:21 +00:00
pukkandan
a3373da70c Merge branch 'UP/youtube-dl' into dl/YoutubeSearchURLIE 2022-01-30 01:07:28 +05:30
pukkandan
2c4cb134a9 Fix max_results 2022-01-30 00:54:22 +05:30
pukkandan
bfe72723d8 Use itertools.islice 2022-01-30 00:49:55 +05:30
pukkandan
ed99d68bdd Add back YoutubeSearchURLIE 2022-01-30 00:41:47 +05:30
Sergey M․
5014bd67c2 release 2021.12.17 2021-12-17 01:49:07 +07:00
Sergey M․
e418823350 [ChangeLog] Actualize
[ci skip]
2021-12-17 01:43:16 +07:00
lanegramling
b5242da7d2 [youtube] Update signature function patterns (closes #30363) (#30366) 2021-12-17 01:42:17 +07:00
bopol
a803582717 [peertube] only call description endpoint if necessary (#29383) 2021-07-01 06:53:22 +00:00
Remita Amine
7fb9564420 [periscope] pass referer to HLS requests(closes #29419) 2021-06-28 20:08:39 +01:00
Aleri Kaisattera
379f52a495 [liveleak] Remove extractor (closes #17625, closes #24222) (#29331) 2021-06-21 04:23:50 +07:00
Sergey M․
cb668eb973 [pornhub] Add support for pornhubthbh7ap3u.onion 2021-06-21 04:08:15 +07:00
Sergey M․
751c9ae39a [pornhub] Detect geo restriction 2021-06-21 03:33:43 +07:00
Sergey M․
da32828208 [pornhub] Dismiss tbr extracted from download URLs (closes #28927)
No longer reliable
2021-06-21 03:22:37 +07:00
Sergey M․
2ccee8db74 [curiositystream:collection] Extend _VALID_URL (closes #26326, closes #29117) 2021-06-21 01:54:52 +07:00
Sergey M․
47f2f2fbe9 [youtube] Make get_video_info processing more robust (closes #29333) 2021-06-21 01:35:21 +07:00
Sergey M․
03ab02730f [youtube] Workaround for get_video_info request (refs #29333)
See https://github.com/ytdl-org/youtube-dl/issues/29333#issuecomment-864049544
2021-06-21 01:34:27 +07:00
Tianyi Shi
4c77a2e538 [bilibili] Strip uploader name (#29202) 2021-06-21 01:03:21 +07:00
bopol
4131703001 [youtube] Update invidious instance list (#29281) 2021-06-21 00:42:09 +07:00
Logan B
cc21aebe90 [umg:de] Update GraphQL API URL (#29304)
Previous one no longer resolves

Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-21 00:41:14 +07:00
Sergey M․
57b9a4b4c6 [nrk] Switch psapi URL to https (closes #29344)
Catalog calls no longer work via http
2021-06-21 00:36:28 +07:00
kikuyan
3a7ef27cf3 [postprocessor/ffmpeg] Show ffmpeg output on error (refs #22680) (#29336) 2021-06-20 23:58:19 +07:00
kikuyan
a7f61feab2 [egghead] Add support for app.egghead.io (closes #28404) (#29303)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-17 10:34:33 +07:00
kikuyan
8fe5d54eb7 [appleconnect] Fix extraction (#29208) 2021-06-17 04:12:13 +07:00
kikuyan
d156bc8d59 [orf:tvthek] Add support for MPD formats (closes #28672) (#29236) 2021-06-17 04:02:06 +07:00
Sergey M
c2350cac24 [README.md] Update MSVC 2010 redist URL (closes #29222) 2021-06-06 05:32:27 +07:00
Sergey M․
b224cf39d5 release 2021.06.06 2021-06-06 01:38:22 +07:00
Sergey M․
5f85eb820c [ChangeLog] Actualize
[ci skip]
2021-06-06 01:32:15 +07:00
Sergey M․
bb7ac1ed66 [facebook] Improve login required detection 2021-06-06 01:16:43 +07:00
Sergey M․
fdf91c52a8 [youporn] Fix formats and view count extraction (closes #29216) 2021-06-06 00:11:09 +07:00
Sergey M․
943070af4a [orf:tvthek] Fix thumbnails extraction (closes #29217) 2021-06-05 23:42:25 +07:00
Remita Amine
82f3993ba3 [formula1] fix extraction(closes #29206) 2021-06-04 17:51:44 +01:00
Sergey M․
d495292852 [ard] Relax _VALID_URL and fix video ids (closes #22724, closes #29091) 2021-05-30 06:14:59 +07:00
Sergey M․
2ee6c7f110 [ustream] Detect https embeds (closes #29133) 2021-05-30 03:43:59 +07:00
Sergey M․
6511b8e8d7 [ted] Prefer own formats over external sources (closes #29142) 2021-05-30 03:05:22 +07:00
Sergey M․
f3cd1d9cec [twitch:clips] Improve extraction (closes #29149) 2021-05-30 01:49:51 +07:00
phlip
e13a01061d [twitch:clips] Add access token query to download URLs (closes #29136) 2021-05-30 01:47:33 +07:00
Sergey M․
24297a42ef [youtube] Fix get_video_info request (closes #29086, closes #29165) 2021-05-30 00:36:26 +07:00
Remita Amine
1980ff4550 [vimeo] fix vimeo pro embed extraction(closes #29126) 2021-05-26 11:04:39 +01:00
Remita Amine
dfbbe2902f [redbulltv] fix embed data extraction(closes #28770) 2021-05-17 12:56:49 +01:00
Remita Amine
e1a9d0ef78 [shahid] relax _VALID_URL(closes #28772, closes #28930) 2021-05-17 12:37:39 +01:00
Sergey M․
f47627a1c9 release 2021.05.16 2021-05-16 22:55:05 +07:00
Sergey M․
efeb9e0fbf [ChangeLog] Actualize
[ci skip]
2021-05-16 22:40:39 +07:00
Sergey M․
e90a890f01 [playstuff] Add extractor (closes #28901, closes #28931) 2021-05-16 22:31:37 +07:00
Sergey M․
199c645bee [eroprofile] Skip test 2021-05-16 22:01:51 +07:00
Sergey M․
503a3744ad [eroprofile] Fix extraction (closes #23200, closes #23626, closes #29008) 2021-05-16 21:57:21 +07:00
kr4ssi
ef03721f47 [vivo] Add support for vivo.st (#29009)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-16 21:46:32 +07:00
Sergey M․
1e8aaa1d15 [generic] Add support for og:audio (closes #28311, closes #29015) 2021-05-16 21:42:38 +07:00
Sergey M․
6423d7054e [options] Fix thumbnail option group name (closes #29042) 2021-05-16 21:34:10 +07:00
Sergey M․
eb5080286a [phoenix] Fix extraction (closes #29057) 2021-05-16 21:21:14 +07:00
Sergey M․
286e01ce30 [generic] Add support for sibnet embeds 2021-05-16 20:50:32 +07:00
Sergey M․
8536dcafd8 [vk] Add support for sibnet embeds (closes #9500) 2021-05-16 20:48:24 +07:00
Sergey M․
552b139911 [generic] Add Referer header for direct videojs download URLs (closes #2879, closes #20217, closes #29053) 2021-05-16 20:29:35 +07:00
Lukas Anzinger
2202cef0e4 [orf:radio] Switch download URLs to HTTPS (closes #29012) (#29046) 2021-05-16 19:54:15 +07:00
Sergey M․
a726009987 [blinkx] Remove extractor (closes #28941)
No longer exists.
2021-05-05 04:12:35 +07:00
catboy
03afef7538 [medaltv] Relax _VALID_URL (#28884)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-05 03:44:07 +07:00
Jacob Chapman
b797c1cc75 [YoutubeDL] Improve extract_info doc (#28946)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-05 03:31:24 +07:00
Sergey M․
04be55307a [funimation] Add support for optional lang code in URLs (closes #28950) 2021-05-05 02:54:12 +07:00
Sergey M․
504e4d804d [gdcvault] Add support for HTML5 videos 2021-05-05 02:44:29 +07:00
Sergey M․
1786cd3fe4 [dispeak] DRY and update tests (closes #28970) 2021-05-05 02:30:42 +07:00
Ben Rog-Wilhelm
b8645c1f58 [dispeak] Improve FLV extraction (closes #13513) 2021-05-05 02:24:55 +07:00
Ben Rog-Wilhelm
fe05191b8c [kaltura] Improve iframe extraction (#28969)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-05 02:14:35 +07:00
Sergey M․
0204838163 [kaltura] Make embed code alternatives actually work 2021-05-05 02:01:22 +07:00
Sergey M․
a0df8a0617 [cda] Improve extraction (closes #28709, closes #28937) 2021-05-01 22:53:30 +07:00
Sergey M․
d1b9a5e2ef [twitter] Improve formats extraction from vmap URL (closes #28909) 2021-05-01 19:00:39 +07:00
Sergey M․
ff04d43c46 [xtube] Fix formats extraction (closes #28870) 2021-05-01 18:33:05 +07:00
Sergey M․
d2f72c40db [svtplay] Improve extraction (closes #28507, closes #28876) 2021-05-01 18:09:32 +07:00
Sergey M․
e33dfb445c [tv2dk] Fix extraction (closes #28888) 2021-05-01 17:53:27 +07:00
Sergey M․
94520568b3 [workflows/ci.yml] Update link to jython-installer 2021-04-26 02:16:47 +07:00
Sergey M․
273964d190 release 2021.04.26 2021-04-26 01:33:30 +07:00
Sergey M․
346dd3b5e8 [ChangeLog] Actualize
[ci skip]
2021-04-26 01:29:50 +07:00
schnusch
f5c2c06231 [xfileshare] Add support for wolfstream.tv (#28858) 2021-04-26 00:32:47 +07:00
Sergey M․
57eaaff5cf [francetvinfo] Improve video id extraction (closes #28792) 2021-04-25 22:52:28 +07:00
Sergey M․
999329cf6b [workflows/ci.yml] Fix install nose for Jython 2021-04-24 23:52:16 +07:00
catboy
c6ab792990 [medaltv] Fix extraction (#28807)
numeric clip ids are no longer used by medal, and integer user ids are now sent as strings.
2021-04-24 19:10:35 +07:00
The Hatsune Daishi
0db79d8181 [tver] Redirect all downloads to Brightcove (#28849) 2021-04-24 18:58:03 +07:00
Sergey M․
7e8b3f9439 [youtube] Remove unused code 2021-04-21 05:37:51 +07:00
Sergey M․
ac19c3ac80 [go] Improve video id extraction (closes #25207, closes #25216, closes #26058) 2021-04-21 05:35:39 +07:00
Sergey M․
c4a451bcdd [test_execution] Add test for lazy extractors (refs #28780) 2021-04-21 04:47:29 +07:00
Sergey M․
5ad69d3d0e [test_youtube_misc] Move YoutubeIE.extract_id test into separate module 2021-04-21 04:45:13 +07:00
Sergey M․
32290307a4 [youtube] Fix lazy extractors (closes #28780) 2021-04-21 03:56:04 +07:00
Sergey M․
dab83a2597 [bbc] Extract full description from __INITIAL_DATA__ (refs #28774) 2021-04-21 03:00:56 +07:00
dirkf
41920fc80e [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774) 2021-04-21 02:51:55 +07:00
Sergey M․
9f6c03a006 [cbsnews] Fix extraction for python <3.6 (closes #23359) 2021-04-17 05:05:31 +07:00
Sergey M․
596b26606c release 2021.04.17 2021-04-17 03:50:09 +07:00
Sergey M․
f20b505b46 [ChangeLog] Actualize
[ci skip]
2021-04-17 03:47:00 +07:00
Sergey M․
cfee2dfe83 [utils] PEP 8 2021-04-17 03:32:04 +07:00
Sergey M․
30a3a4c70f [lbry] Add support for HLS videos (closes #27877, closes #28768) 2021-04-17 03:23:47 +07:00
Sergey M․
a00a7e0cad [utils] Add support for experimental HTTP response status code 308 Permanent Redirect (refs #27877, refs #28768) 2021-04-17 03:22:13 +07:00
Sergey M․
54558e0baa [youtube] Improve stretch extraction and fix stretched ratio calculation (closes #28769) 2021-04-17 02:27:54 +07:00
Sergey M․
7c52395479 [youtube:tab] Improve grid extraction (closes #28725) 2021-04-17 01:13:10 +07:00
zraktvor
ea87ed8394 [youtube:tab] Detect series playlist on playlists page (closes #28723) 2021-04-17 01:13:10 +07:00
Cássio Ávila
d01e261a15 [youtube] Add more invidious instances (#28706) 2021-04-17 00:31:34 +07:00
quyleanh
79e4ccfc4b [pluralsight] Extend anti-throttling timeout (#28712) 2021-04-17 00:30:10 +07:00
Sergey M․
06159135ef [youtube] Improve URL to extractor routing (closes #27572, closes #28335, closes #28742) 2021-04-17 00:07:32 +07:00
Aaron Lipinski
4fb25ff5a3 [maoritv] Add new extractor(closes #24552) 2021-04-09 09:02:37 +01:00
Sergey M․
1b0a13f33c [youtube:tab] Pass innertube context and x-goog-visitor-id header along with continuation requests (closes #28702) 2021-04-09 02:10:34 +07:00
Remita Amine
27e5a4464d [mtv] Fix Viacom A/B Testing Video Player extraction(closes #28703) 2021-04-08 18:54:44 +01:00
Sergey M․
545d6cb9d0 [pornhub] Extract DASH and HLS formats from get_media end point (closes #28698) 2021-04-08 15:32:59 +07:00
Remita Amine
006eea564d [cbssports] fix extraction(closes #28682) 2021-04-07 14:01:48 +01:00
Remita Amine
281b8e3443 [jamendo] fix track extraction(closes #28686) 2021-04-07 10:41:06 +01:00
Remita Amine
c0c5134c57 [curiositystream] fix format extraction(closes #26845, closes #28668) 2021-04-07 09:27:05 +01:00
Sergey M․
72a2c0a9ed release 2021.04.07 2021-04-07 03:42:24 +07:00
Sergey M․
445db582a2 [ChangeLog] Actualize
[ci skip]
2021-04-07 03:39:02 +07:00
Sergey M․
6b116f0c03 [youtube] Fix videos with restricted location (closes #28685) 2021-04-07 03:34:43 +07:00
Sergey M․
70d0d4f9be [compat] Use more conventional name for compat SimpleCookie 2021-04-06 14:22:28 +07:00
Sergey M․
6b315d96bc [compat] flake8 2021-04-06 14:15:13 +07:00
guredora
25b1287323 [line] add support live.line.me (closes #17205)(closes #28658) 2021-04-05 10:11:01 +01:00
Remita Amine
760c911299 [compat] add compat_SimpleCookie to __all__ array 2021-04-05 07:16:50 +01:00
Remita Amine
162bf9e10a [compat] add compat_SimpleCookie 2021-04-04 19:49:24 +01:00
Remita Amine
6beb1ac65b [extractor/common] keep support for non standard JSON-LD VideoObject author values 2021-04-04 19:16:17 +01:00
Remita Amine
3ae9c0f410 [vimeo] improve extraction(closes #28591) 2021-04-04 16:28:26 +01:00
Remita Amine
e165f5641f [extractor/common] fix JSON-LD VideoObject author extraction 2021-04-04 16:28:26 +01:00
RomanEmelyanov
aee6feb02a [youku] Update ccode(closes #17852, closes #28447, closes #28460) (#28648) 2021-04-04 08:14:37 +00:00
Remita Amine
654b4f4ff2 [youtube] prioritize information from YoutubeIE for playlist entries(closes #28619, closes #28636) 2021-04-03 08:23:35 +01:00
Remita Amine
1df2596f81 [extractor/common] fix _get_cookies method for python 2(#20673, #23256, #20326, closes #28640) 2021-04-03 07:54:16 +01:00
Remita Amine
04d4a3b136 [screencastomatic] fix extraction(closes #11976, closes #24489) 2021-04-01 19:05:00 +01:00
Allan Daemon
392c467f95 [palcomp3] Add new extractor(closes #13120) 2021-04-01 17:10:38 +01:00
Vid
c5aa8f36bf [arnes] Add new extractor(closes #28483) 2021-04-01 13:59:12 +01:00
Remita Amine
3748863070 [youtube:tab] Add support for hashtag videos extraction(closes #28308) 2021-04-01 11:52:23 +01:00
Sergey M․
ca304beb15 release 2021.04.01 2021-04-01 04:47:11 +07:00
Sergey M․
e789bb1aa4 [ChangeLog] Actualize
[ci skip]
2021-04-01 04:43:54 +07:00
Sergey M․
14f29f087e [youtube] Setup CONSENT cookie when needed (closes #28604) 2021-04-01 04:05:10 +07:00
Remita Amine
b97fb2edac [vimeo] fix password protected review extraction(closes #27591) 2021-03-31 20:07:13 +01:00
Remita Amine
28bab774a0 [youtube] improve age-restricted video extraction(#28578) 2021-03-30 21:45:08 +01:00
Sergey M․
8f493de9fb release 2021.03.31 2021-03-31 02:59:07 +07:00
Sergey M․
207bc35d34 [ChangeLog] Actualize
[ci skip]
2021-03-31 02:58:01 +07:00
Remita Amine
955894e72f [vlive] fix inkey request(closes #28589) 2021-03-30 10:01:06 +01:00
Sergey M․
287e50b56b [francetvinfo] Improve video id extraction (closes #28584) 2021-03-30 03:37:43 +07:00
Chris Hranj
da762c4e32 [instagram] Improve title extraction and extract duration (#28469)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-03-30 02:05:19 +07:00
Remita Amine
87a8bde777 [sbs] add support for ondemand watch URLs(closes #28566) 2021-03-28 08:46:33 +01:00
Remita Amine
49fc0a567f [youtube] fix video's channel extraction(closes #28562) 2021-03-27 19:11:41 +01:00
Remita Amine
cc777dcaa0 [picarto] fix live stream extraction(closes #28532) 2021-03-27 17:37:45 +01:00
Remita Amine
c785911870 [vimeo] fix unlisted video extraction(closes #28414) 2021-03-25 17:06:57 +01:00
Remita Amine
605e7b5e47 [youtube:tab] fix playlist/community continuation items extraction(closes #28266) 2021-03-25 12:53:18 +01:00
Remita Amine
8562218350 [ard] improve clip id extraction(#22724)(closes #28528) 2021-03-24 19:29:25 +01:00
Sergey M․
76da1c954a release 2021.03.25 2021-03-25 00:04:10 +07:00
Sergey M․
c2fbfb49da [ChangeLog] Actualize
[ci skip]
2021-03-25 00:03:00 +07:00
Roman Sebastian Karwacik
d1069d33b4 [zoom] Add new extractor(closes #16597, closes #27002, closes #28531) 2021-03-24 17:26:44 +01:00
The Hatsune Daishi
eafcadea26 [extractor] escape forgotten dot for hostnames in regular expression (#28530) 2021-03-24 14:33:19 +00:00
Remita Amine
a40002444e [bbc] fix BBC IPlayer Episodes/Group extraction(closes #28360) 2021-03-24 15:10:19 +01:00
Sergey M․
5208ae92fc [youtube] Fix default value for youtube_include_dash_manifest (closes #28523) 2021-03-24 02:57:35 +07:00
Remita Amine
8117d613ac [zingmp3] fix extraction(closes #11589, closes #16409, closes #16968, closes #27205) 2021-03-22 15:58:56 +01:00
Martin Ström
00b4d72d1e [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-03-22 20:56:58 +07:00
Remita Amine
21ccd0d7f4 [tiktok] detect private videos(closes #28453) 2021-03-21 09:10:38 +01:00
Sergey M․
7e79ba7dd6 [vimeo:album] Fix extraction for albums whose number of videos is a multiple of the page size (closes #28486) 2021-03-20 05:47:26 +07:00
Remita Amine
fa6bf0a711 [vvvvid] fix kenc format extraction(closes #28473) 2021-03-19 12:37:22 +01:00
Remita Amine
f912d6c8cf [mlb] fix video extraction(#21241) 2021-03-15 21:46:39 +01:00
Sergey M․
357bfe251d [svtplay] Improve extraction (closes #28448) 2021-03-15 20:42:20 +07:00
Remita Amine
3be098010f [applepodcasts] fix extraction(closes #28445) 2021-03-14 20:08:46 +01:00
Remita Amine
9955bb4a27 [rtve] improve extraction
- extract all formats
- fix RTVE Infantil extraction(closes #24851)
- extract is_live and series
2021-03-14 15:05:25 +01:00
Sergey M․
ebfd66c4b1 release 2021.03.14 2021-03-14 09:38:16 +07:00
Sergey M․
b509d24b2f [ChangeLog] Actualize
[ci skip]
2021-03-14 09:36:11 +07:00
Sergey M․
1860d0f41c [southpark] Fix extraction and add support for southparkstudios.com (closes #26763, closes #28413) 2021-03-14 09:26:54 +07:00
Remita Amine
60845121ca [sportdeutschland] fix extraction(closes #21856)(closes #28425) 2021-03-13 15:19:24 +01:00
Remita Amine
1182f9567b [pinterest] reduce the number of HLS format requests 2021-03-12 18:11:11 +01:00
Remita Amine
ef414343e5 [peertube] improve thumbnail extraction(closes #28419) 2021-03-12 10:48:58 +01:00
Remita Amine
43d986acd8 [tver] improve title extraction(closes #28418) 2021-03-12 10:14:28 +01:00
Remita Amine
9c644a6419 [fujitv] fix HLS formats extension(closes #28416) 2021-03-12 09:51:01 +01:00
Remita Amine
fc2c6d5323 [shahid] fix format extraction(closes #28383) 2021-03-10 13:16:21 +01:00
Remita Amine
64ed3af328 [lbry] add support for channel filters(closes #28385) 2021-03-10 11:45:30 +01:00
Sergey M․
bae7dbf78b [bandcamp] Extract release_timestamp 2021-03-10 03:41:21 +07:00
Sergey M․
15c24b0346 [lbry] Extract release_timestamp (closes #28386) 2021-03-10 03:40:56 +07:00
Sergey M․
477bff6906 Introduce release_timestamp meta field (refs #28386) 2021-03-10 03:36:31 +07:00
Sergey M․
1a1ccd9a6e [pornhub] Detect flagged videos 2021-03-10 02:56:01 +07:00
Sergey M․
7dc513487f [pornhub] Extract formats from get_media end point (#28395) 2021-03-10 02:54:10 +07:00
Remita Amine
c6a14755bb [bilibili] fix video info extraction(closes #28341) 2021-03-08 16:53:50 +01:00
Remita Amine
7f064d50db [cbs] add support for Paramount+ (closes #28342) 2021-03-07 08:32:37 +01:00
Remita Amine
b8b622fbeb [trovo] Add Origin header to VOD formats(closes #28346) 2021-03-04 17:57:16 +01:00
Remita Amine
ec64ec9651 [voxmedia] fix volume embed extraction(closes #28338) 2021-03-04 12:42:31 +01:00
Sergey M․
f68692b004 release 2021.03.03 2021-03-03 11:47:34 +07:00
Sergey M․
8c9766f4bf [ChangeLog] Actualize
[ci skip]
2021-03-03 11:44:49 +07:00
Sergey M․
061c030133 [youtube:tab] Switch continuation to browse API (closes #28289, closes #28327)
Until further investigation.
2021-03-03 11:42:59 +07:00
Remita Amine
8f56907afa [9c9media] fix extraction for videos with multiple ContentPackages(closes #28309) 2021-03-02 12:04:31 +01:00
Remita Amine
e1adb3ed4f [bbc] correct caught exception type 2021-03-02 11:21:49 +01:00
dirkf
e465b25c1f [bbc] add support for BBC Reel videos(closes #21870, closes #23660, closes #28268) 2021-03-02 10:49:20 +01:00
Sergey M․
7c06216abf release 2021.03.02 2021-03-02 06:19:42 +07:00
Sergey M․
0002888627 [ChangeLog] Actualize
[ci skip]
2021-03-02 06:16:41 +07:00
Sergey M․
3fb14cd214 [zdf] Rework extractors (closes #11606, closes #13473, closes #17354, closes #21185, closes #26711, closes #27068, closes #27930, closes #28198, closes #28199, closes #28274)
* Generalize unique video ids for zdf based extractors
* Improve extraction
* Fix 3sat and phoenix
2021-03-02 06:07:30 +07:00
Remita Amine
bee6182680 [stretchinternet] Fix extraction(closes #28297) 2021-03-01 14:00:03 +01:00
Remita Amine
38fe5e239a [urplay] fix episode data extraction(closes #28292) 2021-02-28 12:31:18 +01:00
Remita Amine
678d46f6bb [bandaichannel] Add new extractor(closes #21404) 2021-02-28 10:42:41 +01:00
Alexander Seiler
3c58f9e0b9 [srgssr] improve extraction
- extract subtitle
- fix extraction for new videos
- update srf download domains

closes #14717
closes #14725
closes #27231
closes #28238
2021-02-25 15:50:49 +01:00
Remita Amine
ef28e33249 [vvvvid] reduce season request payload size 2021-02-24 22:29:35 +01:00
nixxo
9662e4964b [vvvvid] extract series sublists playlist_title (#27601) (#27618) 2021-02-24 22:17:29 +01:00
Remita Amine
44603290e5 [dplay] Extract Ad-Free uplynk URLs(#28160) 2021-02-24 18:34:28 +01:00
Remita Amine
1631fca1ee [wat] detect DRM protected videos(closes #27958) 2021-02-23 13:50:18 +01:00
Remita Amine
295860ff00 [tf1] improve extraction(closes #27980)(closes #28040) 2021-02-23 12:41:32 +01:00
Sergey M․
8cb4b71909 [tmz] Fix and improve extraction (closes #24603, closes #24687, closes #28211) 2021-02-23 18:37:06 +07:00
Remita Amine
d81421af4b [gedidigital] improve asset id matching 2021-02-22 23:02:15 +01:00
nixxo
7422a2194f [gedidigital] Add new extractor(closes #7347)(closes #26946) 2021-02-22 20:42:14 +01:00
Remita Amine
2090dbdc8c [youtube] fix get_video_info request 2021-02-21 23:09:09 +01:00
Sergey M․
0a04e03a02 release 2021.02.22 2021-02-22 02:42:16 +07:00
Sergey M․
44b2d5f5fc [ChangeLog] Actualize
[ci skip]
2021-02-22 02:40:00 +07:00
Sergey M․
aa9118a373 [apa] Improve extraction (closes #27750) 2021-02-22 02:29:50 +07:00
Adrian Heine
36abc16c3c [apa] Fix extraction 2021-02-22 02:28:28 +07:00
Sergey M․
919d764600 [youporn] Skip test 2021-02-21 23:21:38 +07:00
piplongrun
696183e133 [youporn] Extract duration (#28019)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-02-21 23:19:37 +07:00
SirCipherz
f90d825a6b [peertube] Add support for canard.tube (#28190) 2021-02-21 23:05:33 +07:00
Remita Amine
3037ab00c7 [youtube] fixup m4a_dash formats(closes #28165) 2021-02-21 10:31:27 +01:00
Isaac-the-Man
21e872b19a [samplefocus] Add new extractor(closes #27763) 2021-02-20 10:55:19 +01:00
Remita Amine
cf2dbec630 [vimeo] add support for unlisted video source format extraction 2021-02-19 21:13:56 +01:00
Remita Amine
b92bb0e02a [viki] improve extraction(closes #26522)(closes #28203)
- extract uploader_url and episode_number
- report login required error
- extract 480p formats
- fix API v4 calls
2021-02-19 16:00:22 +01:00
Remita Amine
40edffae3d [ninegag] unscape title(#28201) 2021-02-19 11:55:40 +01:00
Sergey M․
9fc5eafb8e [youtube] Improve _VALID_URL (refs #28193) 2021-02-18 04:59:56 +07:00
bopol
08c2fbb844 [youtube] Add support for redirect.invidious.io (#28193)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-02-18 04:29:32 +07:00
Remita Amine
3997efb65e [dplay] add support for de.hgtv.com (closes #28182) 2021-02-17 19:50:04 +01:00
Remita Amine
a7356dffe9 [dplay] Add support for discoveryplus.com (closes #24698) 2021-02-17 18:33:33 +01:00
dmsummers
e20ec43094 [simplecast] Add new extractor(closes #24107) 2021-02-17 14:53:23 +01:00
PrinceOfPuppers
70baa7bfae [test_youtube_lists] Actualize youtube flat playlist test (closes #28045) 2021-02-17 04:58:54 +07:00
PrinceOfPuppers
8980f53b42 [youtube] Fix uploader extraction in flat playlist mode (#28045) 2021-02-17 04:21:33 +07:00
Sergey M․
a363fb5d28 [yandexmusic:playlist] Request missing tracks in chunks (closes #27355, closes #28184) 2021-02-17 04:03:54 +07:00
Max
646052e416 [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase (#28112) 2021-02-17 03:22:51 +07:00
Stephen Stair
844e4cbc54 [storyfire] Add new extractor(closes #25628)(closes #26349) 2021-02-16 21:14:43 +01:00
Remita Amine
56c63c8c02 [zhihu] Add new extractor(closes #28177) 2021-02-16 10:08:43 +01:00
Sergey M․
07eb8f1916 [youtube] Fix controversial videos when authenticated with cookies (closes #28174) 2021-02-16 05:57:53 +07:00
Remita Amine
4b5410c5c8 [ccma] fix timestamp parsing in python 2 2021-02-15 13:06:54 +01:00
Remita Amine
be2e9b76ee [videopress] add support for video.wordpress.com 2021-02-14 22:10:06 +01:00
Remita Amine
d8085580f6 [kakao] improve info extraction and detect geo restriction(closes #26577) 2021-02-14 19:48:26 +01:00
Remita Amine
6d32c6c6d3 [xboxclips] fix extraction(closes #27151) 2021-02-14 16:22:45 +01:00
Sergey M․
f94d764993 [ard] Improve formats extraction (closes #28155) 2021-02-14 05:03:15 +07:00
Kevin Velghe
f28f1b4d6e [canvas] Add new extractor for Dagelijkse Kost (#28119) 2021-02-11 08:04:16 +00:00
Sergey M․
360d5f0daa release 2021.02.10 2021-02-10 22:34:47 +07:00
Sergey M․
cd493c5adc [ChangeLog] Actualize
[ci skip]
2021-02-10 22:32:25 +07:00
Sergey M․
a4c7ed6b1e [youtube:tab] Improve grid continuation extraction (closes #28130) 2021-02-10 22:28:58 +07:00
Remita Amine
7f8b8bc418 [ign] fix extraction(closes #24771) 2021-02-08 15:58:20 +01:00
Sergey M․
311ebdd9a5 [xhamster] Extract formats from xplayer settings and extract filesizes (closes #28114) 2021-02-08 15:47:12 +07:00
Remita Amine
99c68db0a8 [youtube] add support phone/tablet JS player(closes #26424) 2021-02-08 09:20:28 +01:00
Sergey M․
5fc53690cb [archiveorg] Fix and improve extraction (closes #21330, closes #23586, closes #25277, closes #26780, closes #27109, closes #27236, closes #28063) 2021-02-07 20:34:41 +07:00
Sergey M․
7a9161578e [cda] Detect geo restricted videos (refs #28106) 2021-02-07 19:18:40 +07:00
Adrian Heine né Lang
2405854705 [urplay] Fix extraction (closes #28073) (#28074) 2021-02-07 02:46:05 +07:00
Sergey M․
0cf09c2b41 [youtube] Fix release date extraction (closes #28094) 2021-02-07 02:17:03 +07:00
Sergey M․
0156ce95c5 [youtube] Extract abr and vbr (closes #28100) 2021-02-07 02:03:47 +07:00
Remita Amine
1641b13232 [youtube] skip OTF formats(#28070) 2021-02-04 13:05:35 +01:00
Sergey M․
a4bdc3112b release 2021.02.04.1 2021-02-04 13:11:33 +07:00
Sergey M․
c7d407bca2 [ChangeLog] Actualize
[ci skip]
2021-02-04 13:09:28 +07:00
Sergey M․
7215691ab7 [youtube] Prefer DASH formats (closes #28070) 2021-02-04 13:07:43 +07:00
Adrian Heine né Lang
fc88e8f0e3 [azmedien] Fix extraction (#28064) 2021-02-03 23:57:56 +00:00
Sergey M․
cfefb7d854 release 2021.02.04 2021-02-04 04:49:25 +07:00
Sergey M․
3c07d007ca [ChangeLog] Actualize
[ci skip]
2021-02-04 04:47:30 +07:00
Sergey M․
89c5a7d5aa [pornhub] Implement lazy playlist extraction 2021-02-04 04:42:14 +07:00
Sergey M․
2adc0c51cd [pornhub] Add placeholder netrc machine 2021-02-04 04:20:09 +07:00
Sergey M․
1f0910bc27 [svtplay] Fix video id extraction (closes #28058) 2021-02-04 04:17:45 +07:00
Sergey M․
e22ff4e356 [pornhub] Add support for authentication (closes #18797, closes #21416, closes #24294) 2021-02-04 04:09:11 +07:00
Sergey M․
83031d749b [pornhub:user] Add support for URLs unavailable via /videos page and improve paging (closes #27853) 2021-02-04 00:25:53 +07:00
Remita Amine
1b731ebcaa [bravotv] add support for oxygen.com(closes #13357)(closes #22500) 2021-02-03 18:13:17 +01:00
Remita Amine
ab25f3f431 [youtube] pass embed URL to get_video_info request 2021-02-03 17:15:31 +01:00
Guillem Vela
07f7aad81c [ccma] improve metadata extraction(closes #27994)
- extract age_limit, alt_title, categories, series and episode_number
- fix timestamp multiple subtitles extraction
2021-02-03 09:19:54 +01:00
Remita Amine
1e2575df87 Credit @adrianheine for #27732 2021-02-03 00:21:46 +01:00
Remita Amine
b111a64135 [egghead] fix typo 2021-02-02 19:05:37 +01:00
Viren Rajput
0e3a968479 [egghead] update API domain(closes #28038) 2021-02-02 19:00:36 +01:00
Remita Amine
c11f7cf9bd [vidzi] remove extractor(closes #12629) 2021-02-01 22:35:28 +01:00
Remita Amine
8fa7cc387d [vidio] improve metadata extraction 2021-02-01 21:35:18 +01:00
Remita Amine
65eee5a745 [youtube] improve subtitle extraction 2021-02-01 18:12:35 +01:00
Remita Amine
efef4ddf51 [youtube] fix chapter extraction fallback 2021-02-01 16:49:52 +01:00
Remita Amine
159a3d48df [youtube] keep _formats array for format sorting tests 2021-02-01 16:36:19 +01:00
Remita Amine
b46483a6ec [youtube/test_youtube_signature] fix test 2021-02-01 16:35:07 +01:00
Remita Amine
9c724601ba [youtube] remove description chapters tests
video description no longer contain yt.www.watch.player.seekTo
function
2021-02-01 16:11:07 +01:00
Remita Amine
67299f23d8 [youtube] Rewrite Extractor
- improve format sorting
- remove unused code(swf parsing, ...)
- fix series metadata extraction
- fix trailer video extraction
- improve error reporting
- extract video location
2021-02-01 14:53:01 +01:00
Adrian Heine né Lang
8bf9591a70 [AENetworks] update AENetworksShowIE test playlist id (#27851) 2021-01-30 13:44:01 +01:00
nixxo
a800838f5a [vvvvid] add support for youtube embeds (#27825) 2021-01-29 14:24:28 +01:00
Remita Amine
ba15b2fee6 [googledrive] report download page errors(closes #28005) 2021-01-29 11:53:49 +01:00
ping
56a7ee9033 [vlive] Fix error message decoding for python 2 (#28004) 2021-01-29 08:02:18 +00:00
Remita Amine
0b4f03a563 [youtube] improve DASH formats file size extraction 2021-01-28 16:24:46 +01:00
knapior
7b8fa658f8 [cda] Improve birth validation detection (closes #14022) (#27929)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-01-28 02:43:20 +07:00
Adrian Heine né Lang
fd95fc33b1 [awaan] Extract uploader id (#27963) 2021-01-28 02:06:12 +07:00
Remita Amine
c669554ef5 [medialaan] add support DPG Media MyChannels based websites
closes #14871
closes #15597
closes #16106
closes #16489
2021-01-27 14:58:47 +01:00
Remita Amine
11b68df7a4 [abcnews] fix extraction(closes #12394)(closes #27920) 2021-01-27 12:28:22 +01:00
Adrian Heine né Lang
d18f4419a7 [AMP] Fix upload_date and timestamp extraction (#27970) 2021-01-26 22:43:11 +01:00
Remita Amine
0f7d413d5b [tv4] relax _VALID_URL(closes #27964) 2021-01-25 15:15:45 +01:00
Remita Amine
286e5d6724 [tv2] improve MTV Uutiset Article extraction 2021-01-25 14:46:04 +01:00
tpikonen
395981288b [tv2] Add support for mtvuutiset.fi (#27744) 2021-01-25 14:43:41 +01:00
Remita Amine
55bb3556c8 [adn] improve login warning reporting 2021-01-25 09:28:24 +01:00
Remita Amine
57f2488bbe [zype] fix uplynk id extraction(closes #27956) 2021-01-25 09:26:22 +01:00
Adrian Heine né Lang
ea399a53eb [ADN] Implement login (#27937)
closes #17091
closes #27841
2021-01-25 08:25:57 +00:00
Sergey M․
811a183eb6 [ChangeLog] Actualize
[ci skip]
2021-01-24 18:15:32 +07:00
Sergey M․
b63981e850 release 2021.01.24.1 2021-01-24 18:11:20 +07:00
184 changed files with 10217 additions and 5885 deletions

View File

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2021.01.24**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.01.24
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2021.01.24**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2021.01.24**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2021.01.24**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.01.24
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2021.01.24**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

.github/ISSUE_TEMPLATE/config.yml (new file, 1 line)

@@ -0,0 +1 @@
blank_issues_enabled: false

View File

@@ -49,11 +49,18 @@ jobs:
- name: Install Jython
if: ${{ matrix.python-impl == 'jython' }}
run: |
wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
java -jar jython-installer.jar -s -d "$HOME/jython"
echo "$HOME/jython/bin" >> $GITHUB_PATH
- name: Install nose
if: ${{ matrix.python-impl != 'jython' }}
run: pip install nose
- name: Install nose (Jython)
if: ${{ matrix.python-impl == 'jython' }}
# Working around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
run: |
wget https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl
pip install nose-1.3.7-py2-none-any.whl
- name: Run tests
continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
env:

View File

@@ -246,3 +246,4 @@ Enes Solak
Nathan Rossi
Thomas van der Berg
Luca Cherubin
Adrian Heine

View File

@@ -150,7 +150,7 @@ After you have ensured this site is distributing its content legally, you can fo
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). This makes the extractor available for use, as long as the class ends with `IE`.
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
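
As an illustration of the naming rule in step 6, the following sketch computes the test names one would expect for a given `_TESTS` list. `expected_test_names` is a hypothetical helper written for this example, not part of youtube-dl; it assumes that entries with an `only_matching` key are simply skipped and that the remaining tests are numbered from the second one onward, as the step describes.

```python
def expected_test_names(ie_name, tests):
    """Return the test method names step 6 describes for one extractor.

    ie_name: extractor name without the trailing "IE" (illustrative argument).
    tests:   list of test dicts, shaped like an extractor's _TESTS.
    """
    names = []
    counted = 0  # only tests that are actually run get a number
    for test in tests:
        if test.get('only_matching'):
            # only_matching tests are not counted in the numbering
            continue
        suffix = '' if counted == 0 else '_%d' % counted
        names.append('test_%s%s' % (ie_name, suffix))
        counted += 1
    return names


# Hypothetical _TESTS list: the second entry only checks URL matching.
tests = [
    {'url': 'https://example.com/a'},
    {'url': 'https://example.com/b', 'only_matching': True},
    {'url': 'https://example.com/c'},
]
names = expected_test_names('YourExtractor', tests)
print(names)  # ['test_YourExtractor', 'test_YourExtractor_1']
```

Under these assumptions, the third dict is run as `TestDownload.test_YourExtractor_1`, not `_2`, because the `only_matching` entry does not consume a number.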

ChangeLog (341 lines changed)

@@ -1,4 +1,343 @@
version 2021.01.24
version 2021.12.17
Core
* [postprocessor/ffmpeg] Show ffmpeg output on error (#22680, #29336)
Extractors
* [youtube] Update signature function patterns (#30363, #30366)
* [peertube] Only call description endpoint if necessary (#29383)
* [periscope] Pass referer to HLS requests (#29419)
- [liveleak] Remove extractor (#17625, #24222, #29331)
+ [pornhub] Add support for pornhubthbh7ap3u.onion
* [pornhub] Detect geo restriction
* [pornhub] Dismiss tbr extracted from download URLs (#28927)
* [curiositystream:collection] Extend _VALID_URL (#26326, #29117)
* [youtube] Make get_video_info processing more robust (#29333)
* [youtube] Workaround for get_video_info request (#29333)
* [bilibili] Strip uploader name (#29202)
* [youtube] Update invidious instance list (#29281)
* [umg:de] Update GraphQL API URL (#29304)
* [nrk] Switch psapi URL to https (#29344)
+ [egghead] Add support for app.egghead.io (#28404, #29303)
* [appleconnect] Fix extraction (#29208)
+ [orf:tvthek] Add support for MPD formats (#28672, #29236)
version 2021.06.06
Extractors
* [facebook] Improve login required detection
* [youporn] Fix formats and view count extraction (#29216)
* [orf:tvthek] Fix thumbnails extraction (#29217)
* [formula1] Fix extraction (#29206)
* [ard] Relax URL regular expression and fix video ids (#22724, #29091)
+ [ustream] Detect https embeds (#29133)
* [ted] Prefer own formats over external sources (#29142)
* [twitch:clips] Improve extraction (#29149)
+ [twitch:clips] Add access token query to download URLs (#29136)
* [youtube] Fix get_video_info request (#29086, #29165)
* [vimeo] Fix vimeo pro embed extraction (#29126)
* [redbulltv] Fix embed data extraction (#28770)
* [shahid] Relax URL regular expression (#28772, #28930)
version 2021.05.16
Core
* [options] Fix thumbnail option group name (#29042)
* [YoutubeDL] Improve extract_info doc (#28946)
Extractors
+ [playstuff] Add support for play.stuff.co.nz (#28901, #28931)
* [eroprofile] Fix extraction (#23200, #23626, #29008)
+ [vivo] Add support for vivo.st (#29009)
+ [generic] Add support for og:audio (#28311, #29015)
* [phoenix] Fix extraction (#29057)
+ [generic] Add support for sibnet embeds
+ [vk] Add support for sibnet embeds (#9500)
+ [generic] Add Referer header for direct videojs download URLs (#2879,
  #20217, #29053)
* [orf:radio] Switch download URLs to HTTPS (#29012, #29046)
- [blinkx] Remove extractor (#28941)
* [medaltv] Relax URL regular expression (#28884)
+ [funimation] Add support for optional lang code in URLs (#28950)
+ [gdcvault] Add support for HTML5 videos
* [dispeak] Improve FLV extraction (#13513, #28970)
* [kaltura] Improve iframe extraction (#28969)
* [kaltura] Make embed code alternatives actually work
* [cda] Improve extraction (#28709, #28937)
* [twitter] Improve formats extraction from vmap URL (#28909)
* [xtube] Fix formats extraction (#28870)
* [svtplay] Improve extraction (#28507, #28876)
* [tv2dk] Fix extraction (#28888)
version 2021.04.26
Extractors
+ [xfileshare] Add support for wolfstream.tv (#28858)
* [francetvinfo] Improve video id extraction (#28792)
* [medaltv] Fix extraction (#28807)
* [tver] Redirect all downloads to Brightcove (#28849)
* [go] Improve video id extraction (#25207, #25216, #26058)
* [youtube] Fix lazy extractors (#28780)
+ [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774)
* [cbsnews] Fix extraction for python <3.6 (#23359)
version 2021.04.17
Core
+ [utils] Add support for experimental HTTP response status code
  308 Permanent Redirect (#27877, #28768)
Extractors
+ [lbry] Add support for HLS videos (#27877, #28768)
* [youtube] Fix stretched ratio calculation
* [youtube] Improve stretch extraction (#28769)
* [youtube:tab] Improve grid extraction (#28725)
+ [youtube:tab] Detect series playlist on playlists page (#28723)
+ [youtube] Add more invidious instances (#28706)
* [pluralsight] Extend anti-throttling timeout (#28712)
* [youtube] Improve URL to extractor routing (#27572, #28335, #28742)
+ [maoritv] Add support for maoritelevision.com (#24552)
+ [youtube:tab] Pass innertube context and x-goog-visitor-id header along with
  continuation requests (#28702)
* [mtv] Fix Viacom A/B Testing Video Player extraction (#28703)
+ [pornhub] Extract DASH and HLS formats from get_media end point (#28698)
* [cbssports] Fix extraction (#28682)
* [jamendo] Fix track extraction (#28686)
* [curiositystream] Fix format extraction (#26845, #28668)
version 2021.04.07
Core
* [extractor/common] Use compat_cookies_SimpleCookie for _get_cookies
+ [compat] Introduce compat_cookies_SimpleCookie
* [extractor/common] Improve JSON-LD author extraction
* [extractor/common] Fix _get_cookies on python 2 (#20673, #23256, #20326,
  #28640)
Extractors
* [youtube] Fix extraction of videos with restricted location (#28685)
+ [line] Add support for live.line.me (#17205, #28658)
* [vimeo] Improve extraction (#28591)
* [youku] Update ccode (#17852, #28447, #28460, #28648)
* [youtube] Prefer direct entry metadata over entry metadata from playlist
  (#28619, #28636)
* [screencastomatic] Fix extraction (#11976, #24489)
+ [palcomp3] Add support for palcomp3.com (#13120)
+ [arnes] Add support for video.arnes.si (#28483)
+ [youtube:tab] Add support for hashtags (#28308)
version 2021.04.01
Extractors
* [youtube] Setup CONSENT cookie when needed (#28604)
* [vimeo] Fix password protected review extraction (#27591)
* [youtube] Improve age-restricted video extraction (#28578)
version 2021.03.31
Extractors
* [vlive] Fix inkey request (#28589)
* [francetvinfo] Improve video id extraction (#28584)
+ [instagram] Extract duration (#28469)
* [instagram] Improve title extraction (#28469)
+ [sbs] Add support for ondemand watch URLs (#28566)
* [youtube] Fix video's channel extraction (#28562)
* [picarto] Fix live stream extraction (#28532)
* [vimeo] Fix unlisted video extraction (#28414)
* [youtube:tab] Fix playlist/community continuation items extraction (#28266)
* [ard] Improve clip id extraction (#22724, #28528)
version 2021.03.25
Extractors
+ [zoom] Add support for zoom.us (#16597, #27002, #28531)
* [bbc] Fix BBC IPlayer Episodes/Group extraction (#28360)
* [youtube] Fix default value for youtube_include_dash_manifest (#28523)
* [zingmp3] Fix extraction (#11589, #16409, #16968, #27205)
+ [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
+ [tiktok] Detect private videos (#28453)
* [vimeo:album] Fix extraction for albums with number of videos multiple
  to page size (#28486)
* [vvvvid] Fix kenc format extraction (#28473)
* [mlb] Fix video extraction (#21241)
* [svtplay] Improve extraction (#28448)
* [applepodcasts] Fix extraction (#28445)
* [rtve] Improve extraction
    + Extract all formats
    * Fix RTVE Infantil extraction (#24851)
    + Extract is_live and series
version 2021.03.14
Core
+ Introduce release_timestamp meta field (#28386)
Extractors
+ [southpark] Add support for southparkstudios.com (#28413)
* [southpark] Fix extraction (#26763, #28413)
* [sportdeutschland] Fix extraction (#21856, #28425)
* [pinterest] Reduce the number of HLS format requests
* [peertube] Improve thumbnail extraction (#28419)
* [tver] Improve title extraction (#28418)
* [fujitv] Fix HLS formats extension (#28416)
* [shahid] Fix format extraction (#28383)
+ [lbry] Add support for channel filters (#28385)
+ [bandcamp] Extract release timestamp
+ [lbry] Extract release timestamp (#28386)
* [pornhub] Detect flagged videos
+ [pornhub] Extract formats from get_media end point (#28395)
* [bilibili] Fix video info extraction (#28341)
+ [cbs] Add support for Paramount+ (#28342)
+ [trovo] Add Origin header to VOD formats (#28346)
* [voxmedia] Fix volume embed extraction (#28338)
version 2021.03.03
Extractors
* [youtube:tab] Switch continuation to browse API (#28289, #28327)
* [9c9media] Fix extraction for videos with multiple ContentPackages (#28309)
+ [bbc] Add support for BBC Reel videos (#21870, #23660, #28268)
version 2021.03.02
Extractors
* [zdf] Rework extractors (#11606, #13473, #17354, #21185, #26711, #27068,
  #27930, #28198, #28199, #28274)
    * Generalize cross-extractor video ids for zdf based extractors
    * Improve extraction
    * Fix 3sat and phoenix
* [stretchinternet] Fix extraction (#28297)
* [urplay] Fix episode data extraction (#28292)
+ [bandaichannel] Add support for b-ch.com (#21404)
* [srgssr] Improve extraction (#14717, #14725, #27231, #28238)
    + Extract subtitle
    * Fix extraction for new videos
    * Update srf download domains
* [vvvvid] Reduce season request payload size
+ [vvvvid] Extract series sublists playlist title (#27601, #27618)
+ [dplay] Extract Ad-Free uplynk URLs (#28160)
+ [wat] Detect DRM protected videos (#27958)
* [tf1] Improve extraction (#27980, #28040)
* [tmz] Fix and improve extraction (#24603, #24687, #28211)
+ [gedidigital] Add support for Gedi group sites (#7347, #26946)
* [youtube] Fix get_video_info request
version 2021.02.22
Core
+ [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase
  (#28112)
Extractors
* [apa] Fix and improve extraction (#27750)
+ [youporn] Extract duration (#28019)
+ [peertube] Add support for canard.tube (#28190)
* [youtube] Fixup m4a_dash formats (#28165)
+ [samplefocus] Add support for samplefocus.com (#27763)
+ [vimeo] Add support for unlisted video source format extraction
* [viki] Improve extraction (#26522, #28203)
    * Extract uploader URL and episode number
    * Report login required error
    + Extract 480p formats
    * Fix API v4 calls
* [ninegag] Unescape title (#28201)
* [youtube] Improve URL regular expression (#28193)
+ [youtube] Add support for redirect.invidious.io (#28193)
+ [dplay] Add support for de.hgtv.com (#28182)
+ [dplay] Add support for discoveryplus.com (#24698)
+ [simplecast] Add support for simplecast.com (#24107)
* [youtube] Fix uploader extraction in flat playlist mode (#28045)
* [yandexmusic:playlist] Request missing tracks in chunks (#27355, #28184)
+ [storyfire] Add support for storyfire.com (#25628, #26349)
+ [zhihu] Add support for zhihu.com (#28177)
* [youtube] Fix controversial videos when authenticated with cookies (#28174)
* [ccma] Fix timestamp parsing in python 2
+ [videopress] Add support for video.wordpress.com
* [kakao] Improve info extraction and detect geo restriction (#26577)
* [xboxclips] Fix extraction (#27151)
* [ard] Improve formats extraction (#28155)
+ [canvas] Add support for dagelijksekost.een.be (#28119)
version 2021.02.10
Extractors
* [youtube:tab] Improve grid continuation extraction (#28130)
* [ign] Fix extraction (#24771)
+ [xhamster] Extract format filesize
+ [xhamster] Extract formats from xplayer settings (#28114)
+ [youtube] Add support for phone/tablet JS player (#26424)
* [archiveorg] Fix and improve extraction (#21330, #23586, #25277, #26780,
#27109, #27236, #28063)
+ [cda] Detect geo restricted videos (#28106)
* [urplay] Fix extraction (#28073, #28074)
* [youtube] Fix release date extraction (#28094)
+ [youtube] Extract abr and vbr (#28100)
* [youtube] Skip OTF formats (#28070)
version 2021.02.04.1
Extractors
* [youtube] Prefer DASH formats (#28070)
* [azmedien] Fix extraction (#28064)
version 2021.02.04
Extractors
* [pornhub] Implement lazy playlist extraction
* [svtplay] Fix video id extraction (#28058)
+ [pornhub] Add support for authentication (#18797, #21416, #24294)
* [pornhub:user] Improve paging
+ [pornhub:user] Add support for URLs unavailable via /videos page (#27853)
+ [bravotv] Add support for oxygen.com (#13357, #22500)
+ [youtube] Pass embed URL to get_video_info request
* [ccma] Improve metadata extraction (#27994)
    + Extract age limit, alt title, categories, series and episode number
    * Fix timestamp multiple subtitles extraction
* [egghead] Update API domain (#28038)
- [vidzi] Remove extractor (#12629)
* [vidio] Improve metadata extraction
* [youtube] Improve subtitles extraction
* [youtube] Fix chapter extraction fallback
* [youtube] Rewrite extractor
    * Improve format sorting
    * Remove unused code
    * Fix series metadata extraction
    * Fix trailer video extraction
    * Improve error reporting
    + Extract video location
+ [vvvvid] Add support for youtube embeds (#27825)
* [googledrive] Report download page errors (#28005)
* [vlive] Fix error message decoding for python 2 (#28004)
* [youtube] Improve DASH formats file size extraction
* [cda] Improve birth validation detection (#14022, #27929)
+ [awaan] Extract uploader id (#27963)
+ [medialaan] Add support for DPG Media MyChannels based websites (#14871, #15597,
#16106, #16489)
* [abcnews] Fix extraction (#12394, #27920)
* [AMP] Fix upload date and timestamp extraction (#27970)
* [tv4] Relax URL regular expression (#27964)
+ [tv2] Add support for mtvuutiset.fi (#27744)
* [adn] Improve login warning reporting
* [zype] Fix uplynk id extraction (#27956)
+ [adn] Add support for authentication (#17091, #27841, #27937)
version 2021.01.24.1
Core
* Introduce --output-na-placeholder (#27896)

@@ -287,7 +287,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--no-cache-dir Disable filesystem caching
--rm-cache-dir Delete all filesystem cache files
## Thumbnail images:
## Thumbnail Options:
--write-thumbnail Write thumbnail image to disk
--write-all-thumbnails Write all thumbnail image formats to
disk
@@ -893,7 +893,7 @@ Since June 2012 ([#342](https://github.com/ytdl-org/youtube-dl/issues/342)) yout
### The exe throws an error due to missing `MSVCR100.dll`
To run the exe you need to install first the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555).
To run the exe you need to install first the [Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package (x86)](https://download.microsoft.com/download/1/6/5/165255E7-1014-4D0A-B094-B6A430A6BFFC/vcredist_x86.exe).
### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
@@ -1069,9 +1069,11 @@ After you have ensured this site is distributing its content legally, you can fo
}
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test (actually, test case) then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note:
* the test names use the extractor class name **without the trailing `IE`**
* tests with `only_matching` key in test's dict are not counted.
8. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
9. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
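The naming rule described in step 6 can be illustrated with a small sketch. The helper `expected_test_names` is hypothetical and is not part of youtube-dl; it only mirrors the rule as the README states it, assuming the extractor class name ends in `IE` and that `only_matching` entries are left out before numbering.

```python
def expected_test_names(ie_class_name, test_cases):
    """Return the TestDownload names the README rule implies (illustrative only)."""
    # Drop the trailing 'IE' from the extractor class name, if present.
    base = ie_class_name[:-2] if ie_class_name.endswith('IE') else ie_class_name
    names = []
    for test_case in test_cases:
        # Test cases that only check URL matching are not counted.
        if test_case.get('only_matching'):
            continue
        # First test has no suffix; later ones get _1, _2, ...
        suffix = '' if not names else '_%d' % len(names)
        names.append('TestDownload.test_%s%s' % (base, suffix))
    return names
```

For example, a class named `YourExtractorIE` with three download test cases and one `only_matching` case would yield the three names listed in step 6.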

@@ -1,9 +1,9 @@
# Supported sites
- **1tv**: Первый канал
- **1up.com**
- **20min**
- **220.ro**
- **23video**
- **247sports**
- **24video**
- **3qsdn**: 3Q SDN
- **3sat**
@@ -83,6 +83,7 @@
- **awaan:video**
- **AZMedien**: AZ Medien videos
- **BaiduVideo**: 百度视频
- **bandaichannel**
- **Bandcamp**
- **Bandcamp:album**
- **Bandcamp:weekly**
@@ -90,7 +91,8 @@
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:playlist**
- **BBVTV**
- **Beatport**
@@ -117,7 +119,6 @@
- **BitChuteChannel**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
- **Bloomberg**
- **BokeCC**
- **BongaCams**
@@ -159,7 +160,8 @@
- **cbsnews**: CBS News
- **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports**
- **cbssports**
- **cbssports:embed**
- **CCMA**
- **CCTV**: 央视网
- **CDA**
@@ -213,6 +215,7 @@
- **curiositystream**
- **curiositystream:collection**
- **CWTV**
- **DagelijkseKost**: dagelijksekost.een.be
- **DailyMail**
- **dailymotion**
- **dailymotion:playlist**
@@ -234,6 +237,7 @@
- **DiscoveryGo**
- **DiscoveryGoPlaylist**
- **DiscoveryNetworksDe**
- **DiscoveryPlus**
- **DiscoveryVR**
- **Disney**
- **dlive:stream**
@@ -329,6 +333,7 @@
- **Gaskrank**
- **Gazeta**
- **GDCVault**
- **GediDigital**
- **generic**: Generic downloader that works on some sites
- **Gfycat**
- **GiantBomb**
@@ -354,6 +359,7 @@
- **HentaiStigma**
- **hetklokhuis**
- **hgtv.com:show**
- **HGTVDe**
- **HiDive**
- **HistoricFilms**
- **history:player**
@@ -376,6 +382,8 @@
- **HungamaSong**
- **Hypem**
- **ign.com**
- **IGNArticle**
- **IGNVideo**
- **IHeartRadio**
- **iheartradio:podcast**
- **imdb**: Internet Movie Database trailers
@@ -456,14 +464,14 @@
- **limelight**
- **limelight:channel**
- **limelight:channel_list**
- **LineLive**
- **LineLiveChannel**
- **LineTV**
- **linkedin:learning**
- **linkedin:learning:course**
- **LinuxAcademy**
- **LiTV**
- **LiveJournal**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
- **livestream:original**
- **LnkGo**
@@ -481,6 +489,7 @@
- **mangomolo:live**
- **mangomolo:video**
- **ManyVids**
- **MaoriTV**
- **Markiza**
- **MarkizaPage**
- **massengeschmack.tv**
@@ -516,6 +525,7 @@
- **mixcloud:playlist**
- **mixcloud:user**
- **MLB**
- **MLBVideo**
- **Mnet**
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
@@ -537,6 +547,7 @@
- **mtv:video**
- **mtvjapan**
- **mtvservices:embedded**
- **MTVUutisetArticle**
- **MuenchenTV**: münchen.tv
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
@@ -670,12 +681,14 @@
- **OutsideTV**
- **PacktPub**
- **PacktPubCourse**
- **PalcoMP3:artist**
- **PalcoMP3:song**
- **PalcoMP3:video**
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **PearVideo**
- **PeerTube**
- **People**
@@ -697,6 +710,7 @@
- **play.fm**
- **player.sky.it**
- **PlayPlusTV**
- **PlayStuff**
- **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid**
@@ -802,6 +816,7 @@
- **safari:course**: safaribooksonline.com online courses
- **SAKTV**
- **SaltTV**
- **SampleFocus**
- **Sapo**: SAPO Vídeos
- **savefrom.net**
- **SBS**: sbs.com.au
@@ -824,6 +839,9 @@
- **ShahidShow**
- **Shared**: shared.sx
- **ShowRoomLive**
- **simplecast**
- **simplecast:episode**
- **simplecast:podcast**
- **Sina**
- **sky.it**
- **sky:news**
@@ -876,6 +894,9 @@
- **Steam**
- **Stitcher**
- **StitcherShow**
- **StoryFire**
- **StoryFireSeries**
- **StoryFireUser**
- **Streamable**
- **streamcloud.eu**
- **StreamCZ**
@@ -1044,6 +1065,7 @@
- **Vidbit**
- **Viddler**
- **Videa**
- **video.arnes.si**: Arnes Video
- **video.google:search**: Google Video search
- **video.sky.it**
- **video.sky.it:live**
@@ -1058,7 +1080,6 @@
- **vidme**
- **vidme:user**
- **vidme:user:likes**
- **Vidzi**
- **vier**: vier.be and vijf.be
- **vier:videos**
- **viewlift**
@@ -1103,6 +1124,7 @@
- **vrv**
- **vrv:series**
- **VShare**
- **VTM**
- **VTXTV**
- **vube**: Vube.com
- **VuClip**
@@ -1138,7 +1160,7 @@
- **WWE**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, WolfStream, XVideoSharing
- **XHamster**
- **XHamsterEmbed**
- **XHamsterUser**
@@ -1197,5 +1219,8 @@
- **ZattooLive**
- **ZDF**
- **ZDFChannel**
- **Zhihu**
- **zingmp3**: mp3.zing.vn
- **zingmp3:album**
- **zoom**
- **Zype**

@@ -18,7 +18,6 @@
"noprogress": false,
"outtmpl": "%(id)s.%(ext)s",
"password": null,
"playlistend": -1,
"playliststart": 1,
"prefer_free_formats": false,
"quiet": false,

@@ -997,6 +997,25 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(downloaded['extractor'], 'Video')
self.assertEqual(downloaded['extractor_key'], 'Video')
def test_default_times(self):
"""Test addition of missing upload/release/_date from /release_/timestamp"""
info = {
'id': '1234',
'url': TEST_URL,
'title': 'Title',
'ext': 'mp4',
'timestamp': 1631352900,
'release_timestamp': 1632995931,
}
params = {'simulate': True, }
ydl = FakeYDL(params)
out_info = ydl.process_ie_result(info)
self.assertTrue(isinstance(out_info['upload_date'], compat_str))
self.assertEqual(out_info['upload_date'], '20210911')
self.assertTrue(isinstance(out_info['release_date'], compat_str))
self.assertEqual(out_info['release_date'], '20210930')
if __name__ == '__main__':
unittest.main()
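The behaviour that `test_default_times` checks can be sketched as follows. This is an illustration only: `fill_default_dates` is a hypothetical helper, not the code inside `process_ie_result`, and it assumes the timestamps are UTC epoch seconds.

```python
import datetime

_EPOCH = datetime.datetime(1970, 1, 1)


def fill_default_dates(info):
    """Return a copy of info with missing *_date fields derived from timestamps."""
    out = dict(info)
    for ts_key, date_key in (('timestamp', 'upload_date'),
                             ('release_timestamp', 'release_date')):
        # Only fill a date that is absent, and only when its timestamp is known.
        if out.get(date_key) is None and out.get(ts_key) is not None:
            moment = _EPOCH + datetime.timedelta(seconds=out[ts_key])
            out[date_key] = moment.strftime('%Y%m%d')
    return out
```

With the timestamps used in the test, 1631352900 falls on 2021-09-11 and 1632995931 on 2021-09-30 (UTC), which matches the asserted `'20210911'` and `'20210930'`.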

@@ -66,18 +66,9 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:tab'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:tab'])
# def test_youtube_search_matching(self):
# self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_facebook_matching(self):
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))

@@ -33,6 +33,7 @@ from youtube_dl.compat import (
from youtube_dl.utils import (
DownloadError,
ExtractorError,
error_to_compat_str,
format_bytes,
UnavailableVideoError,
)
@@ -100,27 +101,28 @@ def generator(test_case, tname):
def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason))
self.skipTest(reason)
if not ie.working():
print_skipping('IE marked as not _WORKING')
return
for tc in test_cases:
info_dict = tc.get('info_dict', {})
if not (info_dict.get('id') and info_dict.get('ext')):
raise Exception('Test definition incorrect. The output file cannot be known. Are both \'id\' and \'ext\' keys present?')
raise Exception('Test definition (%s) requires both \'id\' and \'ext\' keys present to define the output file' % (tname, ))
if 'skip' in test_case:
print_skipping(test_case['skip'])
return
for other_ie in other_ies:
if not other_ie.working():
print_skipping('test depends on %sIE, marked as not WORKING' % other_ie.ie_key())
return
params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
params.setdefault('playlistend', test_case.get('playlist_mincount'))
params.setdefault('skip_download', True)
ydl = YoutubeDL(params, auto_init=False)
@@ -160,7 +162,9 @@ def generator(test_case, tname):
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise
msg = getattr(err, 'msg', error_to_compat_str(err))
err.msg = '%s (%s)' % (msg, tname, )
raise err
if try_num == RETRIES:
report_warning('%s failed due to network errors, skipping...' % tname)
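The last hunk appends the test name to the error message before re-raising, so that a failure can be traced to its test case. A minimal sketch of the same idea follows; `FakeDownloadError` is a hypothetical stand-in class, and plain `str()` is used here in place of `error_to_compat_str`.

```python
class FakeDownloadError(Exception):
    """Stand-in for an error type that carries its text in a `msg` attribute."""

    def __init__(self, msg):
        super(FakeDownloadError, self).__init__(msg)
        self.msg = msg


def tag_with_test_name(err, tname):
    """Append the test name to the error's message and return the error."""
    # Fall back to the string form when the exception has no `msg` attribute.
    msg = getattr(err, 'msg', str(err))
    err.msg = '%s (%s)' % (msg, tname)
    return err
```

The caller would then `raise` the returned object; the original exception type is preserved, only its `msg` attribute changes.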

@@ -39,6 +39,16 @@ class TestExecution(unittest.TestCase):
_, stderr = p.communicate()
self.assertFalse(stderr)
def test_lazy_extractors(self):
try:
subprocess.check_call([sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, 'test/test_all_urls.py'], cwd=rootDir, stdout=_DEV_NULL)
finally:
try:
os.remove('youtube_dl/extractor/lazy_extractors.py')
except (IOError, OSError):
pass
if __name__ == '__main__':
unittest.main()

@@ -112,6 +112,72 @@ class TestJSInterpreter(unittest.TestCase):
''')
self.assertEqual(jsi.call_function('z'), 5)
def test_for_loop(self):
# function x() { a=0; for (i=0; i-10; i++) {a++} a }
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i = i + 1) {a++} a }
''')
self.assertEqual(jsi.call_function('x'), 10)
def test_switch(self):
jsi = JSInterpreter('''
function x(f) { switch(f){
case 1:f+=1;
case 2:f+=2;
case 3:f+=3;break;
case 4:f+=4;
default:f=0;
} return f }
''')
self.assertEqual(jsi.call_function('x', 1), 7)
self.assertEqual(jsi.call_function('x', 3), 6)
self.assertEqual(jsi.call_function('x', 5), 0)
def test_switch_default(self):
jsi = JSInterpreter('''
function x(f) { switch(f){
case 2: f+=2;
default: f-=1;
case 5:
case 6: f+=6;
case 0: break;
case 1: f+=1;
} return f }
''')
self.assertEqual(jsi.call_function('x', 1), 2)
self.assertEqual(jsi.call_function('x', 5), 11)
self.assertEqual(jsi.call_function('x', 9), 14)
def test_try(self):
jsi = JSInterpreter('''
function x() { try{return 10} catch(e){return 5} }
''')
self.assertEqual(jsi.call_function('x'), 10)
def test_for_loop_continue(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) { continue; a++ } a }
''')
self.assertEqual(jsi.call_function('x'), 0)
def test_for_loop_break(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) { break; a++ } a }
''')
self.assertEqual(jsi.call_function('x'), 0)
def test_literal_list(self):
jsi = JSInterpreter('''
function x() { [1, 2, "asdf", [5, 6, 7]][3] }
''')
self.assertEqual(jsi.call_function('x'), [5, 6, 7])
def test_comma(self):
jsi = JSInterpreter('''
function x() { a=5; a -= 1, a+=3; return a }
''')
self.assertEqual(jsi.call_function('x'), 7)
if __name__ == '__main__':
unittest.main()
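The expected values in `test_switch` and `test_switch_default` follow from JavaScript's fall-through rule: execution starts at the first matching `case` (or at `default` when nothing matches, wherever `default` appears) and continues through later clauses until a `break`. The sketch below models that rule in Python. It is not how `JSInterpreter` is implemented; `run_switch` and `DEFAULT` are hypothetical names used only for illustration.

```python
DEFAULT = object()  # marker for the `default:` clause


def run_switch(value, clauses):
    """Emulate a JS switch. clauses: list of (label, action, has_break)."""
    labels = [label for label, _, _ in clauses]
    # Prefer an exact case match; otherwise start at `default` if there is one.
    start = next((i for i, label in enumerate(labels)
                  if label is not DEFAULT and label == value), None)
    if start is None:
        start = next((i for i, label in enumerate(labels) if label is DEFAULT), None)
    if start is None:
        return value
    # Fall through the remaining clauses until one ends with `break`.
    for _, action, has_break in clauses[start:]:
        value = action(value)
        if has_break:
            break
    return value


# Clause tables mirroring the two JS functions in the tests above.
SWITCH = [
    (1, lambda f: f + 1, False),
    (2, lambda f: f + 2, False),
    (3, lambda f: f + 3, True),
    (4, lambda f: f + 4, False),
    (DEFAULT, lambda f: 0, False),
]
SWITCH_DEFAULT = [
    (2, lambda f: f + 2, False),
    (DEFAULT, lambda f: f - 1, False),
    (5, lambda f: f, False),
    (6, lambda f: f + 6, False),
    (0, lambda f: f, True),
    (1, lambda f: f + 1, False),
]
```

For instance, `run_switch(9, SWITCH_DEFAULT)` starts at `default` (9 - 1 = 8), falls through `case 5` and `case 6` (8 + 6 = 14), and stops at the `break` in `case 0`, giving 14 as the test asserts.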

@@ -38,6 +38,9 @@ class BaseTestSubtitles(unittest.TestCase):
self.DL = FakeYDL()
self.ie = self.IE()
self.DL.add_info_extractor(self.ie)
if not self.IE.working():
print('Skipping: %s marked as not _WORKING' % self.IE.ie_key())
self.skipTest('IE marked as not _WORKING')
def getInfoDict(self):
info_dict = self.DL.extract_info(self.url, download=False)
@@ -56,6 +59,21 @@ class BaseTestSubtitles(unittest.TestCase):
class TestYoutubeSubtitles(BaseTestSubtitles):
# Available subtitles for QRS8MkLhQmM:
# Language formats
# ru vtt, ttml, srv3, srv2, srv1, json3
# fr vtt, ttml, srv3, srv2, srv1, json3
# en vtt, ttml, srv3, srv2, srv1, json3
# nl vtt, ttml, srv3, srv2, srv1, json3
# de vtt, ttml, srv3, srv2, srv1, json3
# ko vtt, ttml, srv3, srv2, srv1, json3
# it vtt, ttml, srv3, srv2, srv1, json3
# zh-Hant vtt, ttml, srv3, srv2, srv1, json3
# hi vtt, ttml, srv3, srv2, srv1, json3
# pt-BR vtt, ttml, srv3, srv2, srv1, json3
# es-MX vtt, ttml, srv3, srv2, srv1, json3
# ja vtt, ttml, srv3, srv2, srv1, json3
# pl vtt, ttml, srv3, srv2, srv1, json3
url = 'QRS8MkLhQmM'
IE = YoutubeIE
@@ -64,41 +82,60 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
self.assertEqual(md5(subtitles['en']), 'ae1bd34126571a77aabd4d276b28044d')
self.assertEqual(md5(subtitles['it']), '0e0b667ba68411d88fd1c5f4f4eab2f9')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
def test_youtube_subtitles_ttml_format(self):
def _test_subtitles_format(self, fmt, md5_hash, lang='en'):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'ttml'
self.DL.params['subtitlesformat'] = fmt
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
self.assertEqual(md5(subtitles[lang]), md5_hash)
def test_youtube_subtitles_ttml_format(self):
self._test_subtitles_format('ttml', 'c97ddf1217390906fa9fbd34901f3da2')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
self._test_subtitles_format('vtt', 'ae1bd34126571a77aabd4d276b28044d')
def test_youtube_subtitles_json3_format(self):
self._test_subtitles_format('json3', '688dd1ce0981683867e7fe6fde2a224b')
def _test_automatic_captions(self, url, lang):
self.url = url
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = [lang]
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertTrue(subtitles[lang] is not None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
# Available automatic captions for 8YoUxe5ncPo:
# Language formats (all in vtt, ttml, srv3, srv2, srv1, json3)
# gu, zh-Hans, zh-Hant, gd, ga, gl, lb, la, lo, tt, tr,
# lv, lt, tk, th, tg, te, fil, haw, yi, ceb, yo, de, da,
# el, eo, en, eu, et, es, ru, rw, ro, bn, be, bg, uk, jv,
# bs, ja, or, xh, co, ca, cy, cs, ps, pt, pa, vi, pl, hy,
# hr, ht, hu, hmn, hi, ha, mg, uz, ml, mn, mi, mk, ur,
# mt, ms, mr, ug, ta, my, af, sw, is, am,
# *it*, iw, sv, ar,
# su, zu, az, id, ig, nl, no, ne, ny, fr, ku, fy, fa, fi,
# ka, kk, sr, sq, ko, kn, km, st, sk, si, so, sn, sm, sl,
# ky, sd
# ...
self._test_automatic_captions('8YoUxe5ncPo', 'it')
@unittest.skip('ASR subs all in all supported langs now')
def test_youtube_translated_subtitles(self):
# This video has a subtitles track, which can be translated
self.url = 'Ky9eprVWzlI'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
# This video has a subtitles track, which can be translated (#4555)
self._test_automatic_captions('Ky9eprVWzlI', 'it')
def test_youtube_nosubtitles(self):
self.DL.expect_warning('video doesn\'t have subtitles')
self.url = 'n5BB19UTcdA'
# Available automatic captions for 8YoUxe5ncPo:
# ...
# 8YoUxe5ncPo has no subtitles
self.url = '8YoUxe5ncPo'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
@@ -128,6 +165,7 @@ class TestDailymotionSubtitles(BaseTestSubtitles):
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TEDIE
@@ -152,18 +190,19 @@ class TestVimeoSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['de', 'en', 'es', 'fr']))
self.assertEqual(md5(subtitles['en']), '8062383cf4dec168fc40a088aa6d5888')
self.assertEqual(md5(subtitles['fr']), 'b6191146a6c5d3a452244d853fde6dc8')
self.assertEqual(md5(subtitles['en']), '386cbc9320b94e25cb364b97935e5dd1')
self.assertEqual(md5(subtitles['fr']), 'c9b69eef35bc6641c0d4da8a04f9dfac')
def test_nosubtitles(self):
self.DL.expect_warning('video doesn\'t have subtitles')
self.url = 'http://vimeo.com/56015672'
self.url = 'http://vimeo.com/68093876'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestWallaSubtitles(BaseTestSubtitles):
url = 'http://vod.walla.co.il/movie/2705958/the-yes-men'
IE = WallaIE
@@ -185,6 +224,7 @@ class TestWallaSubtitles(BaseTestSubtitles):
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
url = 'http://www.ceskatelevize.cz/ivysilani/10600540290-u6-uzasny-svet-techniky'
IE = CeskaTelevizeIE
@@ -206,6 +246,7 @@ class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestLyndaSubtitles(BaseTestSubtitles):
url = 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html'
IE = LyndaIE
@@ -218,6 +259,7 @@ class TestLyndaSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['en']), '09bbe67222259bed60deaa26997d73a7')
@unittest.skip('IE broken')
class TestNPOSubtitles(BaseTestSubtitles):
url = 'http://www.npo.nl/nos-journaal/28-08-2014/POW_00722860'
IE = NPOIE
@@ -230,6 +272,7 @@ class TestNPOSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['nl']), 'fc6435027572b63fb4ab143abd5ad3f4')
@unittest.skip('IE broken')
class TestMTVSubtitles(BaseTestSubtitles):
url = 'http://www.cc.com/video-clips/p63lk0/adam-devine-s-house-party-chasing-white-swans'
IE = ComedyCentralIE
@@ -253,8 +296,8 @@ class TestNRKSubtitles(BaseTestSubtitles):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['no']))
self.assertEqual(md5(subtitles['no']), '544fa917d3197fcbee64634559221cc2')
self.assertEqual(set(subtitles.keys()), set(['nb-ttv']))
self.assertEqual(md5(subtitles['nb-ttv']), '67e06ff02d0deaf975e68f6cb8f6a149')
class TestRaiPlaySubtitles(BaseTestSubtitles):
@@ -277,6 +320,7 @@ class TestRaiPlaySubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['it']), '4b3264186fbb103508abe5311cfcb9cd')
@unittest.skip('IE broken - DRM only')
class TestVikiSubtitles(BaseTestSubtitles):
url = 'http://www.viki.com/videos/1060846v-punch-episode-18'
IE = VikiIE
@@ -303,6 +347,7 @@ class TestThePlatformSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['en']), '97e7670cbae3c4d26ae8bcc7fdd78d4b')
@unittest.skip('IE broken')
class TestThePlatformFeedSubtitles(BaseTestSubtitles):
url = 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207'
IE = ThePlatformFeedIE
@@ -338,7 +383,7 @@ class TestDemocracynowSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
self.assertEqual(md5(subtitles['en']), 'a3cc4c0b5eadd74d9974f1c1f5101045')
def test_subtitles_in_page(self):
self.url = 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree'
@@ -346,7 +391,7 @@ class TestDemocracynowSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
self.assertEqual(md5(subtitles['en']), 'a3cc4c0b5eadd74d9974f1c1f5101045')
if __name__ == '__main__':
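The guard added to `setUp` in `BaseTestSubtitles` relies on `unittest`'s `skipTest`, which marks the test as skipped rather than failed when it is called from `setUp`. A self-contained sketch of that pattern is shown below; `FakeIE`, `GuardedTestCase` and `run_demo` are hypothetical names and do not exist in youtube-dl.

```python
import unittest


class FakeIE(object):
    """Stand-in extractor class exposing a working() classmethod."""
    _WORKING = False

    @classmethod
    def working(cls):
        return cls._WORKING


class GuardedTestCase(unittest.TestCase):
    IE = FakeIE

    def setUp(self):
        # Skip every test in this class when the extractor is marked broken.
        if not self.IE.working():
            self.skipTest('IE marked as not _WORKING')

    def test_something(self):
        self.fail('not reached while the IE is marked as not working')


def run_demo():
    """Run the guarded test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GuardedTestCase)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_demo()` reports one skipped test and no failures, which is the outcome the guard is meant to produce for extractors that are not `_WORKING`.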

@@ -1,275 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import expect_value
from youtube_dl.extractor import YoutubeIE
class TestYoutubeChapters(unittest.TestCase):
_TEST_CASES = [
(
# https://www.youtube.com/watch?v=A22oy8dFjqc
# pattern: 00:00 - <title>
'''This is the absolute ULTIMATE experience of Queen's set at LIVE AID, this is the best video mixed to the absolutely superior stereo radio broadcast. This vastly superior audio mix takes a huge dump on all of the official mixes. Best viewed in 1080p. ENJOY! ***MAKE SURE TO READ THE DESCRIPTION***<br /><a href="#" onclick="yt.www.watch.player.seekTo(00*60+36);return false;">00:36</a> - Bohemian Rhapsody<br /><a href="#" onclick="yt.www.watch.player.seekTo(02*60+42);return false;">02:42</a> - Radio Ga Ga<br /><a href="#" onclick="yt.www.watch.player.seekTo(06*60+53);return false;">06:53</a> - Ay Oh!<br /><a href="#" onclick="yt.www.watch.player.seekTo(07*60+34);return false;">07:34</a> - Hammer To Fall<br /><a href="#" onclick="yt.www.watch.player.seekTo(12*60+08);return false;">12:08</a> - Crazy Little Thing Called Love<br /><a href="#" onclick="yt.www.watch.player.seekTo(16*60+03);return false;">16:03</a> - We Will Rock You<br /><a href="#" onclick="yt.www.watch.player.seekTo(17*60+18);return false;">17:18</a> - We Are The Champions<br /><a href="#" onclick="yt.www.watch.player.seekTo(21*60+12);return false;">21:12</a> - Is This The World We Created...?<br /><br />Short song analysis:<br /><br />- "Bohemian Rhapsody": Although it's a short medley version, it's one of the best performances of the ballad section, with Freddie nailing the Bb4s with the correct studio phrasing (for the first time ever!).<br /><br />- "Radio Ga Ga": Although it's missing one chorus, this is one of - if not the best - the best versions ever, Freddie nails all the Bb4s and sounds very clean! Spike Edney's Roland Jupiter 8 also really shines through on this mix, compared to the DVD releases!<br /><br />- "Audience Improv": A great improv, Freddie sounds strong and confident. You gotta love when he sustains that A4 for 4 seconds!<br /><br />- "Hammer To Fall": Despite missing a verse and a chorus, it's a strong version (possibly the best ever). 
Freddie sings the song amazingly, and even ad-libs a C#5 and a C5! Also notice how heavy Brian's guitar sounds compared to the thin DVD mixes - it roars!<br /><br />- "Crazy Little Thing Called Love": A great version, the crowd loves the song, the jam is great as well! Only downside to this is the slight feedback issues.<br /><br />- "We Will Rock You": Although cut down to the 1st verse and chorus, Freddie sounds strong. He nails the A4, and the solo from Dr. May is brilliant!<br /><br />- "We Are the Champions": Perhaps the high-light of the performance - Freddie is very daring on this version, he sustains the pre-chorus Bb4s, nails the 1st C5, belts great A4s, but most importantly: He nails the chorus Bb4s, in all 3 choruses! This is the only time he has ever done so! It has to be said though, the last one sounds a bit rough, but that's a side effect of belting high notes for the past 18 minutes, with nodules AND laryngitis!<br /><br />- "Is This The World We Created... ?": Freddie and Brian perform a beautiful version of this, and it is one of the best versions ever. It's both sad and hilarious that a couple of BBC engineers are talking over the song, one of them being completely oblivious of the fact that he is interrupting the performance, on live television... Which was being televised to almost 2 billion homes.<br /><br /><br />All rights go to their respective owners!<br />-----Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for fair use for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use''',
1477,
[{
'start_time': 36,
'end_time': 162,
'title': 'Bohemian Rhapsody',
}, {
'start_time': 162,
'end_time': 413,
'title': 'Radio Ga Ga',
}, {
'start_time': 413,
'end_time': 454,
'title': 'Ay Oh!',
}, {
'start_time': 454,
'end_time': 728,
'title': 'Hammer To Fall',
}, {
'start_time': 728,
'end_time': 963,
'title': 'Crazy Little Thing Called Love',
}, {
'start_time': 963,
'end_time': 1038,
'title': 'We Will Rock You',
}, {
'start_time': 1038,
'end_time': 1272,
'title': 'We Are The Champions',
}, {
'start_time': 1272,
'end_time': 1477,
'title': 'Is This The World We Created...?',
}]
),
(
# https://www.youtube.com/watch?v=ekYlRhALiRQ
# pattern: <num>. <title> 0:00
'1. Those Beaten Paths of Confusion <a href="#" onclick="yt.www.watch.player.seekTo(0*60+00);return false;">0:00</a><br />2. Beyond the Shadows of Emptiness & Nothingness <a href="#" onclick="yt.www.watch.player.seekTo(11*60+47);return false;">11:47</a><br />3. Poison Yourself...With Thought <a href="#" onclick="yt.www.watch.player.seekTo(26*60+30);return false;">26:30</a><br />4. The Agents of Transformation <a href="#" onclick="yt.www.watch.player.seekTo(35*60+57);return false;">35:57</a><br />5. Drowning in the Pain of Consciousness <a href="#" onclick="yt.www.watch.player.seekTo(44*60+32);return false;">44:32</a><br />6. Deny the Disease of Life <a href="#" onclick="yt.www.watch.player.seekTo(53*60+07);return false;">53:07</a><br /><br />More info/Buy: http://crepusculonegro.storenvy.com/products/257645-cn-03-arizmenda-within-the-vacuum-of-infinity<br /><br />No copyright is intended. The rights to this video are assumed by the owner and its affiliates.',
4009,
[{
'start_time': 0,
'end_time': 707,
'title': '1. Those Beaten Paths of Confusion',
}, {
'start_time': 707,
'end_time': 1590,
'title': '2. Beyond the Shadows of Emptiness & Nothingness',
}, {
'start_time': 1590,
'end_time': 2157,
'title': '3. Poison Yourself...With Thought',
}, {
'start_time': 2157,
'end_time': 2672,
'title': '4. The Agents of Transformation',
}, {
'start_time': 2672,
'end_time': 3187,
'title': '5. Drowning in the Pain of Consciousness',
}, {
'start_time': 3187,
'end_time': 4009,
'title': '6. Deny the Disease of Life',
}]
),
(
# https://www.youtube.com/watch?v=WjL4pSzog9w
# pattern: 00:00 <title>
'<a href="https://arizmenda.bandcamp.com/merch/despairs-depths-descended-cd" class="yt-uix-servicelink " data-target-new-window="True" data-servicelink="CDAQ6TgYACITCNf1raqT2dMCFdRjGAod_o0CBSj4HQ" data-url="https://arizmenda.bandcamp.com/merch/despairs-depths-descended-cd" rel="nofollow noopener" target="_blank">https://arizmenda.bandcamp.com/merch/...</a><br /><br /><a href="#" onclick="yt.www.watch.player.seekTo(00*60+00);return false;">00:00</a> Christening Unborn Deformities <br /><a href="#" onclick="yt.www.watch.player.seekTo(07*60+08);return false;">07:08</a> Taste of Purity<br /><a href="#" onclick="yt.www.watch.player.seekTo(16*60+16);return false;">16:16</a> Sculpting Sins of a Universal Tongue<br /><a href="#" onclick="yt.www.watch.player.seekTo(24*60+45);return false;">24:45</a> Birth<br /><a href="#" onclick="yt.www.watch.player.seekTo(31*60+24);return false;">31:24</a> Neves<br /><a href="#" onclick="yt.www.watch.player.seekTo(37*60+55);return false;">37:55</a> Libations in Limbo',
2705,
[{
'start_time': 0,
'end_time': 428,
'title': 'Christening Unborn Deformities',
}, {
'start_time': 428,
'end_time': 976,
'title': 'Taste of Purity',
}, {
'start_time': 976,
'end_time': 1485,
'title': 'Sculpting Sins of a Universal Tongue',
}, {
'start_time': 1485,
'end_time': 1884,
'title': 'Birth',
}, {
'start_time': 1884,
'end_time': 2275,
'title': 'Neves',
}, {
'start_time': 2275,
'end_time': 2705,
'title': 'Libations in Limbo',
}]
),
(
# https://www.youtube.com/watch?v=o3r1sn-t3is
# pattern: <title> 00:00 <note>
'Download this show in MP3: <a href="http://sh.st/njZKK" class="yt-uix-servicelink " data-url="http://sh.st/njZKK" data-target-new-window="True" data-servicelink="CDAQ6TgYACITCK3j8_6o2dMCFVDCGAoduVAKKij4HQ" rel="nofollow noopener" target="_blank">http://sh.st/njZKK</a><br /><br />Setlist:<br />I-E-A-I-A-I-O <a href="#" onclick="yt.www.watch.player.seekTo(00*60+45);return false;">00:45</a><br />Suite-Pee <a href="#" onclick="yt.www.watch.player.seekTo(4*60+26);return false;">4:26</a> (Incomplete)<br />Attack <a href="#" onclick="yt.www.watch.player.seekTo(5*60+31);return false;">5:31</a> (First live performance since 2011)<br />Prison Song <a href="#" onclick="yt.www.watch.player.seekTo(8*60+42);return false;">8:42</a><br />Know <a href="#" onclick="yt.www.watch.player.seekTo(12*60+32);return false;">12:32</a> (First live performance since 2011)<br />Aerials <a href="#" onclick="yt.www.watch.player.seekTo(15*60+32);return false;">15:32</a><br />Soldier Side - Intro <a href="#" onclick="yt.www.watch.player.seekTo(19*60+13);return false;">19:13</a><br />B.Y.O.B. 
<a href="#" onclick="yt.www.watch.player.seekTo(20*60+09);return false;">20:09</a><br />Soil <a href="#" onclick="yt.www.watch.player.seekTo(24*60+32);return false;">24:32</a><br />Darts <a href="#" onclick="yt.www.watch.player.seekTo(27*60+48);return false;">27:48</a><br />Radio/Video <a href="#" onclick="yt.www.watch.player.seekTo(30*60+38);return false;">30:38</a><br />Hypnotize <a href="#" onclick="yt.www.watch.player.seekTo(35*60+05);return false;">35:05</a><br />Temper <a href="#" onclick="yt.www.watch.player.seekTo(38*60+08);return false;">38:08</a> (First live performance since 1999)<br />CUBErt <a href="#" onclick="yt.www.watch.player.seekTo(41*60+00);return false;">41:00</a><br />Needles <a href="#" onclick="yt.www.watch.player.seekTo(42*60+57);return false;">42:57</a><br />Deer Dance <a href="#" onclick="yt.www.watch.player.seekTo(46*60+27);return false;">46:27</a><br />Bounce <a href="#" onclick="yt.www.watch.player.seekTo(49*60+38);return false;">49:38</a><br />Suggestions <a href="#" onclick="yt.www.watch.player.seekTo(51*60+25);return false;">51:25</a><br />Psycho <a href="#" onclick="yt.www.watch.player.seekTo(53*60+52);return false;">53:52</a><br />Chop Suey! <a href="#" onclick="yt.www.watch.player.seekTo(58*60+13);return false;">58:13</a><br />Lonely Day <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+01*60+15);return false;">1:01:15</a><br />Question! 
<a href="#" onclick="yt.www.watch.player.seekTo(1*3600+04*60+14);return false;">1:04:14</a><br />Lost in Hollywood <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+08*60+10);return false;">1:08:10</a><br />Vicinity of Obscenity <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+13*60+40);return false;">1:13:40</a>(First live performance since 2012)<br />Forest <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+16*60+17);return false;">1:16:17</a><br />Cigaro <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+20*60+02);return false;">1:20:02</a><br />Toxicity <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+23*60+57);return false;">1:23:57</a>(with Chino Moreno)<br />Sugar <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+27*60+53);return false;">1:27:53</a>',
5640,
[{
'start_time': 45,
'end_time': 266,
'title': 'I-E-A-I-A-I-O',
}, {
'start_time': 266,
'end_time': 331,
'title': 'Suite-Pee (Incomplete)',
}, {
'start_time': 331,
'end_time': 522,
'title': 'Attack (First live performance since 2011)',
}, {
'start_time': 522,
'end_time': 752,
'title': 'Prison Song',
}, {
'start_time': 752,
'end_time': 932,
'title': 'Know (First live performance since 2011)',
}, {
'start_time': 932,
'end_time': 1153,
'title': 'Aerials',
}, {
'start_time': 1153,
'end_time': 1209,
'title': 'Soldier Side - Intro',
}, {
'start_time': 1209,
'end_time': 1472,
'title': 'B.Y.O.B.',
}, {
'start_time': 1472,
'end_time': 1668,
'title': 'Soil',
}, {
'start_time': 1668,
'end_time': 1838,
'title': 'Darts',
}, {
'start_time': 1838,
'end_time': 2105,
'title': 'Radio/Video',
}, {
'start_time': 2105,
'end_time': 2288,
'title': 'Hypnotize',
}, {
'start_time': 2288,
'end_time': 2460,
'title': 'Temper (First live performance since 1999)',
}, {
'start_time': 2460,
'end_time': 2577,
'title': 'CUBErt',
}, {
'start_time': 2577,
'end_time': 2787,
'title': 'Needles',
}, {
'start_time': 2787,
'end_time': 2978,
'title': 'Deer Dance',
}, {
'start_time': 2978,
'end_time': 3085,
'title': 'Bounce',
}, {
'start_time': 3085,
'end_time': 3232,
'title': 'Suggestions',
}, {
'start_time': 3232,
'end_time': 3493,
'title': 'Psycho',
}, {
'start_time': 3493,
'end_time': 3675,
'title': 'Chop Suey!',
}, {
'start_time': 3675,
'end_time': 3854,
'title': 'Lonely Day',
}, {
'start_time': 3854,
'end_time': 4090,
'title': 'Question!',
}, {
'start_time': 4090,
'end_time': 4420,
'title': 'Lost in Hollywood',
}, {
'start_time': 4420,
'end_time': 4577,
'title': 'Vicinity of Obscenity (First live performance since 2012)',
}, {
'start_time': 4577,
'end_time': 4802,
'title': 'Forest',
}, {
'start_time': 4802,
'end_time': 5037,
'title': 'Cigaro',
}, {
'start_time': 5037,
'end_time': 5273,
'title': 'Toxicity (with Chino Moreno)',
}, {
'start_time': 5273,
'end_time': 5640,
'title': 'Sugar',
}]
),
(
# https://www.youtube.com/watch?v=PkYLQbsqCE8
# pattern: <num> - <title> [<latinized title>] 0:00:00
'''Затемно (Zatemno) is an Obscure Black Metal Band from Russia.<br /><br />"Во прах (Vo prakh)'' Into The Ashes", Debut mini-album released may 6, 2016, by Death Knell Productions<br />Released on 6 panel digipak CD, limited to 100 copies only<br />And digital format on Bandcamp<br /><br />Tracklist<br /><br />1 - Во прах [Vo prakh] <a href="#" onclick="yt.www.watch.player.seekTo(0*3600+00*60+00);return false;">0:00:00</a><br />2 - Искупление [Iskupleniye] <a href="#" onclick="yt.www.watch.player.seekTo(0*3600+08*60+10);return false;">0:08:10</a><br />3 - Из серпов луны...[Iz serpov luny] <a href="#" onclick="yt.www.watch.player.seekTo(0*3600+14*60+30);return false;">0:14:30</a><br /><br />Links:<br /><a href="https://deathknellprod.bandcamp.com/album/--2" class="yt-uix-servicelink " data-target-new-window="True" data-url="https://deathknellprod.bandcamp.com/album/--2" data-servicelink="CC8Q6TgYACITCNP234Kr2dMCFcNxGAodQqsIwSj4HQ" target="_blank" rel="nofollow noopener">https://deathknellprod.bandcamp.com/a...</a><br /><a href="https://www.facebook.com/DeathKnellProd/" class="yt-uix-servicelink " data-target-new-window="True" data-url="https://www.facebook.com/DeathKnellProd/" data-servicelink="CC8Q6TgYACITCNP234Kr2dMCFcNxGAodQqsIwSj4HQ" target="_blank" rel="nofollow noopener">https://www.facebook.com/DeathKnellProd/</a><br /><br /><br />I don't have any right about this artifact, my only intention is to spread the music of the band, all rights are reserved to the Затемно (Zatemno) and his producers, Death Knell Productions.<br /><br />------------------------------------------------------------------<br /><br />Subscribe for more videos like this.<br />My link: <a href="https://web.facebook.com/AttackOfTheDragons" class="yt-uix-servicelink " data-target-new-window="True" data-url="https://web.facebook.com/AttackOfTheDragons" data-servicelink="CC8Q6TgYACITCNP234Kr2dMCFcNxGAodQqsIwSj4HQ" target="_blank" rel="nofollow 
noopener">https://web.facebook.com/AttackOfTheD...</a>''',
1138,
[{
'start_time': 0,
'end_time': 490,
'title': '1 - Во прах [Vo prakh]',
}, {
'start_time': 490,
'end_time': 870,
'title': '2 - Искупление [Iskupleniye]',
}, {
'start_time': 870,
'end_time': 1138,
'title': '3 - Из серпов луны...[Iz serpov luny]',
}]
),
(
# https://www.youtube.com/watch?v=xZW70zEasOk
# time point more than duration
'''● LCS Spring finals: Saturday and Sunday from <a href="#" onclick="yt.www.watch.player.seekTo(13*60+30);return false;">13:30</a> outside the venue! <br />● PAX East: Fri, Sat & Sun - more info in tomorrows video on the main channel!''',
283,
[]
),
]
def test_youtube_chapters(self):
for description, duration, expected_chapters in self._TEST_CASES:
ie = YoutubeIE()
expect_value(
self, ie._extract_chapters_from_description(description, duration),
expected_chapters, None)
if __name__ == '__main__':
unittest.main()
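The expected values in these cases follow one rule: each chapter ends where the next one starts, and the last chapter ends at the video duration. A minimal sketch of that rule, assuming the timestamps have already been pulled out of the description. `parse_timestamp` and `build_chapters` are hypothetical helpers for illustration, not the extractor's `_extract_chapters_from_description`, and the handling of out-of-range time points is an assumption chosen to match the last test case.

```python
def parse_timestamp(text):
    """Convert 'M:SS', 'MM:SS' or 'H:MM:SS' into seconds."""
    seconds = 0
    for part in text.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


def build_chapters(points, duration):
    """Build chapter dicts from (timestamp, title) pairs.

    Assumption for this sketch: a time point beyond the video duration
    means the timestamps are not chapter marks, so nothing is returned.
    """
    starts = [(parse_timestamp(ts), title) for ts, title in points]
    if any(start > duration for start, _ in starts):
        return []
    chapters = []
    for index, (start, title) in enumerate(starts):
        # A chapter ends where the next one begins; the last ends at duration.
        end = starts[index + 1][0] if index + 1 < len(starts) else duration
        chapters.append({'start_time': start, 'end_time': end, 'title': title})
    return chapters
```

Fed the three `0:00:00`, `0:08:10`, `0:14:30` time points and a duration of 1138, this yields the same 0-490, 490-870, 870-1138 boundaries as the expected list above.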


@@ -1,4 +1,5 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
# Allow direct execution
@@ -9,10 +10,10 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor import (
YoutubePlaylistIE,
YoutubeIE,
YoutubePlaylistIE,
YoutubeTabIE,
)
@@ -24,47 +25,40 @@ class TestYoutubeLists(unittest.TestCase):
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
dl.params['format'] = 'best'
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
result = dl.extract_info(result['url'], download=False, ie_key=result.get('ie_key'), process=False)
self.assertEqual(YoutubeIE().extract_id(result['url']), 'FXxLjLQi3Fg')
def test_youtube_course(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = list(result['entries'])
self.assertEqual(YoutubeIE().extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE().extract_id(entries[-1]['url']), 'rYefUsYuEp0')
def test_youtube_mix(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 50)
dl.params['format'] = 'best'
ie = YoutubeTabIE(dl)
result = dl.extract_info('https://www.youtube.com/watch?v=tyITL_exICo&list=RDCLAK5uy_kLWIr9gv1XLlPbaDS965-Db4TrBoUTxQ8',
download=False, ie_key=ie.ie_key(), process=True)
entries = (result or {}).get('entries', [{'id': 'not_found', }])
self.assertTrue(len(entries) >= 25)
original_video = entries[0]
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')
self.assertEqual(original_video['id'], 'tyITL_exICo')
def test_youtube_toptracks(self):
print('Skipping: The playlist page gives error 500')
return
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=MCUS')
entries = result['entries']
self.assertEqual(len(entries), 100)
def test_youtube_flat_playlist_titles(self):
def test_youtube_flat_playlist_extraction(self):
dl = FakeYDL()
dl.params['extract_flat'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PL-KKIb8rvtMSrAO9YFbeM6UQrAqoFTUWv')
ie = YoutubeTabIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc')
self.assertIsPlaylist(result)
for entry in result['entries']:
self.assertTrue(entry.get('title'))
entries = list(result['entries'])
self.assertTrue(len(entries) == 1)
video = entries[0]
self.assertEqual(video['_type'], 'url')
self.assertEqual(video['ie_key'], 'Youtube')
self.assertEqual(video['id'], 'BaW_jenozKc')
self.assertEqual(video['url'], 'BaW_jenozKc')
self.assertEqual(video['title'], 'youtube-dl test video "\'/\\ä↭𝕐')
self.assertEqual(video['duration'], 10)
self.assertEqual(video['uploader'], 'Philipp Hagemeister')
if __name__ == '__main__':

test/test_youtube_misc.py Normal file

@@ -0,0 +1,26 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeIE
class TestYoutubeMisc(unittest.TestCase):
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
if __name__ == '__main__':
unittest.main()
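The cases in `test_youtube_extract` all reduce to finding an 11-character video ID, either bare or after a `v=` query parameter. A rough sketch of that idea with a deliberately simplified pattern. `SIMPLE_ID_RE` and `simple_extract_id` are hypothetical names, and the pattern is far less thorough than whatever `YoutubeIE.extract_id` uses internally.

```python
import re

# Simplified assumption: an ID is 11 URL-safe characters, found either at
# the very start of the input or right after a "v=" query parameter.
SIMPLE_ID_RE = re.compile(r'(?:^|[?&]v=)([0-9A-Za-z_-]{11})')


def simple_extract_id(url):
    """Return the first 11-character video ID found in url."""
    match = SIMPLE_ID_RE.search(url)
    if match is None:
        raise ValueError('no video ID found in %r' % url)
    return match.group(1)
```

Because the pattern stops after 11 characters, trailing junk such as `sharePLED17F32AD9753930` is ignored, which is the behaviour the fifth assertion above checks.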


@@ -14,70 +14,93 @@ import string
from test.helper import FakeYDL
from youtube_dl.extractor import YoutubeIE
from youtube_dl.jsinterp import JSInterpreter
from youtube_dl.compat import compat_str, compat_urlretrieve
_TESTS = [
_SIG_TESTS = [
(
'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
'js',
86,
'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
'js',
85,
'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-vfle-mVwz.js',
'js',
90,
']\\[@?>=<;:/.-,+*)(\'&%$#"hZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjiagfedcb39876',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vfl0Cbn9e.js',
'js',
84,
'O1I3456789abcde0ghijklmnopqrstuvwxyzABCDEFGHfJKLMN2PQRSTUVW@YZ!"#$%&\'()*+,-./:;<=',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js',
'js',
'2ACFC7A61CA478CD21425E5A57EBD73DDC78E22A.2094302436B2D377D14A3BBA23022D023B8BC25AA',
'A52CB8B320D22032ABB3A41D773D2B6342034902.A22E87CDD37DBE75A5E52412DC874AC16A7CFCA2',
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflBb0OQx.js',
'js',
84,
'123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQ0STUVWXYZ!"#$%&\'()*+,@./:;<=>'
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vfl9FYC6l.js',
'js',
83,
'123456789abcdefghijklmnopqr0tuvwxyzABCDETGHIJKLMNOPQRS>UVWXYZ!"#$%&\'()*+,-./:;<=F'
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflCGk6yw/html5player.js',
'js',
'4646B5181C6C3020DF1D9C7FCFEA.AD80ABF70C39BD369CCCAE780AFBB98FA6B6CB42766249D9488C288',
'82C8849D94266724DC6B6AF89BBFA087EACCD963.B93C07FBA084ACAEFCF7C9D1FD0203C6C1815B6B'
),
(
'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js',
'js',
'312AA52209E3623129A412D56A40F11CB0AF14AE.3EE09501CB14E3BCDC3B2AE808BF3F1D14E7FBF12',
'112AA5220913623229A412D56A40F11CB0AF14AE.3EE0950FCB14EEBCDC3B2AE808BF331D14E7FBF3',
)
]
_NSIG_TESTS = [
(
'https://www.youtube.com/s/player/9216d1f7/player_ias.vflset/en_US/base.js',
'SLp9F5bwjAdhE9F-', 'gWnb9IK2DJ8Q1w',
),
(
'https://www.youtube.com/s/player/f8cb7a3b/player_ias.vflset/en_US/base.js',
'oBo2h5euWy6osrUt', 'ivXHpm7qJjJN',
),
(
'https://www.youtube.com/s/player/2dfe380c/player_ias.vflset/en_US/base.js',
'oBo2h5euWy6osrUt', '3DIBbn3qdQ',
),
(
'https://www.youtube.com/s/player/f1ca6900/player_ias.vflset/en_US/base.js',
'cu3wyu6LQn2hse', 'jvxetvmlI9AN9Q',
),
(
'https://www.youtube.com/s/player/8040e515/player_ias.vflset/en_US/base.js',
'wvOFaY-yjgDuIEg5', 'HkfBFDHmgw4rsw',
),
(
'https://www.youtube.com/s/player/e06dea74/player_ias.vflset/en_US/base.js',
'AiuodmaDDYw8d3y4bf', 'ankd8eza2T6Qmw',
),
]
class TestPlayerInfo(unittest.TestCase):
def test_youtube_extract_player_info(self):
PLAYER_URLS = (
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/fr_FR/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-de_DE.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-tablet-en_US.vflset/base.js', '64dddad9'),
# obsolete
('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
@@ -86,59 +109,70 @@ class TestPlayerInfo(unittest.TestCase):
('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
)
for player_url, expected_player_id in PLAYER_URLS:
expected_player_type = player_url.split('.')[-1]
player_type, player_id = YoutubeIE._extract_player_info(player_url)
self.assertEqual(player_type, expected_player_type)
player_id = YoutubeIE._extract_player_info(player_url)
self.assertEqual(player_id, expected_player_id)
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata/sigs')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def tearDown(self):
try:
for f in os.listdir(self.TESTDATA_DIR):
os.remove(os.path.join(self.TESTDATA_DIR, f))
except OSError:
pass
def make_tfunc(url, stype, sig_input, expected_sig):
m = re.match(r'.*-([a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.[a-z]+$', url)
assert m, '%r should follow URL format' % url
test_id = m.group(1)
def test_func(self):
basename = 'player-%s.%s' % (test_id, stype)
fn = os.path.join(self.TESTDATA_DIR, basename)
def t_factory(name, sig_func, url_pattern):
def make_tfunc(url, sig_input, expected_sig):
m = url_pattern.match(url)
assert m, '%r should follow URL format' % url
test_id = m.group('id')
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
def test_func(self):
basename = 'player-{0}-{1}.js'.format(name, test_id)
fn = os.path.join(self.TESTDATA_DIR, basename)
ydl = FakeYDL()
ie = YoutubeIE(ydl)
if stype == 'js':
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
else:
assert stype == 'swf'
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = (
compat_str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
self.assertEqual(sig_func(jscode, sig_input), expected_sig)
test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
test_func.__name__ = str('test_{0}_js_{1}'.format(name, test_id))
setattr(TestSignature, test_func.__name__, test_func)
return make_tfunc
for test_spec in _TESTS:
make_tfunc(*test_spec)
def signature(jscode, sig_input):
func = YoutubeIE(FakeYDL())._parse_sig_js(jscode)
src_sig = (
compat_str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
return func(src_sig)
def n_sig(jscode, sig_input):
funcname = YoutubeIE(FakeYDL())._extract_n_function_name(jscode)
return JSInterpreter(jscode).call_function(funcname, sig_input)
make_sig_test = t_factory(
'signature', signature, re.compile(r'.*-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.[a-z]+$'))
for test_spec in _SIG_TESTS:
make_sig_test(*test_spec)
make_nsig_test = t_factory(
'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_-]+)/.+\.js$'))
for test_spec in _NSIG_TESTS:
make_nsig_test(*test_spec)
if __name__ == '__main__':
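The `t_factory` helper generates one named test method per test spec and attaches it to the test class, so every player file is reported as its own test. A self-contained sketch of the same pattern, with a toy function standing in for the signature code. `TestToy`, `reverse` and the specs are made up for illustration.

```python
import unittest


class TestToy(unittest.TestCase):
    """Empty on purpose: test methods are attached dynamically below."""


def t_factory(name, func):
    def make_tfunc(test_id, value, expected):
        def test_func(self):
            self.assertEqual(func(value), expected)

        # The method name must start with "test" to be collected by unittest.
        test_func.__name__ = str('test_{0}_{1}'.format(name, test_id))
        setattr(TestToy, test_func.__name__, test_func)
    return make_tfunc


def reverse(text):
    return text[::-1]


make_reverse_test = t_factory('reverse', reverse)
for test_spec in [('abc', 'abc', 'cba'), ('empty', '', '')]:
    make_reverse_test(*test_spec)
```

Passing the function under test into the factory is what lets a single factory serve both the `signature` and the `nsig` test lists.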


@@ -73,6 +73,7 @@ from .utils import (
PostProcessingError,
preferredencoding,
prepend_extension,
process_communicate_or_kill,
register_socks_protocols,
render_table,
replace_extension,
@@ -773,11 +774,20 @@ class YoutubeDL(object):
def extract_info(self, url, download=True, ie_key=None, extra_info={},
process=True, force_generic_extractor=False):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
"""
Return a list with a dictionary for each video extracted.
Arguments:
url -- URL to extract
Keyword arguments:
download -- whether to download videos during extraction
ie_key -- extractor key hint
extra_info -- dictionary containing the extra values to add to each result
process -- whether to resolve all unresolved references (URLs, playlist items),
must be True for download to work.
force_generic_extractor -- force using the generic extractor
"""
if not ie_key and force_generic_extractor:
ie_key = 'Generic'
@@ -1511,14 +1521,18 @@ class YoutubeDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
for ts_key, date_key in (
('timestamp', 'upload_date'),
('release_timestamp', 'release_date'),
):
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = compat_str(upload_date.strftime('%Y%m%d'))
except (ValueError, OverflowError, OSError):
pass
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
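The rewritten loop applies the same timestamp-to-date fallback to both `upload_date` and `release_date`. A standalone sketch of that behaviour, assuming Python 3. `fill_dates` is a hypothetical name, and it swaps in `datetime.fromtimestamp` with an explicit UTC zone instead of the `utcfromtimestamp` call used in the diff.

```python
import datetime


def fill_dates(info_dict):
    """Derive missing YYYYMMDD date fields from their timestamp fields."""
    for ts_key, date_key in (
            ('timestamp', 'upload_date'),
            ('release_timestamp', 'release_date')):
        if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
            try:
                moment = datetime.datetime.fromtimestamp(
                    info_dict[ts_key], datetime.timezone.utc)
                info_dict[date_key] = moment.strftime('%Y%m%d')
            except (ValueError, OverflowError, OSError):
                # Out-of-range timestamps are skipped, leaving the date unset.
                pass
    return info_dict
```

An explicitly supplied date is never overwritten, and a timestamp the platform cannot convert simply leaves the date field absent.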
@@ -1893,8 +1907,17 @@ class YoutubeDL(object):
if not self.params.get('skip_download', False):
try:
def checked_get_suitable_downloader(info_dict, params):
ed_args = params.get('external_downloader_args')
dler = get_suitable_downloader(info_dict, params)
if ed_args and not params.get('external_downloader_args'):
# external_downloader_args was cleared because external_downloader was rejected
self.report_warning('Requested external downloader cannot be used: '
'ignoring --external-downloader-args.')
return dler
def dl(name, info):
fd = get_suitable_downloader(info, self.params)(self, self.params)
fd = checked_get_suitable_downloader(info, self.params)(self, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
@@ -2301,7 +2324,7 @@ class YoutubeDL(object):
['git', 'rev-parse', '--short', 'HEAD'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
cwd=os.path.dirname(os.path.abspath(__file__)))
out, err = sp.communicate()
out, err = process_communicate_or_kill(sp)
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
self._write_string('[debug] Git HEAD: ' + out + '\n')


@@ -303,7 +303,7 @@ def xor(data1, data2):
def rijndael_mul(a, b):
if(a == 0 or b == 0):
if (a == 0 or b == 0):
return 0
return RIJNDAEL_EXP_TABLE[(RIJNDAEL_LOG_TABLE[a] + RIJNDAEL_LOG_TABLE[b]) % 0xFF]
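`rijndael_mul` multiplies in GF(2^8) by adding discrete logarithms modulo 255 (`0xFF`) and looking the sum up in an exponent table. A sketch that rebuilds such tables from the generator `0x03` and checks them against shift-and-reduce multiplication. The table construction is an illustration only; the module's own tables are not shown in this diff, and `EXP`, `LOG`, `gf_mul` and `gf_mul_slow` are hypothetical names.

```python
def gf_mul_slow(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return result


# Powers of the generator 0x03 enumerate all 255 non-zero field elements.
EXP = [0] * 255
LOG = [0] * 256
value = 1
for exponent in range(255):
    EXP[exponent] = value
    LOG[value] = exponent
    value = gf_mul_slow(value, 0x03)


def gf_mul(a, b):
    """Table-based multiply: log(a * b) = (log(a) + log(b)) mod 255."""
    if a == 0 or b == 0:
        return 0  # zero has no logarithm, so it is handled separately
    return EXP[(LOG[a] + LOG[b]) % 0xFF]
```

The explicit zero check is needed for the same reason as in `rijndael_mul`: the logarithm table has no meaningful entry for 0.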


@@ -21,6 +21,10 @@ import subprocess
import sys
import xml.etree.ElementTree
try:
import collections.abc as compat_collections_abc
except ImportError:
import collections as compat_collections_abc
try:
import urllib.request as compat_urllib_request
@@ -73,6 +77,15 @@ try:
except ImportError: # Python 2
import Cookie as compat_cookies
if sys.version_info[0] == 2:
class compat_cookies_SimpleCookie(compat_cookies.SimpleCookie):
def load(self, rawdata):
if isinstance(rawdata, compat_str):
rawdata = str(rawdata)
return super(compat_cookies_SimpleCookie, self).load(rawdata)
else:
compat_cookies_SimpleCookie = compat_cookies.SimpleCookie
try:
import html.entities as compat_html_entities
except ImportError: # Python 2
@@ -2877,6 +2890,7 @@ else:
_terminal_size = collections.namedtuple('terminal_size', ['columns', 'lines'])
def compat_get_terminal_size(fallback=(80, 24)):
from .utils import process_communicate_or_kill
columns = compat_getenv('COLUMNS')
if columns:
columns = int(columns)
@@ -2893,7 +2907,7 @@ else:
sp = subprocess.Popen(
['stty', 'size'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = sp.communicate()
out, err = process_communicate_or_kill(sp)
_lines, _columns = map(int, out.split())
except Exception:
_columns, _lines = _terminal_size(*fallback)
@@ -2953,6 +2967,25 @@ else:
compat_Struct = struct.Struct
# compat_map/filter() returning an iterator, supposedly the
# same versioning as for zip below
try:
from future_builtins import map as compat_map
except ImportError:
try:
from itertools import imap as compat_map
except ImportError:
compat_map = map
try:
from future_builtins import filter as compat_filter
except ImportError:
try:
from itertools import ifilter as compat_filter
except ImportError:
compat_filter = filter
try:
from future_builtins import zip as compat_zip
except ImportError: # not 2.6+ or is 3.x
@@ -2997,14 +3030,17 @@ __all__ = [
'compat_b64decode',
'compat_basestring',
'compat_chr',
'compat_collections_abc',
'compat_cookiejar',
'compat_cookiejar_Cookie',
'compat_cookies',
'compat_cookies_SimpleCookie',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element',
'compat_etree_fromstring',
'compat_etree_register_namespace',
'compat_expanduser',
'compat_filter',
'compat_get_terminal_size',
'compat_getenv',
'compat_getpass',
@@ -3016,6 +3052,7 @@ __all__ = [
'compat_integer_types',
'compat_itertools_count',
'compat_kwargs',
'compat_map',
'compat_numeric_types',
'compat_ord',
'compat_os_name',


@@ -1,22 +1,31 @@
from __future__ import unicode_literals
from ..utils import (
determine_protocol,
)
def get_suitable_downloader(info_dict, params={}):
info_dict['protocol'] = determine_protocol(info_dict)
info_copy = info_dict.copy()
return _get_suitable_downloader(info_copy, params)
# Some of these require get_suitable_downloader
from .common import FileDownloader
from .dash import DashSegmentsFD
from .f4m import F4mFD
from .hls import HlsFD
from .http import HttpFD
from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
from .niconico import NiconicoDmcFD
from .external import (
get_external_downloader,
FFmpegFD,
)
from ..utils import (
determine_protocol,
)
PROTOCOL_MAP = {
'rtmp': RtmpFD,
'm3u8_native': HlsFD,
@@ -26,13 +35,12 @@ PROTOCOL_MAP = {
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
'niconico_dmc': NiconicoDmcFD,
}
def get_suitable_downloader(info_dict, params={}):
def _get_suitable_downloader(info_dict, params={}):
"""Get the downloader class that can handle the info dict."""
protocol = determine_protocol(info_dict)
info_dict['protocol'] = protocol
# if (info_dict.get('start_time') or info_dict.get('end_time')) and not info_dict.get('requested_formats') and FFmpegFD.can_download(info_dict):
# return FFmpegFD
@@ -42,7 +50,11 @@ def get_suitable_downloader(info_dict, params={}):
ed = get_external_downloader(external_downloader)
if ed.can_download(info_dict):
return ed
# Avoid using unwanted args since external_downloader was rejected
if params.get('external_downloader_args'):
params['external_downloader_args'] = None
protocol = info_dict['protocol']
if protocol.startswith('m3u8') and info_dict.get('is_live'):
return FFmpegFD
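The selector now clears `external_downloader_args` when the requested external downloader is rejected, and the `checked_get_suitable_downloader` wrapper in `YoutubeDL` detects that change to warn the user. A toy sketch of this two-step flow. `EXTERNAL_PROTOCOLS`, `select_downloader`, `checked_select_downloader` and the `toy-curl` tool are all hypothetical; real downloader classes are replaced by plain strings.

```python
# Hypothetical capability table: protocols each external tool can handle.
EXTERNAL_PROTOCOLS = {'toy-curl': ('http', 'https')}


def select_downloader(info_dict, params):
    """Return a downloader name; clear args meant for a rejected external tool."""
    external = params.get('external_downloader')
    if external and info_dict['protocol'] in EXTERNAL_PROTOCOLS.get(external, ()):
        return external
    # Avoid passing the external tool's arguments to a different downloader.
    if params.get('external_downloader_args'):
        params['external_downloader_args'] = None
    return 'native'


def checked_select_downloader(info_dict, params, warn):
    """Warn when the selector had to drop the user's external downloader args."""
    requested_args = params.get('external_downloader_args')
    # The selector receives a copy so the caller's info dict is not modified.
    chosen = select_downloader(info_dict.copy(), params)
    if requested_args and not params.get('external_downloader_args'):
        warn('Requested external downloader cannot be used: '
             'ignoring --external-downloader-args.')
    return chosen
```

Comparing the argument value before and after the call keeps the selector free of any reporting logic, at the cost of relying on a side effect on `params`.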


@@ -22,6 +22,7 @@ from ..utils import (
handle_youtubedl_headers,
check_executable,
is_outdated_version,
process_communicate_or_kill,
)
@@ -104,7 +105,7 @@ class ExternalFD(FileDownloader):
p = subprocess.Popen(
cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate()
_, stderr = process_communicate_or_kill(p)
if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace'))
return p.returncode
@@ -141,7 +142,7 @@ class CurlFD(ExternalFD):
# curl writes the progress to stderr so don't capture it.
p = subprocess.Popen(cmd)
p.communicate()
process_communicate_or_kill(p)
return p.returncode
@@ -336,14 +337,17 @@ class FFmpegFD(ExternalFD):
proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
try:
retval = proc.wait()
except KeyboardInterrupt:
# subprocces.run would send the SIGKILL signal to ffmpeg and the
except BaseException as e:
# subprocess.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32':
process_communicate_or_kill(proc, b'q')
else:
proc.kill()
proc.wait()
raise
return retval
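
The hunks above replace direct `communicate()` calls with `process_communicate_or_kill`, whose body is not part of this diff. A minimal sketch of what such a helper might look like, assuming it only wraps `communicate()` and kills the child on any exception; the names `communicate_or_kill` and `run_child` are hypothetical and not taken from the project:

```python
import subprocess
import sys


def communicate_or_kill(proc, *args, **kwargs):
    """Hypothetical stand-in: wait for the child, killing it if interrupted."""
    try:
        return proc.communicate(*args, **kwargs)
    except BaseException:  # includes KeyboardInterrupt and SystemExit
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        raise


def run_child():
    # Illustrative child process: the current interpreter printing one line.
    proc = subprocess.Popen(
        [sys.executable, '-c', 'print("done")'],
        stdout=subprocess.PIPE)
    stdout, _ = communicate_or_kill(proc)
    return proc.returncode, stdout.decode('utf-8', 'replace').strip()
```

Catching `BaseException` rather than `KeyboardInterrupt` alone matches the `except BaseException as e:` change in the FFmpegFD hunk: the child is cleaned up regardless of why the parent is unwinding.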

@@ -0,0 +1,66 @@
# coding: utf-8
from __future__ import unicode_literals
try:
import threading
except ImportError:
threading = None
from .common import FileDownloader
from ..downloader import get_suitable_downloader
from ..extractor.niconico import NiconicoIE
from ..utils import sanitized_Request
class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
FD_NAME = 'niconico_dmc'
def real_download(self, filename, info_dict):
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
ie = NiconicoIE(self.ydl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = get_suitable_downloader(info_dict, params=self.params)(self.ydl, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if not threading:
self.to_screen('[%s] Threading for Heartbeat not available' % self.FD_NAME)
return fd.real_download(filename, info_dict)
success = download_complete = False
timer = [None]
heartbeat_lock = threading.Lock()
heartbeat_url = heartbeat_info_dict['url']
heartbeat_data = heartbeat_info_dict['data'].encode()
heartbeat_interval = heartbeat_info_dict.get('interval', 30)
request = sanitized_Request(heartbeat_url, heartbeat_data)
def heartbeat():
try:
self.ydl.urlopen(request).read()
except Exception:
self.to_screen('[%s] Heartbeat failed' % self.FD_NAME)
with heartbeat_lock:
if not download_complete:
timer[0] = threading.Timer(heartbeat_interval, heartbeat)
timer[0].start()
heartbeat_info_dict['ping']()
self.to_screen('[%s] Heartbeat with %d second interval ...' % (self.FD_NAME, heartbeat_interval))
try:
heartbeat()
if type(fd).__name__ == 'HlsFD':
info_dict.update(ie._extract_m3u8_formats(info_dict['url'], info_dict['id'])[0])
success = fd.real_download(filename, info_dict)
finally:
if heartbeat_lock:
with heartbeat_lock:
timer[0].cancel()
download_complete = True
return success
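
The heartbeat logic in `NiconicoDmcFD` follows a general pattern: ping a server on a self-re-arming `threading.Timer` while a blocking task runs, then stop the timer when the task ends. A self-contained sketch of that pattern, not the extractor's code; `run_with_heartbeat`, `task` and `ping` are hypothetical names, and the network request is replaced by an arbitrary callable:

```python
import threading


def run_with_heartbeat(task, ping, interval):
    """Run task() while calling ping() every `interval` seconds (sketch)."""
    lock = threading.Lock()
    state = {'done': False, 'timer': None}

    def heartbeat():
        try:
            ping()
        except Exception:
            pass  # a failed ping should not abort the main task
        with lock:
            if not state['done']:
                # Re-arm the timer only while the task is still running.
                state['timer'] = threading.Timer(interval, heartbeat)
                state['timer'].daemon = True
                state['timer'].start()

    heartbeat()  # first ping happens immediately, then every `interval`
    try:
        return task()
    finally:
        with lock:
            state['done'] = True
            if state['timer'] is not None:
                state['timer'].cancel()
```

Holding the lock both when re-arming and when cancelling ensures a timer cannot be started after the task has been marked as done.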

@@ -89,11 +89,13 @@ class RtmpFD(FileDownloader):
self.to_screen('')
cursor_in_new_line = True
self.to_screen('[rtmpdump] ' + line)
finally:
if not cursor_in_new_line:
self.to_screen('')
return proc.wait()
except BaseException: # Including KeyboardInterrupt
proc.kill()
proc.wait()
if not cursor_in_new_line:
self.to_screen('')
return proc.returncode
raise
url = info_dict['url']
player_url = info_dict.get('player_url')

@@ -1,14 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import re
import time
from .amp import AMPIE
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_urlparse
from ..utils import (
parse_duration,
parse_iso8601,
try_get,
)
class AbcNewsVideoIE(AMPIE):
@@ -18,8 +19,8 @@ class AbcNewsVideoIE(AMPIE):
(?:
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
(?:[^/]+/)*video/(?P<display_id>[0-9a-z-]+)-|
video/(?:embed|itemfeed)\?.*?\bid=
)|
fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
)
@@ -36,6 +37,8 @@ class AbcNewsVideoIE(AMPIE):
'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
'duration': 180,
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1380454200,
'upload_date': '20130929',
},
'params': {
# m3u8 download
@@ -47,6 +50,12 @@ class AbcNewsVideoIE(AMPIE):
}, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True,
}, {
'url': 'http://abcnews.go.com/video/itemfeed?id=46979033',
'only_matching': True,
}, {
'url': 'https://abcnews.go.com/GMA/News/video/history-christmas-story-67894761',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -67,28 +76,23 @@ class AbcNewsIE(InfoExtractor):
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
# Youtube Embeds
'url': 'https://abcnews.go.com/Entertainment/peter-billingsley-child-actor-christmas-story-hollywood-power/story?id=51286501',
'info_dict': {
'id': '10505354',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
'description': 'Nightline investigates the dangers that lurk at various jobs.',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20100428',
'timestamp': 1272412800,
'id': '51286501',
'title': "Peter Billingsley: From child actor in 'A Christmas Story' to Hollywood power player",
'description': 'Billingsley went from a child actor to Hollywood power player.',
},
'add_ie': ['AbcNewsVideo'],
'playlist_count': 5,
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '38897857',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',
'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
'upload_date': '20160515',
'timestamp': 1463329500,
'upload_date': '20160505',
'timestamp': 1462442280,
},
'params': {
# m3u8 download
@@ -100,49 +104,55 @@ class AbcNewsIE(InfoExtractor):
}, {
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}, {
# inline.type == 'video'
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
story_id = self._match_id(url)
webpage = self._download_webpage(url, story_id)
story = self._parse_json(self._search_regex(
r"window\['__abcnews__'\]\s*=\s*({.+?});",
webpage, 'data'), story_id)['page']['content']['story']['everscroll'][0]
article_contents = story.get('articleContents') or {}
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url)
def entries():
featured_video = story.get('featuredVideo') or {}
feed = try_get(featured_video, lambda x: x['video']['feed'])
if feed:
yield {
'_type': 'url',
'id': featured_video.get('id'),
'title': featured_video.get('name'),
'url': feed,
'thumbnail': featured_video.get('images'),
'description': featured_video.get('description'),
'timestamp': parse_iso8601(featured_video.get('uploadDate')),
'duration': parse_duration(featured_video.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
youtube_url = YoutubeIE._extract_url(webpage)
for inline in (article_contents.get('inlines') or []):
inline_type = inline.get('type')
if inline_type == 'iframe':
iframe_url = try_get(inline, lambda x: x['attrs']['src'])
if iframe_url:
yield self.url_result(iframe_url)
elif inline_type == 'video':
video_id = inline.get('id')
if video_id:
yield {
'_type': 'url',
'id': video_id,
'url': 'http://abcnews.go.com/video/embed?id=' + video_id,
'thumbnail': inline.get('imgSrc') or inline.get('imgDefault'),
'description': inline.get('description'),
'duration': parse_duration(inline.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
timestamp = None
date_str = self._html_search_regex(
r'<span[^>]+class="timestamp">([^<]+)</span>',
webpage, 'timestamp', fatal=False)
if date_str:
tz_offset = 0
if date_str.endswith(' ET'): # Eastern Time
tz_offset = -5
date_str = date_str[:-3]
date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
for date_format in date_formats:
try:
timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
except ValueError:
continue
if timestamp is not None:
timestamp -= tz_offset * 3600
entry = {
'_type': 'url_transparent',
'ie_key': AbcNewsVideoIE.ie_key(),
'url': full_video_url,
'id': video_id,
'display_id': display_id,
'timestamp': timestamp,
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, ie=YoutubeIE.ie_key())]
return self.playlist_result(entries)
return entry
return self.playlist_result(
entries(), story_id, article_contents.get('headline'),
article_contents.get('subHead'))

@@ -26,6 +26,7 @@ from ..utils import (
strip_or_none,
try_get,
unified_strdate,
urlencode_postdata,
)
@@ -51,9 +52,12 @@ class ADNIE(InfoExtractor):
}
}
_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
_RSA_KEY = (0x9B42B08905199A5CCE2026274399CA560ECB209EE9878A708B1C0812E1BB8CB5D1FB7441861147C1A1F2F3A0476DD63A9CAC20D3E983613346850AA6CB38F16DC7D720FD7D86FC6E5B3D5BBC72E14CD0BF9E869F2CEA2CCAD648F1DCE38F1FF916CEFB2D339B64AA0264372344BC775E265E8A852F88144AB0BD9AA06C1A4ABB, 65537)
_POS_ALIGN_MAP = {
'start': 1,
@@ -129,19 +133,42 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
}])
return subtitles
def _real_initialize(self):
username, password = self._get_login_info()
if not username:
return
try:
access_token = (self._download_json(
self._API_BASE_URL + 'authentication/login', None,
'Logging in', self._LOGIN_ERR_MESSAGE, fatal=False,
data=urlencode_postdata({
'password': password,
'rememberMe': False,
'source': 'Web',
'username': username,
})) or {}).get('accessToken')
if access_token:
self._HEADERS = {'authorization': 'Bearer ' + access_token}
except ExtractorError as e:
message = None
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(
e.cause.read().decode(), None, fatal=False) or {}
message = resp.get('message') or resp.get('code')
self.report_warning(message or self._LOGIN_ERR_MESSAGE)
def _real_extract(self, url):
video_id = self._match_id(url)
video_base_url = self._PLAYER_BASE_URL + 'video/%s/' % video_id
player = self._download_json(
video_base_url + 'configuration', video_id,
'Downloading player config JSON metadata')['player']
'Downloading player config JSON metadata',
headers=self._HEADERS)['player']
options = player['options']
user = options['user']
if not user.get('hasAccess'):
raise ExtractorError(
'This video is only available for paying users', expected=True)
# self.raise_login_required() # FIXME: Login is not implemented
self.raise_login_required()
token = self._download_json(
user.get('refreshTokenUrl') or (self._PLAYER_BASE_URL + 'refresh/token'),
@@ -188,8 +215,7 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
message = error.get('message')
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
self.raise_geo_restricted(msg=message)
else:
raise ExtractorError(message)
raise ExtractorError(message)
else:
raise ExtractorError('Giving up retrying')

@@ -20,8 +20,8 @@ class AENetworksBaseIE(ThePlatformIE):
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
_THEPLATFORM_KEY = '43jXaGRQud'
_THEPLATFORM_SECRET = 'S10BPXHMlb'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),
@@ -252,7 +252,7 @@ class AENetworksShowIE(AENetworksListBaseIE):
_TESTS = [{
'url': 'http://www.history.com/shows/ancient-aliens',
'info_dict': {
'id': 'SH012427480000',
'id': 'SERIES1574',
'title': 'Ancient Aliens',
'description': 'md5:3f6d74daf2672ff3ae29ed732e37ea7f',
},

@@ -18,7 +18,7 @@ class AliExpressLiveIE(InfoExtractor):
'id': '2800002704436634',
'ext': 'mp4',
'title': 'CASIMA7.22',
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'uploader': 'CASIMA Official Store',
'timestamp': 1500717600,
'upload_date': '20170722',

@@ -0,0 +1,89 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
dict_get,
get_element_by_class,
int_or_none,
unified_strdate,
url_or_none,
)
class Alsace20TVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?alsace20\.tv/(?:[\w-]+/)+[\w-]+-(?P<id>[\w]+)'
_TESTS = [{
'url': 'https://www.alsace20.tv/VOD/Actu/JT/Votre-JT-jeudi-3-fevrier-lyNHCXpYJh.html',
# 'md5': 'd91851bf9af73c0ad9b2cdf76c127fbb',
'info_dict': {
'id': 'lyNHCXpYJh',
'ext': 'mp4',
'description': 'md5:fc0bc4a0692d3d2dba4524053de4c7b7',
'title': 'Votre JT du jeudi 3 février',
'upload_date': '20220203',
'thumbnail': r're:https?://.+\.jpg',
'duration': 1073,
'view_count': int,
},
'params': {
'format': 'bestvideo',
},
}]
def _extract_video(self, video_id, url=None):
info = self._download_json(
'https://www.alsace20.tv/visionneuse/visio_v9_js.php?key=%s&habillage=0&mode=html' % (video_id, ),
video_id) or {}
title = info['titre']
formats = []
for res, fmt_url in (info.get('files') or {}).items():
formats.extend(
self._extract_smil_formats(fmt_url, video_id, fatal=False)
if '/smil:_' in fmt_url
else self._extract_mpd_formats(fmt_url, video_id, mpd_id=res, fatal=False))
self._sort_formats(formats)
webpage = (url and self._download_webpage(url, video_id, fatal=False)) or ''
thumbnail = url_or_none(dict_get(info, ('image', 'preview', )) or self._og_search_thumbnail(webpage))
upload_date = self._search_regex(r'/(\d{6})_', thumbnail, 'upload_date', default=None)
upload_date = unified_strdate('20%s-%s-%s' % (upload_date[:2], upload_date[2:4], upload_date[4:])) if upload_date else None
return {
'id': video_id,
'title': title,
'formats': formats,
'description': clean_html(get_element_by_class('wysiwyg', webpage)),
'upload_date': upload_date,
'thumbnail': thumbnail,
'duration': int_or_none(self._og_search_property('video:duration', webpage) if webpage else None),
'view_count': int_or_none(info.get('nb_vues')),
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video(video_id, url)
class Alsace20TVEmbedIE(Alsace20TVIE):
_VALID_URL = r'https?://(?:www\.)?alsace20\.tv/emb/(?P<id>[\w]+)'
_TESTS = [{
'url': 'https://www.alsace20.tv/emb/lyNHCXpYJh',
# 'md5': 'd91851bf9af73c0ad9b2cdf76c127fbb',
'info_dict': {
'id': 'lyNHCXpYJh',
'ext': 'mp4',
'title': 'Votre JT du jeudi 3 février',
'upload_date': '20220203',
'thumbnail': r're:https?://.+\.jpg',
'view_count': int,
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video(video_id)
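
`_extract_video` derives `upload_date` from a six-digit `YYMMDD` component in the thumbnail URL. The idea can be sketched in isolation as follows. This is an illustration only: `upload_date_from_thumbnail` is a hypothetical name, the sketch returns early when no thumbnail is available, and it skips the date validation that `unified_strdate` performs in the extractor:

```python
import re


def upload_date_from_thumbnail(thumbnail_url):
    """Return 'YYYYMMDD' from a '/YYMMDD_' path component, else None."""
    if not thumbnail_url:
        return None
    mobj = re.search(r'/(\d{6})_', thumbnail_url)
    if not mobj:
        return None
    # Assumes the date is in the 2000s, like the '20%s' prefix in the diff.
    return '20' + mobj.group(1)
```

For example, a URL containing `/220203_` would give `20220203`, which agrees with the `upload_date` expected by the test case above.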

@@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
mimetype2ext,
parse_iso8601,
unified_timestamp,
url_or_none,
)
@@ -88,7 +89,7 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
timestamp = unified_timestamp(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return {
'id': video_id,

@@ -6,25 +6,21 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
int_or_none,
url_or_none,
)
class APAIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_VALID_URL = r'(?P<base_url>https?://[^/]+\.apa\.at)/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
'info_dict': {
'id': 'jjv85FdZ',
'id': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'title': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
}, {
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
@@ -46,9 +42,11 @@ class APAIE(InfoExtractor):
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id, base_url = mobj.group('id', 'base_url')
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'%s/player/%s' % (base_url, video_id), video_id)
jwplatform_id = self._search_regex(
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
@@ -59,16 +57,18 @@ class APAIE(InfoExtractor):
'jwplatform:' + jwplatform_id, ie='JWPlatform',
video_id=video_id)
sources = self._parse_json(
self._search_regex(
r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
video_id, transform_source=js_to_json)
def extract(field, name=None):
return self._search_regex(
r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field,
webpage, name or field, default=None, group='value')
title = extract('title') or video_id
description = extract('description')
thumbnail = extract('poster', 'thumbnail')
formats = []
for source in sources:
if not isinstance(source, dict):
continue
source_url = url_or_none(source.get('file'))
for format_id in ('hls', 'progressive'):
source_url = url_or_none(extract(format_id))
if not source_url:
continue
ext = determine_ext(source_url)
@@ -77,18 +77,19 @@ class APAIE(InfoExtractor):
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
height = int_or_none(self._search_regex(
r'(\d+)\.mp4', source_url, 'height', default=None))
formats.append({
'url': source_url,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
return {
'id': video_id,
'title': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
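
The new `extract` helper in `APAIE` pulls quoted values out of the player page with a backreference, so the closing quote must match the opening one. A standalone sketch of the same regex using only `re`; `extract_field` is a hypothetical name, and the `re.escape` call is an addition not present in the diff:

```python
import re


def extract_field(page, field):
    """Return the quoted value following `field":` or `field':`, else None."""
    mobj = re.search(
        r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % re.escape(field),
        page)
    return mobj.group('value') if mobj else None
```

The `(?:(?!\1).)+` part consumes characters until the quote captured in group 1 reappears, so a double-quoted value may contain single quotes and vice versa.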

@@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723',
'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
@@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
}, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return {
'id': video_id,

@@ -3,8 +3,11 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
clean_podcast_url,
get_element_by_class,
int_or_none,
parse_codecs,
parse_iso8601,
try_get,
)
@@ -14,16 +17,17 @@ class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'md5': 'df02e6acb11c10e844946a39e7222b08',
'md5': '41dc31cd650143e530d9423b6b5a344f',
'info_dict': {
'id': '1000482637777',
'ext': 'mp3',
'title': '207 - Whitney Webb Returns',
'description': 'md5:13a73bade02d2e43737751e3987e1399',
'description': 'md5:75ef4316031df7b41ced4e7b987f79c6',
'upload_date': '20200705',
'timestamp': 1593921600,
'duration': 6425,
'timestamp': 1593932400,
'duration': 6454,
'series': 'The Tim Dillon Show',
'thumbnail': 're:.+[.](png|jpe?g|webp)',
}
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
@@ -39,18 +43,40 @@ class ApplePodcastsIE(InfoExtractor):
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
episode = ember_data['data']['attributes']
episode_data = {}
ember_data = {}
# new page type 2021-11
amp_data = self._parse_json(self._search_regex(
r'(?s)id="shoebox-media-api-cache-amp-podcasts"[^>]*>\s*({.+?})\s*<',
webpage, 'AMP data', default='{}'), episode_id, fatal=False) or {}
amp_data = try_get(amp_data,
lambda a: self._parse_json(
next(a[x] for x in iter(a) if episode_id in x),
episode_id),
dict) or {}
amp_data = amp_data.get('d') or []
episode_data = try_get(
amp_data,
lambda a: next(x for x in a
if x['type'] == 'podcast-episodes' and x['id'] == episode_id),
dict)
if not episode_data:
# try pre 2021-11 page type: TODO: consider deleting if no longer used
ember_data = self._parse_json(self._search_regex(
r'(?s)id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id) or {}
ember_data = ember_data.get(episode_id) or ember_data
episode_data = try_get(ember_data, lambda x: x['data'], dict)
episode = episode_data['attributes']
description = episode.get('description') or {}
series = None
for inc in (ember_data.get('included') or []):
for inc in (amp_data or ember_data.get('included') or []):
if inc.get('type') == 'media/podcast':
series = try_get(inc, lambda x: x['attributes']['name'])
series = series or clean_html(get_element_by_class('podcast-header__identity', webpage))
return {
info = [{
'id': episode_id,
'title': episode['name'],
'url': clean_podcast_url(episode['assetUrl']),
@@ -58,4 +84,10 @@ class ApplePodcastsIE(InfoExtractor):
'timestamp': parse_iso8601(episode.get('releaseDateTime')),
'duration': int_or_none(episode.get('durationInMilliseconds'), 1000),
'series': series,
}
'thumbnail': self._og_search_thumbnail(webpage),
}]
self._sort_formats(info)
info = info[0]
codecs = parse_codecs(info.get('ext', 'mp3'))
info.update(codecs)
return info

@@ -2,15 +2,17 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
unified_strdate,
clean_html,
extract_attributes,
unified_strdate,
unified_timestamp,
)
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db',
@@ -19,8 +21,11 @@ class ArchiveOrgIE(InfoExtractor):
'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210',
'uploader': 'SRI International'
'creator': 'SRI International',
'release_date': '19681210',
'uploader': 'SRI International',
'timestamp': 1268695290,
'upload_date': '20100315',
}
}, {
'url': 'https://archive.org/details/Cops1922',
@@ -29,22 +34,43 @@ class ArchiveOrgIE(InfoExtractor):
'id': 'Cops1922',
'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:89e7c77bf5d965dd5c0372cfb49470f6',
'description': 'md5:43a603fd6c5b4b90d12a96b921212b9c',
'timestamp': 1387699629,
'upload_date': '20131222',
}
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}, {
'url': 'https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
playlist = None
play8 = self._search_regex(
r'(<[^>]+\bclass=["\']js-play8-playlist[^>]+>)', webpage,
'playlist', default=None)
if play8:
attrs = extract_attributes(play8)
playlist = attrs.get('value')
if not playlist:
# Old jwplayer fallback
playlist = self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist', default='[]')
jwplayer_playlist = self._parse_json(playlist, video_id, fatal=False)
if jwplayer_playlist:
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
else:
# HTML5 media fallback
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
info['id'] = video_id
def get_optional(metadata, field):
return metadata.get(field, [None])[0]
@@ -58,8 +84,12 @@ class ArchiveOrgIE(InfoExtractor):
'description': clean_html(get_optional(metadata, 'description')),
})
if info.get('_type') != 'playlist':
creator = get_optional(metadata, 'creator')
info.update({
'uploader': get_optional(metadata, 'creator'),
'upload_date': unified_strdate(get_optional(metadata, 'date')),
'creator': creator,
'release_date': unified_strdate(get_optional(metadata, 'date')),
'uploader': get_optional(metadata, 'publisher') or creator,
'timestamp': unified_timestamp(get_optional(metadata, 'publicdate')),
'language': get_optional(metadata, 'language'),
})
return info

@@ -249,14 +249,14 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?:video-?)?(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/(?:[^/?#&]+/)+(?P<id>[^/?#&]+))\.html'
_TESTS = [{
# available till 7.01.2022
'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html',
'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1',
'info_dict': {
'display_id': 'maischberger-die-woche',
'id': '100',
'id': 'maischberger-die-woche-video100',
'display_id': 'maischberger-die-woche-video100',
'ext': 'mp4',
'duration': 3687.0,
'title': 'maischberger. die woche vom 7. Januar 2021',
@@ -264,16 +264,25 @@ class ARDIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'url': 'https://www.daserste.de/information/politik-weltgeschehen/morgenmagazin/videosextern/dominik-kahun-aus-der-nhl-direkt-zur-weltmeisterschaft-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/information/nachrichten-wetter/tagesthemen/videosextern/tagesthemen-17736.html',
'only_matching': True,
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/serie/in-aller-freundschaft-die-jungen-aerzte/Drehpause-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/film/filmmittwoch-im-ersten/videos/making-ofwendezeit-video-100.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
display_id = mobj.group('id')
player_url = mobj.group('mainurl') + '~playerXml.xml'
doc = self._download_xml(player_url, display_id)
@@ -284,26 +293,63 @@ class ARDIE(InfoExtractor):
formats = []
for a in video_node.findall('.//asset'):
file_name = xpath_text(a, './fileName', default=None)
if not file_name:
continue
format_type = a.attrib.get('type')
format_url = url_or_none(file_name)
if format_url:
ext = determine_ext(file_name)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_type or 'hls', fatal=False))
continue
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(format_url, {'hdcore': '3.7.0'}),
display_id, f4m_id=format_type or 'hds', fatal=False))
continue
f = {
'format_id': a.attrib['type'],
'width': int_or_none(a.find('./frameWidth').text),
'height': int_or_none(a.find('./frameHeight').text),
'vbr': int_or_none(a.find('./bitrateVideo').text),
'abr': int_or_none(a.find('./bitrateAudio').text),
'vcodec': a.find('./codecVideo').text,
'tbr': int_or_none(a.find('./totalBitrate').text),
'format_id': format_type,
'width': int_or_none(xpath_text(a, './frameWidth')),
'height': int_or_none(xpath_text(a, './frameHeight')),
'vbr': int_or_none(xpath_text(a, './bitrateVideo')),
'abr': int_or_none(xpath_text(a, './bitrateAudio')),
'vcodec': xpath_text(a, './codecVideo'),
'tbr': int_or_none(xpath_text(a, './totalBitrate')),
}
if a.find('./serverPrefix').text:
f['url'] = a.find('./serverPrefix').text
f['playpath'] = a.find('./fileName').text
server_prefix = xpath_text(a, './serverPrefix', default=None)
if server_prefix:
f.update({
'url': server_prefix,
'playpath': file_name,
})
else:
f['url'] = a.find('./fileName').text
if not format_url:
continue
f['url'] = format_url
formats.append(f)
self._sort_formats(formats)
_SUB_FORMATS = (
('./dataTimedText', 'ttml'),
('./dataTimedTextNoOffset', 'ttml'),
('./dataTimedTextVtt', 'vtt'),
)
subtitles = {}
for subsel, subext in _SUB_FORMATS:
for node in video_node.findall(subsel):
subtitles.setdefault('de', []).append({
'url': node.attrib['url'],
'ext': subext,
})
return {
'id': mobj.group('id'),
'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
'title': video_node.find('./title').text,
'duration': parse_duration(video_node.find('./duration').text),
@@ -313,7 +359,7 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
@@ -343,22 +389,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}, {
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
video_id = self._match_id(url)
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',
display_id, data=json.dumps({
video_id, data=json.dumps({
'query': '''{
playerPage(client:"%s", clipId: "%s") {
playerPage(client: "ard", clipId: "%s") {
blockedByFsk
broadcastedOn
maturityContentRating
@@ -388,7 +434,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}
}
}
}''' % (mobj.group('client'), video_id),
}''' % video_id,
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
@@ -413,7 +459,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id,
'title': title,
'description': description,
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),

View File

@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail if thumbnail else None,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}
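The `start_time` field above is read from the `t` query parameter of the page URL. A minimal sketch of the same idea, assuming Python 3 and the standard library in place of the `compat_*` wrappers and `int_or_none`; the helper name `start_time_from_url` is hypothetical and not part of the extractor:

```python
from urllib.parse import parse_qs, urlparse


def start_time_from_url(url):
    """Return the integer value of the `t` query parameter, or None."""
    # parse_qs maps each key to a list of values, so fall back to [None]
    value = parse_qs(urlparse(url).query).get('t', [None])[0]
    try:
        return int(value)
    except (TypeError, ValueError):
        # missing or non-numeric parameter
        return None
```

With the first test URL (`...?t=10`) this yields `10`, matching the expected `start_time`.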

View File

@@ -12,6 +12,7 @@ from ..utils import (
ExtractorError,
int_or_none,
qualities,
strip_or_none,
try_get,
unified_strdate,
url_or_none,
@@ -252,3 +253,49 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
title = collection.get('title')
description = collection.get('shortDescription') or collection.get('teaserText')
return self.playlist_result(entries, playlist_id, title, description)
class ArteTVCategoryIE(ArteTVBaseIE):
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>%s)/videos/(?P<id>[\w-]+(?:/[\w-]+)*)/?\s*$' % ArteTVBaseIE._ARTE_LANGUAGES
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/politics-and-society/',
'info_dict': {
'id': 'politics-and-society',
'title': 'Politics and society',
'description': 'Investigative documentary series, geopolitical analysis, and international commentary',
},
'playlist_mincount': 13,
},
]
@classmethod
def suitable(cls, url):
return (
not any(ie.suitable(url) for ie in (ArteTVIE, ArteTVPlaylistIE, ))
and super(ArteTVCategoryIE, cls).suitable(url))
def _real_extract(self, url):
lang, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id)
items = []
for video in re.finditer(
r'<a\b[^>]*?href\s*=\s*(?P<q>"|\'|\b)(?P<url>https?://www\.arte\.tv/%s/videos/[\w/-]+)(?P=q)' % lang,
webpage):
video = video.group('url')
if video == url:
continue
if any(ie.suitable(video) for ie in (ArteTVIE, ArteTVPlaylistIE, )):
items.append(video)
if items:
title = (self._og_search_title(webpage, default=None)
or self._html_search_regex(r'<title\b[^>]*>([^<]+)</title>', webpage, 'title', default=None))
title = strip_or_none((title or '').rsplit('|', 1)[0]) or self._generic_title(url)
result = self.playlist_from_matches(items, playlist_id=playlist_id, playlist_title=title)
if result:
description = self._og_search_description(webpage, default=None)
if description:
result['description'] = description
return result
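The category extractor collects candidate links with a regex whose `q` group matches a double quote, a single quote, or nothing, and is then back-referenced to close the attribute value. A standalone sketch of that link scan, assuming plain `re` and a hypothetical helper name `find_video_links`; the filtering through `ArteTVIE` and `ArteTVPlaylistIE` is left out:

```python
import re


def find_video_links(webpage, lang):
    """Return arte.tv video URLs for `lang` found in <a href=...> attributes."""
    pattern = (
        r'<a\b[^>]*?href\s*=\s*(?P<q>"|\'|\b)'
        r'(?P<url>https?://www\.arte\.tv/%s/videos/[\w/-]+)(?P=q)' % lang)
    # (?P=q) requires the closing quote to be the same as the opening one
    return [m.group('url') for m in re.finditer(pattern, webpage)]
```

Links in another language section do not match, because the language code is part of the pattern.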

View File

@@ -14,7 +14,7 @@ from ..utils import (
class AudiomackIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/song/(?P<id>[\w/-]+)'
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:song/|(?=.+/song/))(?P<id>[\w/-]+)'
IE_NAME = 'audiomack'
_TESTS = [
# hosted on audiomack
@@ -29,25 +29,27 @@ class AudiomackIE(InfoExtractor):
}
},
# audiomack wrapper around soundcloud song
# Needs new test URL.
{
'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
'info_dict': {
'id': '258901379',
'ext': 'mp3',
'description': 'mamba day freestyle for the legend Kobe Bryant ',
'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'ILOVEMAKONNEN',
'upload_date': '20160414',
}
'only_matching': True,
# 'info_dict': {
# 'id': '258901379',
# 'ext': 'mp3',
# 'description': 'mamba day freestyle for the legend Kobe Bryant ',
# 'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
# 'uploader': 'ILOVEMAKONNEN',
# 'upload_date': '20160414',
# }
},
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# URLs end with [uploader name]/song/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
album_url_tag = self._match_id(url).replace('/song/', '/')
# Request the extended version of the api for extra fields like artist and title
api_response = self._download_json(
@@ -73,13 +75,13 @@ class AudiomackIE(InfoExtractor):
class AudiomackAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/album/(?P<id>[\w/-]+)'
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:album/|(?=.+/album/))(?P<id>[\w/-]+)'
IE_NAME = 'audiomack:album'
_TESTS = [
# Standard album playlist
{
'url': 'http://www.audiomack.com/album/flytunezcom/tha-tour-part-2-mixtape',
'playlist_count': 15,
'playlist_count': 11,
'info_dict':
{
'id': '812251',
@@ -95,24 +97,24 @@ class AudiomackAlbumIE(InfoExtractor):
},
'playlist': [{
'info_dict': {
'title': 'PPP (Pistol P Project) - 9. Heaven or Hell (CHIMACA) ft Zuse (prod by DJ FU)',
'id': '837577',
'title': 'PPP (Pistol P Project) - 10. 4 Minutes Of Hell Part 4 (prod by DY OF 808 MAFIA)',
'id': '837580',
'ext': 'mp3',
'uploader': 'Lil Herb a.k.a. G Herbo',
}
}],
'params': {
'playliststart': 9,
'playlistend': 9,
'playliststart': 2,
'playlistend': 2,
}
}
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# URLs end with [uploader name]/album/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
album_url_tag = self._match_id(url).replace('/album/', '/')
result = {'_type': 'playlist', 'entries': []}
# There is no one endpoint for album metadata - instead it is included/repeated in each song's metadata
# Therefore we don't know how many songs the album has and must loop until a request fails
@@ -134,7 +136,7 @@ class AudiomackAlbumIE(InfoExtractor):
# Pull out the album metadata and add to result (if it exists)
for resultkey, apikey in [('id', 'album_id'), ('title', 'album_title')]:
if apikey in api_response and resultkey not in result:
result[resultkey] = api_response[apikey]
result[resultkey] = compat_str(api_response[apikey])
song_id = url_basename(api_response['url']).rpartition('.')[0]
result['entries'].append({
'id': compat_str(api_response.get('id', song_id)),
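The updated song pattern accepts both the old `/song/<uploader>/<title>` layout and the newer `/<uploader>/song/<title>` layout: a lookahead admits the new form without consuming anything, and the `/song/` segment is then dropped from the captured id. A small sketch of that behaviour, assuming plain `re`; the helper name `song_url_tag` and the second URL in the usage note are hypothetical:

```python
import re

SONG_URL = r'https?://(?:www\.)?audiomack\.com/(?:song/|(?=.+/song/))(?P<id>[\w/-]+)'


def song_url_tag(url):
    """Return '<uploader>/<title>' for either URL layout, or None."""
    m = re.match(SONG_URL, url)
    if not m:
        return None
    # new-style ids still contain '/song/'; old-style ids are unchanged
    return m.group('id').replace('/song/', '/')
```

Both `http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle` and a new-style URL such as `https://audiomack.com/some-artist/song/some-title` reduce to an `uploader/title` tag.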

View File

@@ -48,6 +48,7 @@ class AWAANBaseIE(InfoExtractor):
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,
'uploader_id': video_data.get('user_id'),
}
@@ -107,6 +108,7 @@ class AWAANLiveIE(AWAANBaseIE):
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
'uploader_id': '71',
},
'params': {
# m3u8 download

View File

@@ -47,7 +47,7 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
_PARTNER_ID = '1719221'
def _real_extract(self, url):

View File

@@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from ..utils import extract_attributes
class BandaiChannelIE(BrightcoveNewIE):
IE_NAME = 'bandaichannel'
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
_TESTS = [{
'url': 'https://www.b-ch.com/titles/514/001',
'md5': 'a0f2d787baa5729bed71108257f613a4',
'info_dict': {
'id': '6128044564001',
'ext': 'mp4',
'title': 'メタルファイターMIKU 第1話',
'timestamp': 1580354056,
'uploader_id': '5797077852001',
'upload_date': '20200130',
'duration': 1387.733,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
attrs = extract_attributes(self._search_regex(
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
bc = self._download_json(
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
return self._parse_brightcove_metadata(bc, bc['id'])

View File

@@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_timestamp': 1396483200,
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
@@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_timestamp': 1076112000,
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
@@ -197,7 +199,7 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail,
'uploader': artist,
'timestamp': timestamp,
'release_date': unified_strdate(tralbum.get('album_release_date')),
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
'duration': duration,
'track': track,
'track_number': track_number,
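The tests still expect `release_date` although the extractor now returns `release_timestamp`, which suggests the date is derived from the timestamp downstream. A minimal sketch of that derivation, assuming Python 3 and UTC; the helper name `date_from_timestamp` is hypothetical:

```python
from datetime import datetime, timezone


def date_from_timestamp(timestamp):
    """Format a Unix timestamp as a YYYYMMDD string in UTC, or return None."""
    if timestamp is None:
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%Y%m%d')
```

The two timestamps added to the tests, `1396483200` and `1076112000`, map to `20140403` and `20040207` respectively.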

View File

@@ -1,37 +1,46 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_error,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
dict_get,
ExtractorError,
float_or_none,
get_element_by_class,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
strip_or_none,
try_get,
unescapeHTML,
unified_timestamp,
url_or_none,
urlencode_postdata,
urljoin,
)
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_urlparse,
)
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_ID_REGEX = r'(?:[pbm][\da-z]{7}|w[\da-z]{7,14})'
_ID_REGEX = r'(?:[pbml][\da-z]{7}|w[\da-z]{7,14})'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?bbc\.co\.uk/
@@ -387,9 +396,17 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
# TODO: let expected_status be passed into _extract_xxx_formats() instead
try:
fmts = self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
except ExtractorError as e:
if not (isinstance(e.exc_info[1], compat_urllib_error.HTTPError)
and e.exc_info[1].code in (403, 404)):
raise
fmts = []
formats.extend(fmts)
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
@@ -756,23 +773,44 @@ class BBCIE(BBCCoUkIE):
'only_matching': True,
}, {
# custom redirection to www.bbc.com
# also, video with window.__INITIAL_DATA__
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
'info_dict': {
'id': 'p02xzws1',
'ext': 'mp4',
'title': "Pluto may have 'nitrogen glaciers'",
'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1437785037,
'upload_date': '20150725',
},
}, {
# video with window.__INITIAL_DATA__ and value as JSON string
'url': 'https://www.bbc.com/news/av/world-europe-59468682',
'info_dict': {
'id': 'p0b71qth',
'ext': 'mp4',
'title': 'Why France is making this woman a national hero',
'description': 'md5:7affdfab80e9c3a1f976230a1ff4d5e4',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1638230731,
'upload_date': '20211130',
},
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}, {
# bbcthreeConfig
'url': 'https://www.bbc.co.uk/bbcthree/clip/73d0bbd0-abc3-4cea-b3c0-cdae21905eb1',
'info_dict': {
'id': 'p06556y7',
'ext': 'mp4',
'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
'description': 'md5:4b7dfd063d5a789a1512e99662be3ddd',
'title': 'Things Not To Say to people that live on council estates',
'description': "From being labelled a 'chav', to the presumption that they're 'scroungers', people who live on council estates encounter all kinds of prejudices and false assumptions about themselves, their families, and their lifestyles. Here, eight people discuss the common statements, misconceptions, and clichés that they're tired of hearing.",
'duration': 360,
'thumbnail': r're:https?://.+/.+\.jpg',
},
'params': {
'skip_download': True,
}
}, {
# window.__PRELOADED_STATE__
'url': 'https://www.bbc.co.uk/radio/play/b0b9z4yl',
@@ -793,11 +831,25 @@ class BBCIE(BBCCoUkIE):
'description': 'Learn English words and phrases from this story',
},
'add_ie': [BBCCoUkIE.ie_key()],
}, {
# BBC Reel
'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
'info_dict': {
'id': 'p07c6sb9',
'ext': 'mp4',
'title': 'How positive thinking is harming your happiness',
'alt_title': 'The downsides of positive thinking',
'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
'duration': 235,
'thumbnail': r're:https?://.+/p07c9dsr.jpg',
'upload_date': '20190604',
'categories': ['Psychology'],
},
}]
@classmethod
def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
@@ -929,7 +981,7 @@ class BBCIE(BBCCoUkIE):
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
except ExtractorError as e:
# A playlist URL may fail with a 500 error while
# another one works fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
@@ -980,6 +1032,37 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
initial_data = self._parse_json(self._html_search_regex(
r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
if initial_data:
init_data = try_get(
initial_data, lambda x: x['initData']['items'][0], dict) or {}
smp_data = init_data.get('smpData') or {}
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
version_id = clip_data.get('versionID')
if version_id:
title = smp_data['title']
formats, subtitles = self._download_media_selector(version_id)
self._sort_formats(formats)
image_url = smp_data.get('holdingImageURL')
display_date = init_data.get('displayDate')
topic_title = init_data.get('topicTitle')
return {
'id': version_id,
'title': title,
'formats': formats,
'alt_title': init_data.get('shortTitle'),
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
'description': smp_data.get('summary') or init_data.get('shortSummary'),
'upload_date': display_date.replace('-', '') if display_date else None,
'subtitles': subtitles,
'duration': int_or_none(clip_data.get('duration')),
'categories': [topic_title] if topic_title else None,
}
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# Several setPayload calls may be present, but the video
# always seems to be related to the first one
@@ -1041,7 +1124,7 @@ class BBCIE(BBCCoUkIE):
thumbnail = None
image_url = current_programme.get('image_url')
if image_url:
thumbnail = image_url.replace('{recipe}', '1920x1920')
thumbnail = image_url.replace('{recipe}', 'raw')
return {
'id': programme_id,
'title': title,
@@ -1100,9 +1183,16 @@ class BBCIE(BBCCoUkIE):
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
initial_data = self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*("{.+?}")\s*;', webpage,
'quoted preload state', default=None)
if initial_data is None:
initial_data = self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?})\s*;', webpage,
'preload state', default='{}')
else:
initial_data = self._parse_json(initial_data or '"{}"', playlist_id, fatal=False)
initial_data = self._parse_json(initial_data, playlist_id, fatal=False)
if initial_data:
def parse_media(media):
if not media:
@@ -1114,19 +1204,39 @@ class BBCIE(BBCCoUkIE):
continue
formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats)
item_desc = None
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
if blocks:
summary = []
for block in blocks:
text = try_get(block, lambda x: x['model']['text'], compat_str)
if text:
summary.append(text)
if summary:
item_desc = '\n\n'.join(summary)
item_time = None
for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
if try_get(meta, lambda x: x['label']) == 'Published':
item_time = unified_timestamp(meta.get('timestamp'))
break
entries.append({
'id': item_id,
'title': item_title,
'thumbnail': item.get('holdingImageUrl'),
'formats': formats,
'subtitles': subtitles,
'timestamp': item_time,
'description': strip_or_none(item_desc),
})
for resp in (initial_data.get('data') or {}).values():
name = resp.get('name')
if name == 'media-experience':
parse_media(try_get(resp, lambda x: x['data']['initialItem']['mediaItem'], dict))
elif name == 'article':
for block in (try_get(resp, lambda x: x['data']['blocks'], list) or []):
for block in (try_get(resp,
(lambda x: x['data']['blocks'],
lambda x: x['data']['content']['model']['blocks'],),
list) or []):
if block.get('type') != 'media':
continue
parse_media(block.get('model'))
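The `window.__INITIAL_DATA__` handling in this hunk covers two page variants: the value is either a JSON object literal or a JSON string that itself contains JSON, which has to be decoded twice. A standalone sketch of that logic, assuming plain `re` and `json`; the function name `parse_initial_data` is hypothetical:

```python
import json
import re


def parse_initial_data(webpage):
    """Return the decoded __INITIAL_DATA__ object, or {} if it is absent."""
    # Variant 1: the value is a quoted JSON string wrapping the real JSON
    m = re.search(r'window\.__INITIAL_DATA__\s*=\s*("{.+?}")\s*;', webpage)
    if m:
        # first decode yields the inner JSON text, second yields the object
        return json.loads(json.loads(m.group(1)))
    # Variant 2: the value is a plain JSON object literal
    m = re.search(r'window\.__INITIAL_DATA__\s*=\s*({.+?})\s*;', webpage)
    return json.loads(m.group(1)) if m else {}
```

Unlike the extractor, this sketch raises on malformed JSON instead of degrading to an empty result.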
@@ -1293,21 +1403,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
@staticmethod
def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
},
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
'playlist_mincount': 8,
}, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@@ -1316,14 +1554,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
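Both iPlayer playlist classes build `_VALID_URL` from the shared `_VALID_URL_TMPL`, which is formatted twice: `%%s` survives the first substitution as `%s` and is filled with the path segment by the subclass. A short sketch of the two-step formatting, using module-level names that are hypothetical stand-ins for the class attributes:

```python
import re

ID_REGEX = r'(?:[pbml][\da-z]{7}|w[\da-z]{7,14})'
# First pass fills in the id pattern; '%%s' becomes '%s' for the second pass
VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % ID_REGEX

EPISODES_URL = VALID_URL_TMPL % 'episodes'
GROUP_URL = VALID_URL_TMPL % 'group'


def match_id(pattern, url):
    """Return the programme id captured by `pattern`, or None."""
    m = re.match(pattern, url)
    return m.group('id') if m else None
```

Each resulting pattern matches only its own path segment, so an episodes URL is not claimed by the group extractor and vice versa.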
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):

View File

@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import ExtractorError, urlencode_postdata
class BigoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigo\.tv/(?:[a-z]{2,}/)?(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.bigo.tv/ja/221338632',
'info_dict': {
'id': '6576287577575737440',
'ext': 'mp4',
'title': '土よ〜💁‍♂️ 休憩室/REST room',
'thumbnail': r're:https?://.+',
'uploader': '✨Shin💫',
'uploader_id': '221338632',
'is_live': True,
},
'skip': 'livestream',
}, {
'url': 'https://www.bigo.tv/th/Tarlerm1304',
'only_matching': True,
}, {
'url': 'https://bigo.tv/115976881',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
info_raw = self._download_json(
'https://ta.bigo.tv/official_website/studio/getInternalStudioInfo',
user_id, data=urlencode_postdata({'siteId': user_id}))
if not isinstance(info_raw, dict):
raise ExtractorError('Received invalid JSON data')
if info_raw.get('code'):
raise ExtractorError(
'Bigo says: %s (code %s)' % (info_raw.get('msg'), info_raw.get('code')), expected=True)
info = info_raw.get('data') or {}
if not info.get('alive'):
raise ExtractorError('This user is offline.', expected=True)
return {
'id': info.get('roomId') or user_id,
'title': info.get('roomTopic') or info.get('nick_name') or user_id,
'formats': [{
'url': info.get('hls_src'),
'ext': 'mp4',
'protocol': 'm3u8',
}],
'thumbnail': info.get('snapshot'),
'uploader': info.get('nick_name'),
'uploader_id': user_id,
'is_live': True,
}

View File

@@ -156,6 +156,7 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid']
headers = {
'Accept': 'application/json',
'Referer': url
}
headers.update(self.geo_verification_headers())
@@ -232,7 +233,7 @@ class BiliBiliIE(InfoExtractor):
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'),
})
if not info.get('uploader'):
@@ -368,6 +369,11 @@ class BilibiliAudioIE(BilibiliAudioBaseIE):
'filesize': int_or_none(play_data.get('size')),
}]
for a_format in formats:
a_format.setdefault('http_headers', {}).update({
'Referer': url,
})
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}

View File

@@ -90,13 +90,19 @@ class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'md5': '670b2d73f48549da032861130488c681',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'ext': 'flv',
'ext': 'mp4',
'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
'upload_date': '20150723',
'timestamp': 1437679032,
},
'expected_warnings': [
'Unable to download f4m manifest'
]
}]
def _real_extract(self, url):

View File

@@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}


@@ -12,7 +12,7 @@ from ..utils import (
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<req_id>bravotv|oxygen)\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'md5': 'e34684cfea2a96cd2ee1ef3a60909de9',
@@ -28,10 +28,13 @@ class BravoTVIE(AdobePassIE):
}, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(self._search_regex(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>({.+?})</script>', webpage, 'drupal settings'),
@@ -53,11 +56,14 @@ class BravoTVIE(AdobePassIE):
tp_path = release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('tve_adobe_auth', {})
if site == 'bravotv':
site = 'bravo'
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
adobe_pass.get('adobePassResourceId') or site,
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
url, release_pid,
adobe_pass.get('adobePassRequestorId') or site, resource)
else:
shared_playlist = settings['ls_playlist']
account_pid = shared_playlist['account_pid']


@@ -7,19 +7,21 @@ from .common import InfoExtractor
from .gigya import GigyaBaseIE
from ..compat import compat_HTTPError
from ..utils import (
extract_attributes,
ExtractorError,
strip_or_none,
clean_html,
extract_attributes,
float_or_none,
get_element_by_class,
int_or_none,
merge_dicts,
str_or_none,
strip_or_none,
url_or_none,
)
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza|dako)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '68993eda72ef62386a15ea2cf3c93107',
@@ -332,3 +334,51 @@ class VrtNUIE(GigyaBaseIE):
'display_id': display_id,
'season_number': int_or_none(page.get('episode_season')),
})
class DagelijkseKostIE(InfoExtractor):
IE_DESC = 'dagelijksekost.een.be'
_VALID_URL = r'https?://dagelijksekost\.een\.be/gerechten/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://dagelijksekost.een.be/gerechten/hachis-parmentier-met-witloof',
'md5': '30bfffc323009a3e5f689bef6efa2365',
'info_dict': {
'id': 'md-ast-27a4d1ff-7d7b-425e-b84f-a4d227f592fa',
'display_id': 'hachis-parmentier-met-witloof',
'ext': 'mp4',
'title': 'Hachis parmentier met witloof',
'description': 'md5:9960478392d87f63567b5b117688cdc5',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 283.02,
},
'expected_warnings': ['is not a supported codec'],
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = strip_or_none(get_element_by_class(
'dish-metadata__title', webpage
) or self._html_search_meta(
'twitter:title', webpage))
description = clean_html(get_element_by_class(
'dish-description', webpage)
) or self._html_search_meta(
('description', 'twitter:description', 'og:description'),
webpage)
video_id = self._html_search_regex(
r'data-url=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id',
group='id')
return {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
}


@@ -27,7 +27,7 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -52,6 +52,9 @@ class CBSIE(CBSBaseIE):
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):


@@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0]
-zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
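
The hunk above decodes the embed id in three steps (URL-unquote, base64-decode, raw-deflate decompress) and now decodes the resulting bytes as UTF-8 before JSON parsing, presumably because the JSON parser on some older Python 3 versions does not accept bytes. A minimal round-trip sketch of that pipeline using only the standard library follows; `encode_embed_id`, `decode_embed_id` and the sample payload are hypothetical names and data used for illustration, not part of the extractor.

```python
import base64
import json
import zlib
from urllib.parse import quote, unquote


def encode_embed_id(payload):
    """Hypothetical inverse of the decoding step, used only to build test input."""
    raw = json.dumps(payload).encode('utf-8')
    # A negative wbits value selects a raw deflate stream (no zlib header).
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    deflated = compressor.compress(raw) + compressor.flush()
    return quote(base64.b64encode(deflated).decode('ascii'), safe='')


def decode_embed_id(embed_id):
    """Unquote, base64-decode, inflate, decode to text, then parse as JSON."""
    deflated = base64.b64decode(unquote(embed_id))
    text = zlib.decompress(deflated, -zlib.MAX_WBITS).decode('utf-8')
    return json.loads(text)


# Sample payload shaped like the keys the extractor reads; the value is made up.
sample = {'video': {'items': [{'mpxRefId': 'example-ref-id'}]}}
item = decode_embed_id(encode_embed_id(sample))['video']['items'][0]
```

Decoding explicitly keeps the parsing step independent of whether the JSON parser accepts bytes.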


@@ -1,38 +1,113 @@
from __future__ import unicode_literals
from .cbs import CBSBaseIE
import re
# from .cbs import CBSBaseIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
)
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
# class CBSSportsEmbedIE(CBSBaseIE):
class CBSSportsEmbedIE(InfoExtractor):
IE_NAME = 'cbssports:embed'
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
(?:
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
pcid%3D(?P<pcid>\d+)
)'''
_TESTS = [{
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
'info_dict': {
'id': '1214315075735',
'ext': 'mp4',
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
'timestamp': 1524111457,
'upload_date': '20180419',
'uploader': 'CBSI-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
}
'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
'only_matching': True,
}, {
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
'only_matching': True,
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
# def _extract_video_info(self, filter_query, video_id):
# return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
uuid, pcid = re.match(self._VALID_URL, url).groups()
query = {'id': uuid} if uuid else {'pcid': pcid}
video = self._download_json(
'https://www.cbssports.com/api/content/video/',
uuid or pcid, query=query)[0]
video_id = video['id']
title = video['title']
metadata = video.get('metaData') or {}
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
formats = self._extract_m3u8_formats(
metadata['files'][0]['url'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
self._sort_formats(formats)
image = video.get('image')
thumbnails = None
if image:
image_path = image.get('path')
if image_path:
thumbnails = [{
'url': image_path,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
'filesize': int_or_none(image.get('size')),
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video.get('description'),
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
'duration': int_or_none(metadata.get('duration')),
}
class CBSSportsBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
webpage, 'video id')
return self._extract_video_info('byId=%s' % video_id, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
webpage, 'embed url')
return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
class CBSSportsIE(CBSSportsBaseIE):
IE_NAME = 'cbssports'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
'info_dict': {
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
'ext': 'mp4',
'title': 'Cover 3: Stanford Spring Gleaning',
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
'timestamp': 1617218398,
'upload_date': '20210331',
'duration': 502,
},
}]
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
IE_NAME = '247sports'
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
'info_dict': {
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
'ext': 'mp4',
'title': '2021 QB Jake Garcia senior highlights through five games',
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
'timestamp': 1607114223,
'upload_date': '20201204',
'duration': 208,
},
}]


@@ -1,15 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_timezone,
int_or_none,
parse_duration,
parse_iso8601,
parse_resolution,
try_get,
url_or_none,
)
@@ -24,8 +27,9 @@ class CCMAIE(InfoExtractor):
'ext': 'mp4',
'title': 'L\'espot de La Marató de TV3',
'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
'timestamp': 1470918540,
'upload_date': '20160811',
'timestamp': 1478608140,
'upload_date': '20161108',
'age_limit': 0,
}
}, {
'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
@@ -35,8 +39,24 @@ class CCMAIE(InfoExtractor):
'ext': 'mp3',
'title': 'El Consell de Savis analitza el derbi',
'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
'upload_date': '20171205',
'timestamp': 1512507300,
'upload_date': '20170512',
'timestamp': 1494622500,
'vcodec': 'none',
'categories': ['Esports'],
}
}, {
'url': 'http://www.ccma.cat/tv3/alacarta/crims/crims-josep-tallada-lespereu-me-capitol-1/video/6031387/',
'md5': 'b43c3d3486f430f3032b5b160d80cbc3',
'info_dict': {
'id': '6031387',
'ext': 'mp4',
'title': 'Crims - Josep Talleda, l\'"Espereu-me" (capítol 1)',
'description': 'md5:7cbdafb640da9d0d2c0f62bad1e74e60',
'timestamp': 1582577700,
'upload_date': '20200224',
'subtitles': 'mincount:4',
'age_limit': 16,
'series': 'Crims',
}
}]
@@ -72,17 +92,28 @@ class CCMAIE(InfoExtractor):
informacio = media['informacio']
title = informacio['titol']
durada = informacio.get('durada', {})
durada = informacio.get('durada') or {}
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
tematica = try_get(informacio, lambda x: x['tematica']['text'])
timestamp = None
data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
try:
timezone, data_utc = extract_timezone(data_utc)
timestamp = calendar.timegm((datetime.datetime.strptime(
data_utc, '%Y-%d-%mT%H:%M:%S') - timezone).timetuple())
except TypeError:
pass
subtitles = {}
subtitols = media.get('subtitols', {})
if subtitols:
sub_url = subtitols.get('url')
subtitols = media.get('subtitols') or []
if isinstance(subtitols, dict):
subtitols = [subtitols]
for st in subtitols:
sub_url = st.get('url')
if sub_url:
subtitles.setdefault(
subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({
st.get('iso') or st.get('text') or 'ca', []).append({
'url': sub_url,
})
@@ -97,6 +128,16 @@ class CCMAIE(InfoExtractor):
'height': int_or_none(imatges.get('alcada')),
}]
age_limit = None
codi_etic = try_get(informacio, lambda x: x['codi_etic']['id'])
if codi_etic:
codi_etic_s = codi_etic.split('_')
if len(codi_etic_s) == 2:
if codi_etic_s[1] == 'TP':
age_limit = 0
else:
age_limit = int_or_none(codi_etic_s[1])
return {
'id': media_id,
'title': title,
@@ -106,4 +147,9 @@ class CCMAIE(InfoExtractor):
'thumbnails': thumbnails,
'subtitles': subtitles,
'formats': formats,
'age_limit': age_limit,
'alt_title': informacio.get('titol_complet'),
'episode_number': int_or_none(informacio.get('capitol')),
'categories': [tematica] if tematica else None,
'series': informacio.get('programa'),
}
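
The age-limit hunk in this file splits the `codi_etic` id on `_` and maps the suffix `TP` to 0, otherwise treating the suffix as a number. The same logic is sketched below as a standalone helper. The function name and the sample ids (`CE_TP`, `CE_16`) are assumptions made for illustration, and the plain `int()` conversion stands in for the extractor's `int_or_none`.

```python
def age_limit_from_codi_etic(codi_etic):
    """Map an id such as 'CE_16' to an age limit, or None if it cannot be parsed."""
    if not codi_etic:
        return None
    parts = codi_etic.split('_')
    if len(parts) != 2:
        return None
    if parts[1] == 'TP':
        # 'TP' is treated as suitable for all audiences, as in the hunk above.
        return 0
    try:
        return int(parts[1])
    except ValueError:
        return None
```

Returning `None` for anything unexpected leaves the `age_limit` field unset instead of guessing a rating.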


@@ -95,8 +95,11 @@ class CDAIE(InfoExtractor):
if 'Ten film jest dostępny dla użytkowników premium' in webpage:
raise ExtractorError('This video is only available for premium users.', expected=True)
if re.search(r'niedostępn[ey] w(?:&nbsp;|\s+)Twoim kraju\s*<', webpage):
self.raise_geo_restricted()
need_confirm_age = False
if self._html_search_regex(r'(<form[^>]+action="/a/validatebirth")',
if self._html_search_regex(r'(<form[^>]+action="[^"]*/a/validatebirth[^"]*")',
webpage, 'birthday validate form', default=None):
webpage = self._download_age_confirm_page(
url, video_id, note='Confirming age')
@@ -130,6 +133,8 @@ class CDAIE(InfoExtractor):
'age_limit': 18 if need_confirm_age else 0,
}
info = self._search_json_ld(webpage, video_id, default={})
# Source: https://www.cda.pl/js/player.js?t=1606154898
def decrypt_file(a):
for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
@@ -194,7 +199,7 @@ class CDAIE(InfoExtractor):
handler = self._download_webpage
webpage = handler(
self._BASE_URL + href, video_id,
urljoin(self._BASE_URL, href), video_id,
'Downloading %s version information' % resolution, fatal=False)
if not webpage:
# Manually report warning because empty page is returned when
@@ -206,6 +211,4 @@ class CDAIE(InfoExtractor):
self._sort_formats(formats)
info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts(info_dict, info)
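
One hunk in this file replaces `self._BASE_URL + href` with `urljoin(self._BASE_URL, href)`. The sketch below shows the difference using the standard library's `urllib.parse.urljoin` as a stand-in for the project's own `urljoin` helper. The base URL value and the sample hrefs are assumptions for illustration only.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.cda.pl/'  # assumed value, not taken from the diff


def resolve(href):
    """Resolve a possibly relative href against the base URL."""
    return urljoin(BASE_URL, href)


relative = resolve('/video/1?wersja=720p')
absolute = resolve('https://ebd.cda.pl/620x368/1')
# Plain concatenation mishandles both cases.
concatenated = BASE_URL + '/video/1?wersja=720p'
```

With joining, an href that is already absolute is passed through unchanged, while concatenation would prepend the base URL to it.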


@@ -17,7 +17,7 @@ import math
from ..compat import (
compat_cookiejar_Cookie,
compat_cookies,
compat_cookies_SimpleCookie,
compat_etree_Element,
compat_etree_fromstring,
compat_getpass,
@@ -230,8 +230,10 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available.
timestamp: UNIX timestamp of the moment the video became available
(uploaded).
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
@@ -1273,6 +1275,7 @@ class InfoExtractor(object):
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
author = e.get('author')
info.update({
'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
@@ -1280,7 +1283,11 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
# author can be an instance of 'Organization' or 'Person' types.
# both types can have 'name' property(inherited from 'Thing' type). [1]
# however some websites are using 'Text' type instead.
# 1. https://schema.org/VideoObject
'uploader': author.get('name') if isinstance(author, dict) else author if isinstance(author, compat_str) else None,
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
@@ -2064,7 +2071,7 @@ class InfoExtractor(object):
})
return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}, data=None, headers={}, query={}):
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
res = self._download_xml_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
@@ -2078,10 +2085,9 @@ class InfoExtractor(object):
mpd_base_url = base_url(urlh.geturl())
return self._parse_mpd_formats(
mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
mpd_doc, mpd_id, mpd_base_url, mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', mpd_url=None):
"""
Parse formats from MPD manifest.
References:
@@ -2359,15 +2365,7 @@ class InfoExtractor(object):
else:
# Assuming direct URL to unfragmented media.
f['url'] = base_url
# According to [1, 5.3.5.2, Table 7, page 35] @id of Representation
# is not necessarily unique within a Period thus formats with
# the same `format_id` are quite possible. There are numerous examples
# of such manifests (see https://github.com/ytdl-org/youtube-dl/issues/15111,
# https://github.com/ytdl-org/youtube-dl/issues/13919)
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
formats.append(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
@@ -2903,10 +2901,10 @@ class InfoExtractor(object):
self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""


@@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
unified_timestamp,
update_url_query,
urljoin,
)
# compat_range
try:
if callable(xrange):
range = xrange
except (NameError, TypeError):
pass
class CPACIE(InfoExtractor):
IE_NAME = 'cpac'
_VALID_URL = r'https?://(?:www\.)?cpac\.ca/(?P<fr>l-)?episode\?id=(?P<id>[\da-f]{8}(?:-[\da-f]{4}){3}-[\da-f]{12})'
_TEST = {
# 'url': 'http://www.cpac.ca/en/programs/primetime-politics/episodes/65490909',
'url': 'https://www.cpac.ca/episode?id=fc7edcae-4660-47e1-ba61-5b7f29a9db0f',
'md5': 'e46ad699caafd7aa6024279f2614e8fa',
'info_dict': {
'id': 'fc7edcae-4660-47e1-ba61-5b7f29a9db0f',
'ext': 'mp4',
'upload_date': '20220215',
'title': 'News Conference to Celebrate National Kindness Week February 15, 2022',
'description': 'md5:466a206abd21f3a6f776cdef290c23fb',
'timestamp': 1644901200,
},
'params': {
'format': 'bestvideo',
'hls_prefer_native': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
url_lang = 'fr' if '/l-episode?' in url else 'en'
content = self._download_json(
'https://www.cpac.ca/api/1/services/contentModel.json?url=/site/website/episode/index.xml&crafterSite=cpacca&id=' + video_id,
video_id)
video_url = try_get(content, lambda x: x['page']['details']['videoUrl'], compat_str)
formats = []
if video_url:
content = content['page']
title = str_or_none(content['details']['title_%s_t' % (url_lang, )])
formats = self._extract_m3u8_formats(video_url, video_id, m3u8_id='hls', ext='mp4')
for fmt in formats:
# prefer language to match URL
fmt_lang = fmt.get('language')
if fmt_lang == url_lang:
fmt['language_preference'] = 10
elif not fmt_lang:
fmt['language_preference'] = -1
else:
fmt['language_preference'] = -10
self._sort_formats(formats)
category = str_or_none(content['details']['category_%s_t' % (url_lang, )])
def is_live(v_type):
return (v_type == 'live') if v_type is not None else None
return {
'id': video_id,
'formats': formats,
'title': title,
'description': str_or_none(content['details'].get('description_%s_t' % (url_lang, ))),
'timestamp': unified_timestamp(content['details'].get('liveDateTime')),
'category': [category] if category else None,
'thumbnail': urljoin(url, str_or_none(content['details'].get('image_%s_s' % (url_lang, )))),
'is_live': is_live(content['details'].get('type')),
}
class CPACPlaylistIE(InfoExtractor):
IE_NAME = 'cpac:playlist'
_VALID_URL = r'(?i)https?://(?:www\.)?cpac\.ca/(?:program|search|(?P<fr>emission|rechercher))\?(?:[^&]+&)*?(?P<id>(?:id=\d+|programId=\d+|key=[^&]+))'
_TESTS = [{
'url': 'https://www.cpac.ca/program?id=6',
'info_dict': {
'id': 'id=6',
'title': 'Headline Politics',
'description': 'Watch CPACs signature long-form coverage of the days pressing political events as they unfold.',
},
'playlist_count': 10,
}, {
'url': 'https://www.cpac.ca/search?key=hudson&type=all&order=desc',
'info_dict': {
'id': 'key=hudson',
'title': 'hudson',
},
'playlist_count': 22,
}, {
'url': 'https://www.cpac.ca/search?programId=50',
'info_dict': {
'id': 'programId=50',
'title': '50',
},
'playlist_count': 9,
}, {
'url': 'https://www.cpac.ca/emission?id=6',
'only_matching': True,
}, {
'url': 'https://www.cpac.ca/rechercher?key=hudson&type=all&order=desc',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
url_lang = 'fr' if any(x in url for x in ('/emission?', '/rechercher?')) else 'en'
pl_type, list_type = ('program', 'itemList') if any(x in url for x in ('/program?', '/emission?')) else ('search', 'searchResult')
api_url = (
'https://www.cpac.ca/api/1/services/contentModel.json?url=/site/website/%s/index.xml&crafterSite=cpacca&%s'
% (pl_type, video_id, ))
content = self._download_json(api_url, video_id)
entries = []
total_pages = int_or_none(try_get(content, lambda x: x['page'][list_type]['totalPages']), default=1)
for page in range(1, total_pages + 1):
if page > 1:
api_url = update_url_query(api_url, {'page': '%d' % (page, ), })
content = self._download_json(
api_url, video_id,
note='Downloading continuation - %d' % (page, ),
fatal=False)
for item in try_get(content, lambda x: x['page'][list_type]['item'], list) or []:
episode_url = urljoin(url, try_get(item, lambda x: x['url_%s_s' % (url_lang, )]))
if episode_url:
entries.append(episode_url)
return self.playlist_result(
(self.url_result(entry) for entry in entries),
playlist_id=video_id,
playlist_title=try_get(content, lambda x: x['page']['program']['title_%s_t' % (url_lang, )]) or video_id.split('=')[-1],
playlist_description=try_get(content, lambda x: x['page']['program']['description_%s_t' % (url_lang, )]),
)
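
The playlist extractor above reads `totalPages` from the first response, then requests each further page and collects the language-specific episode URL of every item. That paging loop is sketched below without any network access. `fetch_page` is a hypothetical callable returning the decoded JSON for a page number (or `None` on failure), and the sample data is made up.

```python
def collect_episode_urls(fetch_page, list_type='itemList', lang='en'):
    """Walk all pages of a listing and return the episode URLs found."""
    urls = []
    content = fetch_page(1) or {}
    listing = (content.get('page') or {}).get(list_type) or {}
    total_pages = listing.get('totalPages') or 1
    for page in range(1, total_pages + 1):
        if page > 1:
            # A failed continuation yields no items rather than an error.
            content = fetch_page(page) or {}
            listing = (content.get('page') or {}).get(list_type) or {}
        for item in listing.get('item') or []:
            url = item.get('url_%s_s' % lang)
            if url:
                urls.append(url)
    return urls


# Made-up two-page listing keyed by page number.
pages = {
    1: {'page': {'itemList': {'totalPages': 2, 'item': [
        {'url_en_s': '/episode?id=a'}, {}]}}},
    2: {'page': {'itemList': {'item': [{'url_en_s': '/episode?id=b'}]}}},
}
found = collect_episode_urls(pages.get)
```

Items without a URL in the requested language are skipped, as in the extractor's `if episode_url:` check.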


@@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
def _call_api(self, path, video_id):
def _call_api(self, path, video_id, query=None):
headers = {}
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(
self._API_BASE_URL + path, video_id, headers=headers)
self._API_BASE_URL + path, video_id, headers=headers, query=query)
self._handle_errors(result)
return result['data']
@@ -52,62 +52,75 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
},
'params': {
'format': 'bestvideo',
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title']
formats = []
for encoding in media.get('encodings', []):
m3u8_url = encoding.get('master_playlist_url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
continue
f = {
'width': int_or_none(encoding.get('width')),
'height': int_or_none(encoding.get('height')),
'vbr': int_or_none(encoding.get('video_bitrate')),
'abr': int_or_none(encoding.get('audio_bitrate')),
'filesize': int_or_none(encoding.get('size_in_bytes')),
'vcodec': encoding.get('video_codec'),
'acodec': encoding.get('audio_codec'),
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
for encoding_format in ('m3u8', 'mpd'):
media = self._call_api('media/' + video_id, video_id, query={
'encodingsNew': 'true',
'encodingsFormat': encoding_format,
})
for encoding in media.get('encodings', []):
playlist_url = encoding.get('master_playlist_url')
if encoding_format == 'm3u8':
# use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
formats.extend(self._extract_m3u8_formats(
playlist_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif encoding_format == 'mpd':
formats.extend(self._extract_mpd_formats(
playlist_url, video_id, mpd_id='dash', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
continue
fmt = f.copy()
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
f = {
'width': int_or_none(encoding.get('width')),
'height': int_or_none(encoding.get('height')),
'vbr': int_or_none(encoding.get('video_bitrate')),
'abr': int_or_none(encoding.get('audio_bitrate')),
'filesize': int_or_none(encoding.get('size_in_bytes')),
'vcodec': encoding.get('video_codec'),
'acodec': encoding.get('audio_codec'),
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
continue
fmt = f.copy()
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
self._sort_formats(formats)
title = media['title']
subtitles = {}
for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file')
@@ -132,7 +145,7 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collections?|series)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
@@ -140,10 +153,13 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 17,
'playlist_mincount': 16,
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}, {
'url': 'https://curiositystream.com/collections/36',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True,
}, {
# From https://gdcvault.com/play/1016624, empty speakerVideo
'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
'info_dict': {
'id': '201210-822101_1349794556671DDDD',
'ext': 'flv',
'title': 'Pre-launch - Preparing to Take the Plunge',
},
}, {
# From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
@@ -84,26 +96,20 @@ class DigitallySpeakingIE(InfoExtractor):
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
for video_key, format_id, preference in (
('slide', 'slides', -2), ('speaker', 'speaker', -1)):
video_path = xpath_text(metadata, './%sVideo' % video_key)
if not video_path:
continue
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(video_path, '.flv'),
'ext': 'flv',
'format_note': '%s video' % video_key,
'quality': preference,
'preference': preference,
'format_id': format_id,
})
return formats
def _real_extract(self, url):
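
The second hunk in this file folds two near-identical `formats.append` blocks into one loop and skips a video whose path is empty instead of failing. A standalone sketch of that loop follows. `video_paths` is a hypothetical dict standing in for the XML metadata lookups, the host name is a placeholder, and the suffix stripping imitates what `remove_end` does in the original.

```python
def build_rtmp_formats(video_paths, akamai_url):
    """Build RTMP format dicts for the slide and speaker videos that exist."""
    formats = []
    for video_key, format_id, preference in (
            ('slide', 'slides', -2), ('speaker', 'speaker', -1)):
        video_path = video_paths.get('%sVideo' % video_key)
        if not video_path:
            # Empty or missing element: skip this video instead of raising.
            continue
        if video_path.endswith('.flv'):
            video_path = video_path[:-len('.flv')]
        formats.append({
            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
            'play_path': video_path,
            'ext': 'flv',
            'format_note': '%s video' % video_key,
            'quality': preference,
            'preference': preference,
            'format_id': format_id,
        })
    return formats


formats = build_rtmp_formats(
    {'slideVideo': 'gdc/slides.flv', 'speakerVideo': ''},
    'media.example.invalid')
```

This corresponds to the new test cases with an empty `speakerVideo` or `slideVideo`, where only one of the two formats is produced.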


@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -10,11 +11,13 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
strip_or_none,
unified_timestamp,
)
class DPlayIE(InfoExtractor):
_PATH_REGEX = r'/(?P<id>[^/]+/[^/?#]+)'
_VALID_URL = r'''(?x)https?://
(?P<domain>
(?:www\.)?(?P<host>d
@@ -24,7 +27,7 @@ class DPlayIE(InfoExtractor):
)
)|
(?P<subdomain_country>es|it)\.dplay\.com
)/[^/]+/(?P<id>[^/]+/[^/?#]+)'''
)/[^/]+''' + _PATH_REGEX
_TESTS = [{
# non geo restricted, via secure api, unsigned download hls URL
@@ -151,56 +154,79 @@ class DPlayIE(InfoExtractor):
'only_matching': True,
}]
def _process_errors(self, e, geo_countries):
info = self._parse_json(e.cause.read().decode('utf-8'), None)
error = info['errors'][0]
error_code = error.get('code')
if error_code == 'access.denied.geoblocked':
self.raise_geo_restricted(countries=geo_countries)
elif error_code in ('access.denied.missingpackage', 'invalid.token'):
raise ExtractorError(
'This video is only available for registered users. You may want to use --cookies.', expected=True)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['Authorization'] = 'Bearer ' + self._download_json(
disco_base + 'token', display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
def _download_video_playback_info(self, disco_base, video_id, headers):
streaming = self._download_json(
disco_base + 'playback/videoPlaybackInfo/' + video_id,
video_id, headers=headers)['data']['attributes']['streaming']
streaming_list = []
for format_id, format_dict in streaming.items():
streaming_list.append({
'type': format_id,
'url': format_dict.get('url'),
})
return streaming_list
def _get_disco_api_info(self, url, display_id, disco_host, realm, country):
geo_countries = [country.upper()]
self._initialize_geo_bypass({
'countries': geo_countries,
})
disco_base = 'https://%s/' % disco_host
token = self._download_json(
disco_base + 'token', display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
headers = {
'Referer': url,
'Authorization': 'Bearer ' + token,
}
video = self._download_json(
disco_base + 'content/videos/' + display_id, display_id,
headers=headers, query={
'fields[channel]': 'name',
'fields[image]': 'height,src,width',
'fields[show]': 'name',
'fields[tag]': 'name',
'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
'include': 'images,primaryChannel,show,tags'
})
self._update_disco_api_headers(headers, disco_base, display_id, realm)
try:
video = self._download_json(
disco_base + 'content/videos/' + display_id, display_id,
headers=headers, query={
'fields[channel]': 'name',
'fields[image]': 'height,src,width',
'fields[show]': 'name',
'fields[tag]': 'name',
'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
'include': 'images,primaryChannel,show,tags'
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
self._process_errors(e, geo_countries)
raise
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name'].strip()
formats = []
try:
streaming = self._download_json(
disco_base + 'playback/videoPlaybackInfo/' + video_id,
display_id, headers=headers)['data']['attributes']['streaming']
streaming = self._download_video_playback_info(
disco_base, video_id, headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
error_code = error.get('code')
if error_code == 'access.denied.geoblocked':
self.raise_geo_restricted(countries=geo_countries)
elif error_code == 'access.denied.missingpackage':
self.raise_login_required()
raise ExtractorError(info['errors'][0]['detail'], expected=True)
self._process_errors(e, geo_countries)
raise
for format_id, format_dict in streaming.items():
for format_dict in streaming:
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
format_id = format_dict.get('type')
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
@@ -248,7 +274,7 @@ class DPlayIE(InfoExtractor):
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'description': strip_or_none(info.get('description')),
'duration': float_or_none(info.get('videoDuration'), 1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
@@ -268,3 +294,76 @@ class DPlayIE(InfoExtractor):
host = 'disco-api.' + domain if domain[0] == 'd' else 'eu2-prod.disco-api.com'
return self._get_disco_api_info(
url, display_id, host, 'dplay' + country, country)
class DiscoveryPlusIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/video' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': {
'id': '1140794',
'display_id': 'property-brothers-forever-home/food-and-family',
'ext': 'mp4',
'title': 'Food and Family',
'description': 'The brothers help a Richmond family expand their single-level home.',
'duration': 2583.113,
'timestamp': 1609304400,
'upload_date': '20201230',
'creator': 'HGTV',
'series': 'Property Brothers: Forever Home',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}]
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = 'WEB:UNKNOWN:dplus_us:15.0.0'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': 'dplus_us',
},
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'us1-prod-direct.discoveryplus.com', 'go', 'us')
class HGTVDeIE(DPlayIE):
_VALID_URL = r'https?://de\.hgtv\.com/sendungen' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://de.hgtv.com/sendungen/tiny-house-klein-aber-oho/wer-braucht-schon-eine-toilette/',
'info_dict': {
'id': '151205',
'display_id': 'tiny-house-klein-aber-oho/wer-braucht-schon-eine-toilette',
'ext': 'mp4',
'title': 'Wer braucht schon eine Toilette',
'description': 'md5:05b40a27e7aed2c9172de34d459134e2',
'duration': 1177.024,
'timestamp': 1595705400,
'upload_date': '20200725',
'creator': 'HGTV',
'series': 'Tiny House - klein, aber oho',
'season_number': 3,
'episode_number': 3,
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'eu1-prod.disco-api.com', 'hgtv', 'de')
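
The dplay changes above move the playback-info request into an overridable `_download_video_playback_info` hook. The base implementation flattens the `streaming` mapping into a list of `{'type', 'url'}` dicts, so the format loop can treat both API versions the same way. The following is a minimal standalone sketch of that normalization and of the loop's guards. The function names `normalize_streaming` and `usable_formats` and the sample payloads are assumptions made for illustration and are not part of youtube-dl.

```python
def normalize_streaming(streaming):
    """Flatten the older mapping shape {type: {'url': ...}} into a list."""
    return [
        {'type': format_id, 'url': format_dict.get('url')}
        for format_id, format_dict in streaming.items()
    ]


def usable_formats(streaming_list):
    """Mirror the loop's guards: skip non-dict entries and entries without a URL."""
    result = []
    for format_dict in streaming_list:
        if not isinstance(format_dict, dict):
            continue
        format_url = format_dict.get('url')
        if not format_url:
            continue
        result.append((format_dict.get('type'), format_url))
    return result


# Illustrative payloads only; the real API responses are not shown in the diff.
old_style = {'hls': {'url': 'https://example.com/a.m3u8'}, 'dash': {}}
new_style = [{'type': 'dash', 'url': 'https://example.com/a.mpd'}, None]

old_result = usable_formats(normalize_streaming(old_style))
new_result = usable_formats(new_style)
```

Keeping the list shape as the common contract means a subclass only has to override the download hook, not the format loop.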

@@ -1,193 +1,43 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
xpath_text,
determine_ext,
float_or_none,
ExtractorError,
)
from .zdf import ZDFIE
class DreiSatIE(InfoExtractor):
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
_GEO_COUNTRIES = ['DE']
_VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
'md5': 'be37228896d30a88f315b638900a026e',
'info_dict': {
'id': '45918',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': 'SCHWEIZWEIT',
'uploader_id': '100000210',
'upload_date': '20140913'
},
'params': {
'skip_download': True, # m3u8 downloads
}
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
'info_dict': {
'id': '141007_ab18_10wochensommer_film',
'ext': 'mp4',
'title': 'Ab 18! - 10 Wochen Sommer',
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
'duration': 2660,
'timestamp': 1608604200,
'upload_date': '20201222',
},
{
'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
'only_matching': True,
}, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
'info_dict': {
'id': '140913_sendung_schweizweit',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'timestamp': 1410623100,
'upload_date': '20140913'
},
]
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
param_groups = {}
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
group_id = param_group.get(self._xpath_ns(
'id', 'http://www.w3.org/XML/1998/namespace'))
params = {}
for param in param_group:
params[param.get('name')] = param.get('value')
param_groups[group_id] = params
formats = []
for video in smil.findall(self._xpath_ns('.//video', namespace)):
src = video.get('src')
if not src:
continue
bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
group_id = video.get('paramGroup')
param_group = param_groups[group_id]
for proto in param_group['protocols'].split(','):
formats.append({
'url': '%s://%s' % (proto, param_group['host']),
'app': param_group['app'],
'play_path': src,
'ext': 'flv',
'format_id': '%s-%d' % (proto, bitrate),
'tbr': bitrate,
})
self._sort_formats(formats)
return formats
def extract_from_xml_url(self, video_id, xml_url):
doc = self._download_xml(
xml_url, video_id,
note='Downloading video info',
errnote='Failed to download video info')
status_code = xpath_text(doc, './status/statuscode')
if status_code and status_code != 'ok':
if status_code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id
else:
message = '%s returned error: %s' % (self.IE_NAME, status_code)
raise ExtractorError(message, expected=True)
title = xpath_text(doc, './/information/title', 'title', True)
urls = []
formats = []
for fnode in doc.findall('.//formitaeten/formitaet'):
video_url = xpath_text(fnode, 'url')
if not video_url or video_url in urls:
continue
urls.append(video_url)
is_available = 'http://www.metafilegenerator' not in video_url
geoloced = 'static_geoloced_online' in video_url
if not is_available or geoloced:
continue
format_id = fnode.attrib['basetype']
format_m = re.match(r'''(?x)
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
''', format_id)
ext = determine_ext(video_url, None) or format_m.group('container')
if ext == 'meta':
continue
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
video_url, video_id, fatal=False))
elif ext == 'm3u8':
# the certificates are misconfigured (see
# https://github.com/ytdl-org/youtube-dl/issues/8665)
if video_url.startswith('https://'):
continue
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id, fatal=False))
else:
quality = xpath_text(fnode, './quality')
if quality:
format_id += '-' + quality
abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
tbr = int_or_none(self._search_regex(
r'_(\d+)k', video_url, 'bitrate', None))
if tbr and vbr and not abr:
abr = tbr - vbr
formats.append({
'format_id': format_id,
'url': video_url,
'ext': ext,
'acodec': format_m.group('acodec'),
'vcodec': format_m.group('vcodec'),
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(xpath_text(fnode, './width')),
'height': int_or_none(xpath_text(fnode, './height')),
'filesize': int_or_none(xpath_text(fnode, './filesize')),
'protocol': format_m.group('proto').lower(),
})
geolocation = xpath_text(doc, './/details/geolocation')
if not formats and geolocation and geolocation != 'none':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
thumbnails = []
for node in doc.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
return {
'id': video_id,
'title': title,
'description': xpath_text(doc, './/information/detail'),
'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
'thumbnails': thumbnails,
'uploader': xpath_text(doc, './/details/originChannelTitle'),
'uploader_id': xpath_text(doc, './/details/originChannelId'),
'upload_date': upload_date,
'formats': formats,
'params': {
'skip_download': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
return self.extract_from_xml_url(video_id, details_url)
}, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
'only_matching': True,
}, {
# Same as https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
'only_matching': True,
}]
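
The rewritten 3sat extractor replaces the old `obj=<number>` query pattern with a slug-based `.html` pattern. A small sketch of how the new `_VALID_URL` behaves, using only the `re` module; the helper name `match_id` is a hypothetical stand-in for the extractor's own ID matching.

```python
import re

# Pattern copied from the new DreiSatIE._VALID_URL in the diff above.
VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'


def match_id(url):
    """Return the page slug, or None when the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


new_id = match_id('https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html')
# The legacy mediathek URL form is no longer matched by this pattern.
old_id = match_id('http://www.3sat.de/mediathek/index.php?mode=play&obj=45918')
```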

@@ -12,26 +12,35 @@ from ..utils import (
)
class EggheadCourseIE(InfoExtractor):
class EggheadBaseIE(InfoExtractor):
def _call_api(self, path, video_id, resource, fatal=True):
return self._download_json(
'https://app.egghead.io/api/v1/' + path,
video_id, 'Downloading %s JSON' % resource, fatal=fatal)
class EggheadCourseIE(EggheadBaseIE):
IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:course|playlist)s/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29,
'info_dict': {
'id': '72',
'id': '432655',
'title': 'Professor Frisby Introduces Composable Functional JavaScript',
'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
},
}
}, {
'url': 'https://app.egghead.io/playlists/professor-frisby-introduces-composable-functional-javascript',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
lessons = self._download_json(
'https://egghead.io/api/v1/series/%s/lessons' % playlist_id,
playlist_id, 'Downloading course lessons JSON')
series_path = 'series/' + playlist_id
lessons = self._call_api(
series_path + '/lessons', playlist_id, 'course lessons')
entries = []
for lesson in lessons:
@@ -44,9 +53,8 @@ class EggheadCourseIE(InfoExtractor):
entries.append(self.url_result(
lesson_url, ie=EggheadLessonIE.ie_key(), video_id=lesson_id))
course = self._download_json(
'https://egghead.io/api/v1/series/%s' % playlist_id,
playlist_id, 'Downloading course JSON', fatal=False) or {}
course = self._call_api(
series_path, playlist_id, 'course', False) or {}
playlist_id = course.get('id')
if playlist_id:
@@ -57,10 +65,10 @@ class EggheadCourseIE(InfoExtractor):
course.get('description'))
class EggheadLessonIE(InfoExtractor):
class EggheadLessonIE(EggheadBaseIE):
IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': {
@@ -74,7 +82,7 @@ class EggheadLessonIE(InfoExtractor):
'upload_date': '20161209',
'duration': 304,
'view_count': 0,
'tags': ['javascript', 'free'],
'tags': 'count:2',
},
'params': {
'skip_download': True,
@@ -83,13 +91,16 @@ class EggheadLessonIE(InfoExtractor):
}, {
'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application',
'only_matching': True,
}, {
'url': 'https://app.egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
lesson = self._download_json(
'https://egghead.io/api/v1/lessons/%s' % display_id, display_id)
lesson = self._call_api(
'lessons/' + display_id, display_id, 'lesson')
lesson_id = compat_str(lesson['id'])
title = lesson['title']
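
The egghead changes route every request through one `_call_api` helper and widen the course pattern to accept the `app.` host and `/playlists/` paths. A rough sketch of the URL handling only, with no network access; `api_url` and `series_urls` are hypothetical names used for illustration.

```python
import re

# Pattern and base URL copied from the diff above.
COURSE_URL = r'https://(?:app\.)?egghead\.io/(?:course|playlist)s/(?P<id>[^/?#&]+)'
API_BASE = 'https://app.egghead.io/api/v1/'


def api_url(path):
    """Build the full API URL the way _call_api prefixes its path argument."""
    return API_BASE + path


def series_urls(url):
    """Return the (course, lessons) API URLs for a course or playlist page."""
    playlist_id = re.match(COURSE_URL, url).group('id')
    series_path = 'series/' + playlist_id
    return api_url(series_path), api_url(series_path + '/lessons')


course_api, lessons_api = series_urls(
    'https://app.egghead.io/playlists/professor-frisby-introduces-composable-functional-javascript')
```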

@@ -6,7 +6,7 @@ from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
unescapeHTML
merge_dicts,
)
@@ -24,7 +24,8 @@ class EroProfileIE(InfoExtractor):
'title': 'sexy babe softcore',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
}
},
'skip': 'Video not found',
}, {
'url': 'http://www.eroprofile.com/m/videos/view/Try-It-On-Pee_cut_2-wmv-4shared-com-file-sharing-download-movie-file',
'md5': '1baa9602ede46ce904c431f5418d8916',
@@ -77,19 +78,15 @@ class EroProfileIE(InfoExtractor):
[r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'],
webpage, 'video id', default=None)
video_url = unescapeHTML(self._search_regex(
r'<source src="([^"]+)', webpage, 'video url'))
title = self._html_search_regex(
r'Title:</th><td>([^<]+)</td>', webpage, 'title')
thumbnail = self._search_regex(
r'onclick="showVideoPlayer\(\)"><img src="([^"]+)',
webpage, 'thumbnail', fatal=False)
(r'Title:</th><td>([^<]+)</td>', r'<h1[^>]*>(.+?)</h1>'),
webpage, 'title')
return {
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'age_limit': 18,
}
})

@@ -51,6 +51,10 @@ from .anvato import AnvatoIE
from .aol import AolIE
from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE
from .alsace20tv import (
Alsace20TVIE,
Alsace20TVEmbedIE,
)
from .apa import APAIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
@@ -71,7 +75,9 @@ from .arte import (
ArteTVIE,
ArteTVEmbedIE,
ArteTVPlaylistIE,
ArteTVCategoryIE,
)
from .arnes import ArnesIE
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
@@ -90,11 +96,13 @@ from .awaan import (
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bandaichannel import BandaiChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCCoUkIPlayerPlaylistIE,
BBCCoUkIPlayerEpisodesIE,
BBCCoUkIPlayerGroupIE,
BBCCoUkPlaylistIE,
BBCIE,
)
@@ -111,6 +119,7 @@ from .bfmtv import (
)
from .bibeltv import BibelTVIE
from .bigflix import BigflixIE
from .bigo import BigoIE
from .bild import BildIE
from .bilibili import (
BiliBiliIE,
@@ -129,7 +138,6 @@ from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE
@@ -163,6 +171,7 @@ from .canvas import (
CanvasIE,
CanvasEenIE,
VrtNUIE,
DagelijkseKostIE,
)
from .carambatv import (
CarambaTVIE,
@@ -187,7 +196,11 @@ from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .cbssports import (
CBSSportsEmbedIE,
CBSSportsIE,
TwentyFourSevenSportsIE,
)
from .ccc import (
CCCIE,
CCCPlaylistIE,
@@ -246,6 +259,10 @@ from .commonprotocols import (
from .condenast import CondeNastIE
from .contv import CONtvIE
from .corus import CorusIE
from .cpac import (
CPACIE,
CPACPlaylistIE,
)
from .cracked import CrackedIE
from .crackle import CrackleIE
from .crooksandliars import CrooksAndLiarsIE
@@ -287,7 +304,11 @@ from .douyutv import (
DouyuShowIE,
DouyuTVIE,
)
from .dplay import DPlayIE
from .dplay import (
DPlayIE,
DiscoveryPlusIE,
HGTVDeIE,
)
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
@@ -416,6 +437,7 @@ from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE
from .gedidigital import GediDigitalIE
from .generic import GenericIE
from .gfycat import GfycatIE
from .giantbomb import GiantBombIE
@@ -457,6 +479,7 @@ from .hotstar import (
)
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
from .hrfernsehen import HRFernsehenIE
from .hrti import (
HRTiIE,
HRTiPlaylistIE,
@@ -470,8 +493,8 @@ from .hungama import (
from .hypem import HypemIE
from .ign import (
IGNIE,
OneUPIE,
PCMagIE,
IGNVideoIE,
IGNArticleIE,
)
from .iheart import (
IHeartRadioIE,
@@ -535,6 +558,7 @@ from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE
from .konserthusetplay import KonserthusetPlayIE
from .krasview import KrasViewIE
from .kth import KTHIE
from .ku6 import Ku6IE
from .kusi import KUSIIE
from .kuwo import (
@@ -586,7 +610,11 @@ from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
)
from .line import LineTVIE
from .line import (
LineTVIE,
LineLiveIE,
LineLiveChannelIE,
)
from .linkedin import (
LinkedInLearningIE,
LinkedInLearningCourseIE,
@@ -594,10 +622,6 @@ from .linkedin import (
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
@@ -623,6 +647,7 @@ from .mangomolo import (
MangomoloLiveIE,
)
from .manyvids import ManyVidsIE
from .maoritv import MaoriTVIE
from .markiza import (
MarkizaIE,
MarkizaPageIE,
@@ -666,7 +691,10 @@ from .mixcloud import (
MixcloudUserIE,
MixcloudPlaylistIE,
)
from .mlb import MLBIE
from .mlb import (
MLBIE,
MLBVideoIE,
)
from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import (
@@ -773,7 +801,14 @@ from .nick import (
NickNightIE,
NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .niconico import (
NiconicoIE,
NiconicoPlaylistIE,
NiconicoUserIE,
NicovideoSearchIE,
NicovideoSearchDateIE,
NicovideoSearchURLIE,
)
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
@@ -867,6 +902,11 @@ from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .palcomp3 import (
PalcoMP3IE,
PalcoMP3ArtistIE,
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
@@ -900,6 +940,7 @@ from .platzi import (
from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE
from .playstuff import PlayStuffIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
@@ -1024,6 +1065,7 @@ from .safari import (
SafariApiIE,
SafariCourseIE,
)
from .samplefocus import SampleFocusIE
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .sbs import SBSIE
@@ -1056,6 +1098,11 @@ from .shared import (
VivoIE,
)
from .showroomlive import ShowRoomLiveIE
from .simplecast import (
SimplecastIE,
SimplecastEpisodeIE,
SimplecastPodcastIE,
)
from .sina import SinaIE
from .sixplay import SixPlayIE
from .skyit import (
@@ -1140,6 +1187,11 @@ from .srgssr import (
from .srmediathek import SRMediathekIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .storyfire import (
StoryFireIE,
StoryFireUserIE,
StoryFireSeriesIE,
)
from .streamable import StreamableIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
@@ -1260,6 +1312,7 @@ from .tv2 import (
TV2IE,
TV2ArticleIE,
KatsomoIE,
MTVUutisetArticleIE,
)
from .tv2dk import (
TV2DKIE,
@@ -1398,7 +1451,6 @@ from .vidme import (
VidmeUserIE,
VidmeUserLikesIE,
)
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewlift import (
ViewLiftIE,
@@ -1458,6 +1510,7 @@ from .vrv import (
VRVSeriesIE,
)
from .vshare import VShareIE
from .vtm import VTMIE
from .medialaan import MedialaanIE
from .vube import VubeIE
from .vuclip import VuClipIE
@@ -1572,7 +1625,7 @@ from .youtube import (
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
#YoutubeSearchURLIE,
YoutubeSearchURLIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
@@ -1601,5 +1654,10 @@ from .zattoo import (
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE
from .zhihu import ZhihuIE
from .zingmp3 import (
ZingMp3IE,
ZingMp3AlbumIE,
)
from .zoom import ZoomIE
from .zype import ZypeIE

@@ -521,7 +521,10 @@ class FacebookIE(InfoExtractor):
raise ExtractorError(
'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True)
elif '>You must log in to continue' in webpage:
elif any(p in webpage for p in (
'>You must log in to continue',
'id="login_form"',
'id="loginbutton"')):
self.raise_login_required()
if not video_data and '/watchparty/' in url:
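
The Facebook hunk above replaces a single substring test with a check against several login-page markers. A minimal sketch of that check in isolation; `needs_login` is a hypothetical helper name and the sample pages are invented for illustration.

```python
# Marker strings copied from the diff above.
LOGIN_MARKERS = (
    '>You must log in to continue',
    'id="login_form"',
    'id="loginbutton"',
)


def needs_login(webpage):
    """True when any known login-page marker occurs in the page source."""
    return any(p in webpage for p in LOGIN_MARKERS)


login_page = '<form id="login_form" action="/login/"></form>'
video_page = '<div class="video">...</div>'
```

Listing the markers in a tuple keeps the condition readable and makes it cheap to add another marker later.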

@@ -5,29 +5,23 @@ from .common import InfoExtractor
class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TESTS = [{
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'md5': '8c79e54be72078b26b89e0e111c0502b',
_VALID_URL = r'https?://(?:www\.)?formula1\.com/en/latest/video\.[^.]+\.(?P<id>\d+)\.html'
_TEST = {
'url': 'https://www.formula1.com/en/latest/video.race-highlights-spain-2016.6060988138001.html',
'md5': 'be7d3a8c2f804eb2ab2aa5d941c359f8',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'id': '6060988138001',
'ext': 'mp4',
'title': 'Race highlights - Spain 2016',
'timestamp': 1463332814,
'upload_date': '20160515',
'uploader_id': '6057949432001',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'only_matching': True,
}]
'add_ie': ['BrightcoveNew'],
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/6057949432001/S1WMrhjlh_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_embed_code = self._search_regex(
r'data-videoid="([^"]+)"', webpage, 'ooyala embed code')
bc_id = self._match_id(url)
return self.url_result(
'ooyala:%s' % ooyala_embed_code, 'Ooyala', ooyala_embed_code)
self.BRIGHTCOVE_URL_TEMPLATE % bc_id, 'BrightcoveNew', bc_id)
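
With the Formula 1 change, the video ID is read straight from the page URL and substituted into a Brightcove player URL, so no webpage download is needed. A short sketch of that mapping using only `re`; `brightcove_url` is a hypothetical helper name.

```python
import re

# Pattern and template copied from the diff above.
VALID_URL = r'https?://(?:www\.)?formula1\.com/en/latest/video\.[^.]+\.(?P<id>\d+)\.html'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/6057949432001/S1WMrhjlh_default/index.html?videoId=%s'


def brightcove_url(url):
    """Map a formula1.com video page URL to its Brightcove player URL."""
    bc_id = re.match(VALID_URL, url).group('id')
    return bc_id, BRIGHTCOVE_URL_TEMPLATE % bc_id


bc_id, player_url = brightcove_url(
    'https://www.formula1.com/en/latest/video.race-highlights-spain-2016.6060988138001.html')
```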

@@ -383,6 +383,10 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
}, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True,
}, {
# "<figure id=" pattern (#28792)
'url': 'https://www.francetvinfo.fr/culture/patrimoine/incendie-de-notre-dame-de-paris/notre-dame-de-paris-de-l-incendie-de-la-cathedrale-a-sa-reconstruction_4372291.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -399,7 +403,8 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
return self._make_url_result(video_id)
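
The fourth pattern added to the FranceTVInfo video ID search picks up a UUID from either a `data-id` attribute or the `id` of a `<figure>` element. A small sketch of that one pattern applied with `re.search`; the HTML snippets and the UUID value are invented for illustration.

```python
import re

# Pattern copied from the diff above.
UUID_RE = (
    r'(?:data-id|<figure[^<]+\bid)=["\']'
    r'([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})')


def find_video_id(webpage):
    """Return the first UUID-style video ID found, or None."""
    mobj = re.search(UUID_RE, webpage)
    return mobj.group(1) if mobj else None


sample_uuid = '01234567-89ab-cdef-0123-456789abcdef'  # made-up value
figure_html = '<figure class="player" id="%s"></figure>' % sample_uuid
data_id_html = '<div data-id="%s"></div>' % sample_uuid
```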

@@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh:

@@ -16,7 +16,7 @@ from ..utils import (
class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation'
_TOKEN = None
@@ -51,6 +51,10 @@ class FunimationIE(InfoExtractor):
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}]
def _login(self):

View File

@@ -6,6 +6,7 @@ from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
HEADRequest,
remove_start,
sanitized_Request,
smuggle_url,
urlencode_postdata,
@@ -102,6 +103,26 @@ class GDCVaultIE(InfoExtractor):
'format': 'mp4-408',
},
},
{
# Kaltura embed, whitespace between quote and embedded URL in iframe's src
'url': 'https://www.gdcvault.com/play/1025699',
'info_dict': {
'id': '0_zagynv0a',
'ext': 'mp4',
'title': 'Tech Toolbox',
'upload_date': '20190408',
'uploader_id': 'joe@blazestreaming.com',
'timestamp': 1554764629,
},
'params': {
'skip_download': True,
},
},
{
# HTML5 video
'url': 'http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru',
'only_matching': True,
},
]
def _login(self, webpage_url, display_id):
@@ -175,7 +196,18 @@ class GDCVaultIE(InfoExtractor):
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename')
start_page, 'xml filename', default=None)
if not xml_name:
info = self._parse_html5_media_entries(url, start_page, video_id)[0]
info.update({
'title': remove_start(self._search_regex(
r'>Session Name:\s*<.*?>\s*<td>(.+?)</td>', start_page,
'title', default=None) or self._og_search_title(
start_page, default=None), 'GDC Vault - '),
'id': video_id,
'display_id': display_id,
})
return info
embed_url = '%s/xml/%s' % (xml_root, xml_name)
ie_key = 'DigitallySpeaking'
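
In the GDC Vault HTML5 fallback, the title comes from the "Session Name" cell when present, otherwise from the Open Graph title, and a leading `GDC Vault - ` prefix is stripped either way. A minimal sketch of that selection logic; `remove_start` here is a local stand-in written for this example rather than the youtube-dl utility itself, and `pick_title` is a hypothetical name.

```python
def remove_start(s, start):
    """Local stand-in: drop `start` from the front of `s` when present."""
    return s[len(start):] if s is not None and s.startswith(start) else s


def pick_title(session_name, og_title):
    """Prefer the session name, fall back to og:title, then strip the site prefix."""
    return remove_start(session_name or og_title, 'GDC Vault - ')


from_og = pick_title(None, 'GDC Vault - Tech Toolbox')
from_session = pick_title('Conference Keynote', 'GDC Vault - Something Else')
missing = pick_title(None, None)
```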

@@ -0,0 +1,161 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
)
class GediDigitalIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://video\.
(?:
(?:
(?:espresso\.)?repubblica
|lastampa
|ilsecoloxix
)|
(?:
iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)\.gelocal
)\.it(?:/[^/]+){2,3}?/(?P<id>\d+)(?:[/?&#]|$)'''
_TESTS = [{
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
'md5': '84658d7fb9e55a6e57ecc77b73137494',
'info_dict': {
'id': '121559',
'ext': 'mp4',
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-full-.+?\.jpg$',
'duration': 125,
},
}, {
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
'only_matching': True,
}, {
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
'only_matching': True,
}, {
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
'only_matching': True,
}, {
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
'only_matching': True,
}, {
'url': 'https://video.messaggeroveneto.gelocal.it/locale/maria-giovanna-elmi-covid-vaccino/138155/139268',
'only_matching': True,
}, {
'url': 'https://video.ilpiccolo.gelocal.it/dossier/big-john/dinosauro-big-john-al-via-le-visite-guidate-a-trieste/135226/135751',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimantova.gelocal.it/locale/dal-ponte-visconteo-di-valeggio-l-and-8217sos-dei-ristoratori-aprire-anche-a-cena/137310/137818',
'only_matching': True,
}, {
'url': 'https://video.mattinopadova.gelocal.it/dossier/coronavirus-in-veneto/covid-a-vo-un-anno-dopo-un-cuore-tricolore-per-non-dimenticare/138402/138964',
'only_matching': True,
}, {
'url': 'https://video.laprovinciapavese.gelocal.it/locale/mede-zona-rossa-via-alle-vaccinazioni-per-gli-over-80/137545/138120',
'only_matching': True,
}, {
'url': 'https://video.tribunatreviso.gelocal.it/dossier/coronavirus-in-veneto/ecco-le-prima-vaccinazioni-di-massa-nella-marca/134485/135024',
'only_matching': True,
}, {
'url': 'https://video.nuovavenezia.gelocal.it/locale/camion-troppo-alto-per-il-ponte-ferroviario-perde-il-carico/135734/136266',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimodena.gelocal.it/locale/modena-scoperta-la-proteina-che-predice-il-livello-di-gravita-del-covid/139109/139796',
'only_matching': True,
}, {
'url': 'https://video.lanuovaferrara.gelocal.it/locale/due-bombole-di-gpl-aperte-e-abbandonate-i-vigili-bruciano-il-gas/134391/134957',
'only_matching': True,
}, {
'url': 'https://video.corrierealpi.gelocal.it/dossier/cortina-2021-i-mondiali-di-sci-alpino/mondiali-di-sci-il-timelapse-sulla-splendida-olympia/133760/134331',
'only_matching': True,
}, {
'url': 'https://video.lasentinella.gelocal.it/locale/vestigne-centra-un-auto-e-si-ribalta/138931/139466',
'only_matching': True,
}, {
'url': 'https://video.espresso.repubblica.it/tutti-i-video/01-ted-villa/14772',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
['twitter:title', 'og:title'], webpage, fatal=True)
player_data = re.findall(
r"PlayerFactory\.setParam\('(?P<type>format|param)',\s*'(?P<name>[^']+)',\s*'(?P<val>[^']+)'\);",
webpage)
formats = []
duration = thumb = None
for t, n, v in player_data:
if t == 'format':
if n in ('video-hds-vod-ec', 'video-hls-vod-ec', 'video-viralize', 'video-youtube-pfp'):
continue
elif n.endswith('-vod-ak'):
formats.extend(self._extract_akamai_formats(
v, video_id, {'http': 'media.gedidigital.it'}))
else:
ext = determine_ext(v)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
v, video_id, 'mp4', 'm3u8_native', m3u8_id=n, fatal=False))
continue
f = {
'format_id': n,
'url': v,
}
if ext == 'mp3':
abr = int_or_none(self._search_regex(
r'-mp3-audio-(\d+)', v, 'abr', default=None))
f.update({
'abr': abr,
'tbr': abr,
'vcodec': 'none'
})
else:
mobj = re.match(r'^video-rrtv-(\d+)(?:-(\d+))?$', n)
if mobj:
f.update({
'height': int(mobj.group(1)),
'vbr': int_or_none(mobj.group(2)),
})
if not f.get('vbr'):
f['vbr'] = int_or_none(self._search_regex(
r'-video-rrtv-(\d+)', v, 'abr', default=None))
formats.append(f)
elif t == 'param':
if n in ['image_full', 'image']:
thumb = v
elif n == 'videoDuration':
duration = int_or_none(v)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(
['twitter:description', 'og:description', 'description'], webpage),
'thumbnail': thumb or self._og_search_thumbnail(webpage),
'formats': formats,
'duration': duration,
}
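
The `video-rrtv-…` branch of `_real_extract` above derives `height` and `vbr` from the format name. A minimal standalone sketch of that parsing step follows. It assumes hypothetical format names such as `video-rrtv-360-650` (not taken from a real page), and `parse_rrtv_name` is an illustrative helper, not part of the extractor:

```python
import re


def parse_rrtv_name(name):
    """Split a 'video-rrtv-<height>[-<vbr>]' format name into its parts.

    Hypothetical helper for illustration; it reuses the regex from the diff.
    """
    mobj = re.match(r'^video-rrtv-(\d+)(?:-(\d+))?$', name)
    if not mobj:
        # Name does not follow the rrtv pattern: nothing to report
        return {}
    height, vbr = mobj.groups()
    return {
        'height': int(height),
        # The bitrate suffix is optional in the pattern
        'vbr': int(vbr) if vbr is not None else None,
    }


print(parse_rrtv_name('video-rrtv-360-650'))
print(parse_rrtv_name('video-rrtv-720'))
```

When the name carries no bitrate, the extractor falls back to searching the media URL for `-video-rrtv-(\d+)`; that fallback is left out of this sketch.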


@@ -84,7 +84,6 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .arkena import ArkenaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .kaltura import KalturaIE
@@ -126,8 +125,11 @@ from .viqeo import ViqeoIE
from .expressen import ExpressenIE
from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE
from .vk import VKIE
from .kinja import KinjaEmbedIE
from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE
from .simplecast import SimplecastIE
class GenericIE(InfoExtractor):
@@ -1626,31 +1628,6 @@ class GenericIE(InfoExtractor):
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs
{
'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
@@ -2223,6 +2200,34 @@ class GenericIE(InfoExtractor):
'duration': 1581,
},
},
{
# MyChannels SDK embed
# https://www.24kitchen.nl/populair/deskundige-dit-waarom-sommigen-gevoelig-zijn-voor-voedselallergieen
'url': 'https://www.demorgen.be/nieuws/burgemeester-rotterdam-richt-zich-in-videoboodschap-tot-relschoppers-voelt-het-goed~b0bcfd741/',
'md5': '90c0699c37006ef18e198c032d81739c',
'info_dict': {
'id': '194165',
'ext': 'mp4',
'title': 'Burgemeester Aboutaleb spreekt relschoppers toe',
'timestamp': 1611740340,
'upload_date': '20210127',
'duration': 159,
},
},
{
# Simplecast player embed
'url': 'https://www.bio.org/podcast',
'info_dict': {
'id': 'podcast',
'title': 'I AM BIO Podcast | BIO',
},
'playlist_mincount': 52,
},
{
# Sibnet embed (https://help.sibnet.ru/?sibnet_video_embed)
'url': 'https://phpbb3.x-tk.ru/bbcode-video-sibnet-t24.html',
'only_matching': True,
},
]
def report_following_redirect(self, new_url):
@@ -2462,6 +2467,9 @@ class GenericIE(InfoExtractor):
webpage = self._webpage_read_content(
full_response, url, video_id, prefix=first_bytes)
if '<title>DPG Media Privacy Gate</title>' in webpage:
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
# Is it an RSS feed, a SMIL file, an XSPF playlist or a MPD manifest?
@@ -2593,6 +2601,11 @@ class GenericIE(InfoExtractor):
if arc_urls:
return self.playlist_from_matches(arc_urls, video_id, video_title, ie=ArcPublishingIE.ie_key())
mychannels_urls = MedialaanIE._extract_urls(webpage)
if mychannels_urls:
return self.playlist_from_matches(
mychannels_urls, video_id, video_title, ie=MedialaanIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@@ -2744,6 +2757,11 @@ class GenericIE(InfoExtractor):
if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
# Look for sibnet embedded player
sibnet_urls = VKIE._extract_sibnet_urls(webpage)
if sibnet_urls:
return self.playlist_from_matches(sibnet_urls, video_id, video_title)
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None:
@@ -2769,6 +2787,12 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
matches, video_id, video_title, getter=unescapeHTML, ie='FunnyOrDie')
# Look for Simplecast embeds
simplecast_urls = SimplecastIE._extract_urls(webpage)
if simplecast_urls:
return self.playlist_from_matches(
simplecast_urls, video_id, video_title)
# Look for BBC iPlayer embed
matches = re.findall(r'setPlaylist\("(https?://www\.bbc\.co\.uk/iplayer/[^/]+/[\da-z]{8})"\)', webpage)
if matches:
@@ -2914,7 +2938,7 @@ class GenericIE(InfoExtractor):
webpage)
if not mobj:
mobj = re.search(
- r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)',
+ r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')
@@ -3129,11 +3153,6 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_urls = LiveLeakIE._extract_urls(webpage)
if liveleak_urls:
return self.playlist_from_matches(liveleak_urls, video_id, video_title)
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
if threeqsdn_url:
@@ -3361,6 +3380,9 @@ class GenericIE(InfoExtractor):
'url': src,
'ext': (mimetype2ext(src_type)
or ext if ext in KNOWN_EXTENSIONS else 'mp4'),
'http_headers': {
'Referer': full_response.geturl(),
},
})
if formats:
self._sort_formats(formats)
@@ -3429,7 +3451,7 @@ class GenericIE(InfoExtractor):
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
- found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
+ found = filter_video(re.findall(r'<meta.*?property="og:(?:video|audio)".*?content="(.*?)"', webpage))
if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search(


@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .adobepass import AdobePassIE
from ..compat import compat_str
from ..utils import (
int_or_none,
determine_ext,
parse_age_limit,
try_get,
urlencode_postdata,
ExtractorError,
)
@@ -116,6 +118,18 @@ class GoIE(AdobePassIE):
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot',
'info_dict': {
'id': 'VDKA22600213',
'ext': 'mp4',
'title': 'Pilot',
'description': 'md5:74306df917cfc199d76d061d66bebdb4',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True,
@@ -149,14 +163,30 @@ class GoIE(AdobePassIE):
brand = site_info.get('brand')
if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id)
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
data = self._parse_json(
self._search_regex(
r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage,
'data', default='{}'),
display_id or video_id, fatal=False)
# https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot
layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict)
video_id = None
if layout:
video_id = try_get(
layout,
(lambda x: x['videoid'], lambda x: x['video']['id']),
compat_str)
if not video_id:
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# page.analytics.videoIdCode
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',
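
The new lookup above passes a tuple of getters to `try_get`, so that `layout['videoid']` is tried before `layout['video']['id']`. The sketch below shows that fallback pattern with a simplified stand-in for the `try_get` helper imported from `..utils`. It is an assumption-laden illustration rather than the library's code: `str` replaces `compat_str`, and the sample `layout` dicts are invented.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in: return the first getter result of the expected type."""
    getters = getter if isinstance(getter, (list, tuple)) else [getter]
    for get in getters:
        try:
            value = get(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            # Missing key or wrong container type: try the next getter
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


# Invented sample data: only the second getter can succeed here
layout = {'video': {'id': 'VDKA22600213'}}
video_id = try_get(
    layout,
    (lambda x: x['videoid'], lambda x: x['video']['id']),
    str)
print(video_id)
```

Because a failed lookup yields `None` instead of raising, the regex search on the webpage can remain as the final fallback, as in the hunk above.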


@@ -7,6 +7,7 @@ from ..compat import compat_parse_qs
from ..utils import (
determine_ext,
ExtractorError,
get_element_by_class,
int_or_none,
lowercase_escape,
try_get,
@@ -237,7 +238,7 @@ class GoogleDriveIE(InfoExtractor):
if confirmation_webpage:
confirm = self._search_regex(
r'confirm=([^&"\']+)', confirmation_webpage,
- 'confirmation code', fatal=False)
+ 'confirmation code', default=None)
if confirm:
confirmed_source_url = update_url_query(source_url, {
'confirm': confirm,
@@ -245,6 +246,11 @@ class GoogleDriveIE(InfoExtractor):
urlh = request_source_file(confirmed_source_url, 'confirmed source')
if urlh and urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
else:
self.report_warning(
get_element_by_class('uc-error-subcaption', confirmation_webpage)
or get_element_by_class('uc-error-caption', confirmation_webpage)
or 'unable to extract confirmation code')
if not formats and reason:
raise ExtractorError(reason, expected=True)


@@ -0,0 +1,101 @@
# coding: utf-8

from __future__ import unicode_literals

import json
import re

from ..utils import (
    int_or_none,
    unified_timestamp,
    unescapeHTML
)
from .common import InfoExtractor


class HRFernsehenIE(InfoExtractor):
    IE_NAME = 'hrfernsehen'
    _VALID_URL = r'^https?://www\.(?:hr-fernsehen|hessenschau)\.de/.*,video-(?P<id>[0-9]{6})\.html'

    _TESTS = [{
        'url': 'https://www.hessenschau.de/tv-sendung/hessenschau-vom-26082020,video-130546.html',
        'md5': '5c4e0ba94677c516a2f65a84110fc536',
        'info_dict': {
            'id': '130546',
            'ext': 'mp4',
            'description': 'Sturmtief Kirsten fegt über Hessen / Die Corona-Pandemie eine Chronologie / '
                           'Sterbehilfe: Die Lage in Hessen / Miss Hessen leitet zwei eigene Unternehmen / '
                           'Pop-Up Museum zeigt Schwarze Unterhaltung und Black Music',
            'subtitles': {'de': [{
                'url': 'https://hr-a.akamaihd.net/video/as/hessenschau/2020_08/hrLogo_200826200407_L385592_512x288-25p-500kbit.vtt'
            }]},
            'timestamp': 1598470200,
            'upload_date': '20200826',
            'thumbnail': 'https://www.hessenschau.de/tv-sendung/hs_ganz-1554~_t-1598465545029_v-16to9__medium.jpg',
            'title': 'hessenschau vom 26.08.2020'
        }
    }, {
        'url': 'https://www.hr-fernsehen.de/sendungen-a-z/mex/sendungen/fair-und-gut---was-hinter-aldis-eigenem-guetesiegel-steckt,video-130544.html',
        'only_matching': True
    }]

    _GEO_COUNTRIES = ['DE']

    def extract_airdate(self, loader_data):
        airdate_str = loader_data.get('mediaMetadata', {}).get('agf', {}).get('airdate')

        if airdate_str is None:
            return None

        return unified_timestamp(airdate_str)

    def extract_formats(self, loader_data):
        stream_formats = []
        for stream_obj in loader_data["videoResolutionLevels"]:
            stream_format = {
                'format_id': str(stream_obj['verticalResolution']) + "p",
                'height': stream_obj['verticalResolution'],
                'url': stream_obj['url'],
            }

            quality_information = re.search(r'([0-9]{3,4})x([0-9]{3,4})-([0-9]{2})p-([0-9]{3,4})kbit',
                                            stream_obj['url'])
            if quality_information:
                stream_format['width'] = int_or_none(quality_information.group(1))
                stream_format['height'] = int_or_none(quality_information.group(2))
                stream_format['fps'] = int_or_none(quality_information.group(3))
                stream_format['tbr'] = int_or_none(quality_information.group(4))

            stream_formats.append(stream_format)

        self._sort_formats(stream_formats)
        return stream_formats

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        title = self._html_search_meta(
            ['og:title', 'twitter:title', 'name'], webpage)
        description = self._html_search_meta(
            ['description'], webpage)

        loader_str = unescapeHTML(self._search_regex(r"data-new-hr-mediaplayer-loader='([^']*)'", webpage, "ardloader"))
        loader_data = json.loads(loader_str)

        info = {
            'id': video_id,
            'title': title,
            'description': description,
            'formats': self.extract_formats(loader_data),
            'timestamp': self.extract_airdate(loader_data)
        }

        if "subtitle" in loader_data:
            info["subtitles"] = {"de": [{"url": loader_data["subtitle"]}]}

        thumbnails = list(set([t for t in loader_data.get("previewImageUrl", {}).values()]))
        if len(thumbnails) > 0:
            info["thumbnails"] = [{"url": t} for t in thumbnails]

        return info
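
`extract_formats` above reads width, height, frame rate and bitrate out of the media URL itself. The standalone sketch below isolates that step. `parse_quality` is a hypothetical helper name. The sample URL is the subtitle URL from the test case, used only because it follows the same `<width>x<height>-<fps>p-<kbit>kbit` naming pattern; whether real stream URLs look the same is assumed, not verified.

```python
import re

# Same pattern as in extract_formats
QUALITY_RE = re.compile(r'([0-9]{3,4})x([0-9]{3,4})-([0-9]{2})p-([0-9]{3,4})kbit')


def parse_quality(url):
    """Return width/height/fps/tbr parsed from the URL, or {} if absent."""
    mobj = QUALITY_RE.search(url)
    if not mobj:
        return {}
    width, height, fps, tbr = (int(g) for g in mobj.groups())
    return {'width': width, 'height': height, 'fps': fps, 'tbr': tbr}


sample = ('https://hr-a.akamaihd.net/video/as/hessenschau/2020_08/'
          'hrLogo_200826200407_L385592_512x288-25p-500kbit.vtt')
print(parse_quality(sample))
```

A URL without the pattern yields an empty dict, which mirrors the extractor keeping only `format_id`, `height` and `url` from the JSON in that case.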


@@ -3,230 +3,255 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
HEADRequest,
determine_ext,
int_or_none,
parse_iso8601,
strip_or_none,
try_get,
)
class IGNIE(InfoExtractor):
class IGNBaseIE(InfoExtractor):
def _call_api(self, slug):
return self._download_json(
'http://apis.ign.com/{0}/v3/{0}s/slug/{1}'.format(self._PAGE_TYPE, slug), slug)
class IGNIE(IGNBaseIE):
"""
Extractor for some of the IGN sites, like www.ign.com, es.ign.com de.ign.com.
Some videos of it.ign.com are also supported
"""
_VALID_URL = r'https?://.+?\.ign\.com/(?:[^/]+/)?(?P<type>videos|show_videos|articles|feature|(?:[^/]+/\d+/video))(/.+)?/(?P<name_or_id>.+)'
_VALID_URL = r'https?://(?:.+?\.ign|www\.pcmag)\.com/videos/(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[^/?&#]+)'
IE_NAME = 'ign.com'
_PAGE_TYPE = 'video'
_API_URL_TEMPLATE = 'http://apis.ign.com/video/v3/videos/%s'
_EMBED_RE = r'<iframe[^>]+?["\']((?:https?:)?//.+?\.ign\.com.+?/embed.+?)["\']'
_TESTS = [
{
'url': 'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
'md5': 'febda82c4bafecd2d44b6e1a18a595f8',
'info_dict': {
'id': '8f862beef863986b2785559b9e1aa599',
'ext': 'mp4',
'title': 'The Last of Us Review',
'description': 'md5:c8946d4260a4d43a00d5ae8ed998870c',
'timestamp': 1370440800,
'upload_date': '20130605',
'uploader_id': 'cberidon@ign.com',
}
},
{
'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
'info_dict': {
'id': '100-little-things-in-gta-5-that-will-blow-your-mind',
},
'playlist': [
{
'info_dict': {
'id': '5ebbd138523268b93c9141af17bec937',
'ext': 'mp4',
'title': 'GTA 5 Video Review',
'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
'timestamp': 1379339880,
'upload_date': '20130916',
'uploader_id': 'danieljkrupa@gmail.com',
},
},
{
'info_dict': {
'id': '638672ee848ae4ff108df2a296418ee2',
'ext': 'mp4',
'title': '26 Twisted Moments from GTA 5 in Slow Motion',
'description': 'The twisted beauty of GTA 5 in stunning slow motion.',
'timestamp': 1386878820,
'upload_date': '20131212',
'uploader_id': 'togilvie@ign.com',
},
},
],
'params': {
'skip_download': True,
},
},
{
'url': 'http://www.ign.com/articles/2014/08/15/rewind-theater-wild-trailer-gamescom-2014?watch',
'md5': '618fedb9c901fd086f6f093564ef8558',
'info_dict': {
'id': '078fdd005f6d3c02f63d795faa1b984f',
'ext': 'mp4',
'title': 'Rewind Theater - Wild Trailer Gamescom 2014',
'description': 'Brian and Jared explore Michel Ancel\'s captivating new preview.',
'timestamp': 1408047180,
'upload_date': '20140814',
'uploader_id': 'jamesduggan1990@gmail.com',
},
},
{
'url': 'http://me.ign.com/en/videos/112203/video/how-hitman-aims-to-be-different-than-every-other-s',
'only_matching': True,
},
{
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True,
},
{
# videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
'only_matching': True,
},
]
def _find_video_id(self, webpage):
res_id = [
r'"video_id"\s*:\s*"(.*?)"',
r'class="hero-poster[^"]*?"[^>]*id="(.+?)"',
r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
r'videoId&quot;\s*:\s*&quot;(.+?)&quot;',
r'videoId["\']\s*:\s*["\']([^"\']+?)["\']',
]
return self._search_regex(res_id, webpage, 'video id', default=None)
_TESTS = [{
'url': 'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
'md5': 'd2e1586d9987d40fad7867bf96a018ea',
'info_dict': {
'id': '8f862beef863986b2785559b9e1aa599',
'ext': 'mp4',
'title': 'The Last of Us Review',
'description': 'md5:c8946d4260a4d43a00d5ae8ed998870c',
'timestamp': 1370440800,
'upload_date': '20130605',
'tags': 'count:9',
}
}, {
'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data',
'md5': 'f1581a6fe8c5121be5b807684aeac3f6',
'info_dict': {
'id': 'ee10d774b508c9b8ec07e763b9125b91',
'ext': 'mp4',
'title': 'What\'s New Now: Is GoGo Snooping on Your Data?',
'description': 'md5:817a20299de610bd56f13175386da6fa',
'timestamp': 1420571160,
'upload_date': '20150106',
'tags': 'count:4',
}
}, {
'url': 'https://www.ign.com/videos/is-a-resident-evil-4-remake-on-the-way-ign-daily-fix',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name_or_id = mobj.group('name_or_id')
page_type = mobj.group('type')
webpage = self._download_webpage(url, name_or_id)
if page_type != 'video':
multiple_urls = re.findall(
r'<param name="flashvars"[^>]*value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)
if multiple_urls:
entries = [self.url_result(u, ie='IGN') for u in multiple_urls]
return {
'_type': 'playlist',
'id': name_or_id,
'entries': entries,
}
video_id = self._find_video_id(webpage)
if not video_id:
return self.url_result(self._search_regex(
self._EMBED_RE, webpage, 'embed url'))
return self._get_video_info(video_id)
def _get_video_info(self, video_id):
api_data = self._download_json(
self._API_URL_TEMPLATE % video_id, video_id)
display_id = self._match_id(url)
video = self._call_api(display_id)
video_id = video['videoId']
metadata = video['metadata']
title = metadata.get('longTitle') or metadata.get('title') or metadata['name']
formats = []
m3u8_url = api_data['refs'].get('m3uUrl')
refs = video.get('refs') or {}
m3u8_url = refs.get('m3uUrl')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
f4m_url = api_data['refs'].get('f4mUrl')
f4m_url = refs.get('f4mUrl')
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False))
for asset in api_data['assets']:
for asset in (video.get('assets') or []):
asset_url = asset.get('url')
if not asset_url:
continue
formats.append({
'url': asset['url'],
'tbr': asset.get('actual_bitrate_kbps'),
'fps': asset.get('frame_rate'),
'url': asset_url,
'tbr': int_or_none(asset.get('bitrate'), 1000),
'fps': int_or_none(asset.get('frame_rate')),
'height': int_or_none(asset.get('height')),
'width': int_or_none(asset.get('width')),
})
mezzanine_url = try_get(video, lambda x: x['system']['mezzanineUrl'])
if mezzanine_url:
formats.append({
'ext': determine_ext(mezzanine_url, 'mp4'),
'format_id': 'mezzanine',
'preference': 1,
'url': mezzanine_url,
})
self._sort_formats(formats)
thumbnails = [{
'url': thumbnail['url']
} for thumbnail in api_data.get('thumbnails', [])]
thumbnails = []
for thumbnail in (video.get('thumbnails') or []):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
})
metadata = api_data['metadata']
tags = []
for tag in (video.get('tags') or []):
display_name = tag.get('displayName')
if not display_name:
continue
tags.append(display_name)
return {
'id': api_data.get('videoId') or video_id,
'title': metadata.get('longTitle') or metadata.get('name') or metadata.get['title'],
'description': metadata.get('description'),
'id': video_id,
'title': title,
'description': strip_or_none(metadata.get('description')),
'timestamp': parse_iso8601(metadata.get('publishDate')),
'duration': int_or_none(metadata.get('duration')),
'display_id': metadata.get('slug') or video_id,
'uploader_id': metadata.get('creator'),
'display_id': display_id,
'thumbnails': thumbnails,
'formats': formats,
'tags': tags,
}
class OneUPIE(IGNIE):
_VALID_URL = r'https?://gamevideos\.1up\.com/(?P<type>video)/id/(?P<name_or_id>.+)\.html'
IE_NAME = '1up.com'
class IGNVideoIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.ign\.com/(?:[a-z]{2}/)?[^/]+/(?P<id>\d+)/(?:video|trailer)/'
_TESTS = [{
'url': 'http://gamevideos.1up.com/video/id/34976.html',
'md5': 'c9cc69e07acb675c31a16719f909e347',
'url': 'http://me.ign.com/en/videos/112203/video/how-hitman-aims-to-be-different-than-every-other-s',
'md5': 'dd9aca7ed2657c4e118d8b261e5e9de1',
'info_dict': {
'id': '34976',
'id': 'e9be7ea899a9bbfc0674accc22a36cc8',
'ext': 'mp4',
'title': 'Sniper Elite V2 - Trailer',
'description': 'md5:bf0516c5ee32a3217aa703e9b1bc7826',
'timestamp': 1313099220,
'upload_date': '20110811',
'uploader_id': 'IGN',
'title': 'How Hitman Aims to Be Different Than Every Other Stealth Game - NYCC 2015',
'description': 'Taking out assassination targets in Hitman has never been more stylish.',
'timestamp': 1444665600,
'upload_date': '20151012',
}
}, {
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True,
}, {
# Youtube embed
'url': 'https://me.ign.com/ar/ratchet-clank-rift-apart/144327/trailer/embed',
'only_matching': True,
}, {
# Twitter embed
'url': 'http://adria.ign.com/sherlock-season-4/9687/trailer/embed',
'only_matching': True,
}, {
# Vimeo embed
'url': 'https://kr.ign.com/bic-2018/3307/trailer/embed',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
result = super(OneUPIE, self)._real_extract(url)
result['id'] = mobj.group('name_or_id')
return result
video_id = self._match_id(url)
req = HEADRequest(url.rsplit('/', 1)[0] + '/embed')
url = self._request_webpage(req, video_id).geturl()
ign_url = compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('url', [None])[0]
if ign_url:
return self.url_result(ign_url, IGNIE.ie_key())
return self.url_result(url)
class PCMagIE(IGNIE):
_VALID_URL = r'https?://(?:www\.)?pcmag\.com/(?P<type>videos|article2)(/.+)?/(?P<name_or_id>.+)'
IE_NAME = 'pcmag'
_EMBED_RE = r'iframe\.setAttribute\("src",\s*__util.objToUrlString\("http://widgets\.ign\.com/video/embed/content\.html?[^"]*url=([^"]+)["&]'
class IGNArticleIE(IGNBaseIE):
_VALID_URL = r'https?://.+?\.ign\.com/(?:articles(?:/\d{4}/\d{2}/\d{2})?|(?:[a-z]{2}/)?feature/\d+)/(?P<id>[^/?&#]+)'
_PAGE_TYPE = 'article'
_TESTS = [{
'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data',
'md5': '212d6154fd0361a2781075f1febbe9ad',
'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
'info_dict': {
'id': 'ee10d774b508c9b8ec07e763b9125b91',
'ext': 'mp4',
'title': '010615_What\'s New Now: Is GoGo Snooping on Your Data?',
'description': 'md5:a7071ae64d2f68cc821c729d4ded6bb3',
'timestamp': 1420571160,
'upload_date': '20150106',
'uploader_id': 'cozzipix@gmail.com',
}
'id': '524497489e4e8ff5848ece34',
'title': '100 Little Things in GTA 5 That Will Blow Your Mind',
},
'playlist': [
{
'info_dict': {
'id': '5ebbd138523268b93c9141af17bec937',
'ext': 'mp4',
'title': 'GTA 5 Video Review',
'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
'timestamp': 1379339880,
'upload_date': '20130916',
},
},
{
'info_dict': {
'id': '638672ee848ae4ff108df2a296418ee2',
'ext': 'mp4',
'title': '26 Twisted Moments from GTA 5 in Slow Motion',
'description': 'The twisted beauty of GTA 5 in stunning slow motion.',
'timestamp': 1386878820,
'upload_date': '20131212',
},
},
],
'params': {
'playlist_items': '2-3',
'skip_download': True,
},
}, {
'url': 'http://www.pcmag.com/article2/0,2817,2470156,00.asp',
'md5': '94130c1ca07ba0adb6088350681f16c1',
'url': 'http://www.ign.com/articles/2014/08/15/rewind-theater-wild-trailer-gamescom-2014?watch',
'info_dict': {
'id': '042e560ba94823d43afcb12ddf7142ca',
'ext': 'mp4',
'title': 'HTC\'s Weird New Re Camera - What\'s New Now',
'description': 'md5:53433c45df96d2ea5d0fda18be2ca908',
'timestamp': 1412953920,
'upload_date': '20141010',
'uploader_id': 'chris_snyder@pcmag.com',
}
'id': '53ee806780a81ec46e0790f8',
'title': 'Rewind Theater - Wild Trailer Gamescom 2014',
},
'playlist_count': 2,
}, {
# videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
'only_matching': True,
}, {
# Youtube embed
'url': 'https://www.ign.com/articles/2021-mvp-named-in-puppy-bowl-xvii',
'only_matching': True,
}, {
# IMDB embed
'url': 'https://www.ign.com/articles/2014/08/07/sons-of-anarchy-final-season-trailer',
'only_matching': True,
}, {
# Facebook embed
'url': 'https://www.ign.com/articles/2017/09/20/marvels-the-punisher-watch-the-new-trailer-for-the-netflix-series',
'only_matching': True,
}, {
# Brightcove embed
'url': 'https://www.ign.com/articles/2016/01/16/supergirl-goes-flying-with-martian-manhunter-in-new-clip',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
article = self._call_api(display_id)
def entries():
media_url = try_get(article, lambda x: x['mediaRelations'][0]['media']['metadata']['url'])
if media_url:
yield self.url_result(media_url, IGNIE.ie_key())
for content in (article.get('content') or []):
for video_url in re.findall(r'(?:\[(?:ignvideo\s+url|youtube\s+clip_id)|<iframe[^>]+src)="([^"]+)"', content):
yield self.url_result(video_url)
return self.playlist_result(
entries(), article.get('articleId'),
strip_or_none(try_get(article, lambda x: x['metadata']['headline'])))
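
`IGNVideoIE._real_extract` above follows the `/embed` redirect and then looks for a `url` query parameter in the final address; if present, that value is handed to `IGNIE`. A minimal sketch of the query-parsing step, assuming Python 3's `urllib.parse` in place of the `compat_*` wrappers. The redirect targets are made up for illustration, and `extract_ign_url` is a hypothetical helper name:

```python
from urllib.parse import parse_qs, urlparse


def extract_ign_url(final_url):
    """Return the value of the 'url' query parameter, or None if it is missing."""
    query = parse_qs(urlparse(final_url).query)
    # parse_qs maps each key to a list of values; take the first one if any
    return query.get('url', [None])[0]


# Made-up redirect target pointing at an IGN-hosted video
redirect = ('http://widgets.ign.com/video/embed/content.html'
            '?url=http%3A%2F%2Fwww.ign.com%2Fvideos%2Fexample')
print(extract_ign_url(redirect))

# A third-party embed has no 'url' parameter, so the caller would
# pass the redirect target itself to url_result()
print(extract_ign_url('https://www.youtube.com/embed/abc'))
```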


@@ -12,6 +12,7 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
float_or_none,
get_element_by_attribute,
int_or_none,
lowercase_escape,
@@ -32,6 +33,7 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1371748545,
'upload_date': '20130620',
'uploader_id': 'naomipq',
@@ -48,6 +50,7 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4',
'title': 'Video by britneyspears',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1453760977,
'upload_date': '20160125',
'uploader_id': 'britneyspears',
@@ -86,6 +89,24 @@ class InstagramIE(InfoExtractor):
'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
},
}, {
# IGTV
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
'info_dict': {
'id': 'BkfuX9UB-eK',
'ext': 'mp4',
'title': 'Fingerboarding Tricks with @cass.fb',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 53.83,
'timestamp': 1530032919,
'upload_date': '20180626',
'uploader_id': 'instagram',
'uploader': 'Instagram',
'like_count': int,
'comment_count': int,
'comments': list,
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
}
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
@@ -159,7 +180,9 @@ class InstagramIE(InfoExtractor):
description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
title = media.get('title')
thumbnail = media.get('display_src') or media.get('display_url')
duration = float_or_none(media.get('video_duration'))
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
@@ -200,9 +223,10 @@ class InstagramIE(InfoExtractor):
continue
entries.append({
'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num,
'title': node.get('title') or 'Video %d' % edge_num,
'url': node_video_url,
'thumbnail': node.get('display_url'),
'duration': float_or_none(node.get('video_duration')),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')),
@@ -239,8 +263,9 @@ class InstagramIE(InfoExtractor):
'id': video_id,
'formats': formats,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'title': title or 'Video by %s' % uploader_id,
'description': description,
'duration': duration,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader_id': uploader_id,


@@ -29,34 +29,51 @@ class JamendoIE(InfoExtractor):
'id': '196219',
'display_id': 'stories-from-emona-i',
'ext': 'flac',
'title': 'Maya Filipič - Stories from Emona I',
'artist': 'Maya Filipič',
# 'title': 'Maya Filipič - Stories from Emona I',
'title': 'Stories from Emona I',
# 'artist': 'Maya Filipič',
'track': 'Stories from Emona I',
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1217438117,
'upload_date': '20080730',
'license': 'by-nc-nd',
'view_count': int,
'like_count': int,
'average_rating': int,
'tags': ['piano', 'peaceful', 'newage', 'strings', 'upbeat'],
}
}, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True,
}]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
def _real_extract(self, url):
track_id, display_id = self._VALID_URL_RE.match(url).groups()
webpage = self._download_webpage(
'https://www.jamendo.com/track/' + track_id, track_id)
models = self._parse_json(self._html_search_regex(
r"data-bundled-models='([^']+)",
webpage, 'bundled models'), track_id)
track = models['track']['models'][0]
# webpage = self._download_webpage(
# 'https://www.jamendo.com/track/' + track_id, track_id)
# models = self._parse_json(self._html_search_regex(
# r"data-bundled-models='([^']+)",
# webpage, 'bundled models'), track_id)
# track = models['track']['models'][0]
track = self._call_api('track', track_id)
title = track_name = track['name']
get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
artist = get_model('artist')
artist_name = artist.get('name')
if artist_name:
title = '%s - %s' % (artist_name, title)
album = get_model('album')
# get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
# artist = get_model('artist')
# artist_name = artist.get('name')
# if artist_name:
# title = '%s - %s' % (artist_name, title)
# album = get_model('album')
formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@@ -74,7 +91,7 @@ class JamendoIE(InfoExtractor):
urls = []
thumbnails = []
for _, covers in track.get('cover', {}).items():
for covers in (track.get('cover') or {}).values():
for cover_id, cover_url in covers.items():
if not cover_url or cover_url in urls:
continue
@@ -88,13 +105,14 @@ class JamendoIE(InfoExtractor):
})
tags = []
for tag in track.get('tags', []):
for tag in (track.get('tags') or []):
tag_name = tag.get('name')
if not tag_name:
continue
tags.append(tag_name)
stats = track.get('stats') or {}
license = track.get('licenseCC') or []
return {
'id': track_id,
@@ -103,11 +121,11 @@ class JamendoIE(InfoExtractor):
'title': title,
'description': track.get('description'),
'duration': int_or_none(track.get('duration')),
'artist': artist_name,
# 'artist': artist_name,
'track': track_name,
'album': album.get('name'),
# 'album': album.get('name'),
'formats': formats,
'license': '-'.join(track.get('licenseCC', [])) or None,
'license': '-'.join(license) if license else None,
'timestamp': int_or_none(track.get('dateCreated')),
'view_count': int_or_none(stats.get('listenedAll')),
'like_count': int_or_none(stats.get('favorited')),
@@ -116,9 +134,9 @@ class JamendoIE(InfoExtractor):
}
class JamendoAlbumIE(InfoExtractor):
class JamendoAlbumIE(JamendoIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
_TEST = {
_TESTS = [{
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': {
'id': '121486',
@@ -151,17 +169,7 @@ class JamendoAlbumIE(InfoExtractor):
'params': {
'playlistend': 2
}
}
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
}]
def _real_extract(self, url):
album_id = self._match_id(url)
@@ -169,7 +177,7 @@ class JamendoAlbumIE(InfoExtractor):
album_name = album.get('name')
entries = []
for track in album.get('tracks', []):
for track in (album.get('tracks') or []):
track_id = track.get('id')
if not track_id:
continue
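
Reviewer note: the `_call_api` helper that this change moves into `JamendoIE` signs each request with an `X-Jam-Call` header. Below is a minimal standalone sketch of that computation, assuming plain `str` in place of `compat_str`. The function name `jam_call_header` and its optional `rand` argument are illustrative only and are not part of the extractor.

```python
import hashlib
import random


def jam_call_header(path, rand=None):
    """Build an X-Jam-Call value in the same shape as _call_api in the diff.

    `rand` is exposed only so the result can be reproduced in a test;
    the extractor always draws a fresh random value.
    """
    if rand is None:
        rand = str(random.random())
    # SHA-1 over the API path concatenated with the random string
    digest = hashlib.sha1((path + rand).encode()).hexdigest()
    return '$%s*%s~' % (digest, rand)


if __name__ == '__main__':
    print(jam_call_header('/api/tracks'))
```

Because the path is part of the hashed input, a value built for `/api/tracks` would presumably not be valid for `/api/albums`. That fits with sharing one helper between the track and album extractors rather than caching a header.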

@@ -3,10 +3,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
str_or_none,
strip_or_none,
try_get,
unified_timestamp,
update_url_query,
)
@@ -23,7 +26,7 @@ class KakaoIE(InfoExtractor):
'id': '301965083',
'ext': 'mp4',
'title': '乃木坂46 バナナマン 「3期生紹介コーナーが始動顔高低差GPも」 『乃木坂工事中』',
'uploader_id': 2671005,
'uploader_id': '2671005',
'uploader': '그랑그랑이',
'timestamp': 1488160199,
'upload_date': '20170227',
@@ -36,11 +39,15 @@ class KakaoIE(InfoExtractor):
'ext': 'mp4',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
'uploader_id': 2653210,
'uploader_id': '2653210',
'uploader': '쇼! 음악중심',
'timestamp': 1485684628,
'upload_date': '20170129',
}
}, {
# geo restricted
'url': 'https://tv.kakao.com/channel/3643855/cliplink/412069491',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -68,8 +75,7 @@ class KakaoIE(InfoExtractor):
'fields': ','.join([
'-*', 'tid', 'clipLink', 'displayTitle', 'clip', 'title',
'description', 'channelId', 'createTime', 'duration', 'playCount',
'likeCount', 'commentCount', 'tagList', 'channel', 'name',
'clipChapterThumbnailList', 'thumbnailUrl', 'timeInSec', 'isDefault',
'likeCount', 'commentCount', 'tagList', 'channel', 'name', 'thumbnailUrl',
'videoOutputList', 'width', 'height', 'kbps', 'profile', 'label'])
}
@@ -82,24 +88,28 @@ class KakaoIE(InfoExtractor):
title = clip.get('title') or clip_link.get('displayTitle')
query['tid'] = impress.get('tid', '')
query.update({
'fields': '-*,code,message,url',
'tid': impress.get('tid') or '',
})
formats = []
for fmt in clip.get('videoOutputList', []):
for fmt in (clip.get('videoOutputList') or []):
try:
profile_name = fmt['profile']
if profile_name == 'AUDIO':
continue
query.update({
'profile': profile_name,
'fields': '-*,url',
})
fmt_url_json = self._download_json(
api_base + 'raw/videolocation', display_id,
'Downloading video URL for profile %s' % profile_name,
query=query, headers=player_header, fatal=False)
if fmt_url_json is None:
query['profile'] = profile_name
try:
fmt_url_json = self._download_json(
api_base + 'raw/videolocation', display_id,
'Downloading video URL for profile %s' % profile_name,
query=query, headers=player_header)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
resp = self._parse_json(e.cause.read().decode(), video_id)
if resp.get('code') == 'GeoBlocked':
self.raise_geo_restricted()
continue
fmt_url = fmt_url_json['url']
@@ -116,27 +126,13 @@ class KakaoIE(InfoExtractor):
pass
self._sort_formats(formats)
thumbs = []
for thumb in clip.get('clipChapterThumbnailList', []):
thumbs.append({
'url': thumb.get('thumbnailUrl'),
'id': compat_str(thumb.get('timeInSec')),
'preference': -1 if thumb.get('isDefault') else 0
})
top_thumbnail = clip.get('thumbnailUrl')
if top_thumbnail:
thumbs.append({
'url': top_thumbnail,
'preference': 10,
})
return {
'id': display_id,
'title': title,
'description': strip_or_none(clip.get('description')),
'uploader': clip_link.get('channel', {}).get('name'),
'uploader_id': clip_link.get('channelId'),
'thumbnails': thumbs,
'uploader': try_get(clip_link, lambda x: x['channel']['name']),
'uploader_id': str_or_none(clip_link.get('channelId')),
'thumbnail': clip.get('thumbnailUrl'),
'timestamp': unified_timestamp(clip_link.get('createTime')),
'duration': int_or_none(clip.get('duration')),
'view_count': int_or_none(clip.get('playCount')),
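
Reviewer note: several hunks in this comparison replace `x.get(key, [])` with `(x.get(key) or [])`. The likely motivation is that the default passed to `.get()` applies only when the key is missing, not when the API returns an explicit `null`. The sketch below illustrates the difference. The two function names are hypothetical and the input dicts are made-up stand-ins for an API response.

```python
def formats_with_default(clip):
    # The default is used only when the key is absent; a present key
    # holding None is returned as None, and iterating it raises TypeError.
    return [fmt for fmt in clip.get('videoOutputList', [])]


def formats_with_or(clip):
    # `or []` also covers a present-but-None (or otherwise falsy) value.
    return [fmt for fmt in (clip.get('videoOutputList') or [])]


if __name__ == '__main__':
    print(formats_with_or({'videoOutputList': None}))
    try:
        formats_with_default({'videoOutputList': None})
    except TypeError as e:
        print('TypeError:', e)
```

The same reasoning would apply to the `try_get(clip_link, lambda x: x['channel']['name'])` replacement above, which avoids chaining `.get()` calls on a value that may be `None`.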

@@ -120,7 +120,7 @@ class KalturaIE(InfoExtractor):
def _extract_urls(webpage):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
finditer = (
re.finditer(
list(re.finditer(
r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\(
\{.*?
@@ -128,8 +128,8 @@ class KalturaIE(InfoExtractor):
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage)
or re.finditer(
""", webpage))
or list(re.finditer(
r'''(?xs)
(?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@@ -142,16 +142,16 @@ class KalturaIE(InfoExtractor):
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
)
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage)
or re.finditer(
''', webpage))
or list(re.finditer(
r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])\s*
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)*
(?P=q1)
''', webpage)
''', webpage))
)
urls = []
for mobj in finditer:
@@ -373,5 +373,5 @@ class KalturaIE(InfoExtractor):
'duration': info.get('duration'),
'timestamp': info.get('createdAt'),
'uploader_id': info.get('userId') if info.get('userId') != 'None' else None,
'view_count': info.get('plays'),
'view_count': int_or_none(info.get('plays')),
}
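
Reviewer note: the `list(re.finditer(...))` wrapping in `_extract_urls` matters because `re.finditer` returns an iterator object, which is truthy even when it yields no matches. In an `a or b or c` chain of bare iterators, the first one always wins and the fallbacks are never tried. Here is a small sketch of the effect, using two made-up patterns (`PRIMARY`, `FALLBACK`) rather than the Kaltura regexes:

```python
import re

# Hypothetical patterns, standing in for the embed regexes in the diff
PRIMARY = r'primary-(\d+)'
FALLBACK = r'fallback-(\d+)'


def find_lazy(text):
    # The iterator from the first finditer is always truthy, so the
    # fallback pattern is never evaluated.
    return list(re.finditer(PRIMARY, text) or re.finditer(FALLBACK, text))


def find_eager(text):
    # An empty list is falsy, so `or` moves on to the fallback pattern.
    return list(re.finditer(PRIMARY, text)) or list(re.finditer(FALLBACK, text))


if __name__ == '__main__':
    text = 'see fallback-42'
    print([m.group(1) for m in find_lazy(text)])
    print([m.group(1) for m in find_eager(text)])
```

The trade-off is that each pattern's matches are materialized up front, which seems acceptable for a single web page.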

@@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class KTHIE(InfoExtractor):
_VALID_URL = r'https?://play\.kth\.se/(?:[^/]+/)+(?P<id>[a-z0-9_]+)'
_TEST = {
'url': 'https://play.kth.se/media/Lunch+breakA+De+nya+aff%C3%A4rerna+inom+Fordonsdalen/0_uoop6oz9',
'md5': 'd83ada6d00ca98b73243a88efe19e8a6',
'info_dict': {
'id': '0_uoop6oz9',
'ext': 'mp4',
'title': 'md5:bd1d6931facb6828762a33e6ce865f37',
'thumbnail': 're:https?://.+/thumbnail/.+',
'duration': 3516,
'timestamp': 1647345358,
'upload_date': '20220315',
'uploader_id': 'md5:0ec23e33a89e795a4512930c8102509f',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
result = self.url_result(
smuggle_url('kaltura:308:%s' % video_id, {
'service_url': 'https://api.kaltura.nordu.net'}),
'Kaltura')
return result

@@ -6,8 +6,10 @@ import json
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
from ..utils import (
determine_ext,
@@ -60,6 +62,7 @@ class LBRYBaseIE(InfoExtractor):
'description': stream_value.get('description'),
'license': stream_value.get('license'),
'timestamp': int_or_none(stream.get('timestamp')),
'release_timestamp': int_or_none(stream_value.get('release_time')),
'tags': stream_value.get('tags'),
'duration': int_or_none(media.get('duration')),
'channel': try_get(signing_channel, lambda x: x['value']['title']),
@@ -92,6 +95,8 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
'timestamp': 1595694354,
'upload_date': '20200725',
'release_timestamp': 1595340697,
'release_date': '20200721',
'width': 1280,
'height': 720,
}
@@ -106,6 +111,8 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:661ac4f1db09f31728931d7b88807a61',
'timestamp': 1591312601,
'upload_date': '20200604',
'release_timestamp': 1591312421,
'release_date': '20200604',
'tags': list,
'duration': 2570,
'channel': 'The LBRY Foundation',
@@ -113,6 +120,26 @@ class LBRYIE(LBRYBaseIE):
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
'vcodec': 'none',
}
}, {
# HLS
'url': 'https://odysee.com/@gardeningincanada:b/plants-i-will-never-grow-again.-the:e',
'md5': 'fc82f45ea54915b1495dd7cb5cc1289f',
'info_dict': {
'id': 'e51671357333fe22ae88aad320bde2f6f96b1410',
'ext': 'mp4',
'title': 'PLANTS I WILL NEVER GROW AGAIN. THE BLACK LIST PLANTS FOR A CANADIAN GARDEN | Gardening in Canada 🍁',
'description': 'md5:9c539c6a03fb843956de61a4d5288d5e',
'timestamp': 1618254123,
'upload_date': '20210412',
'release_timestamp': 1618254002,
'release_date': '20210412',
'tags': list,
'duration': 554,
'channel': 'Gardening In Canada',
'channel_id': 'b8be0e93b423dad221abe29545fbe8ec36e806bc',
'channel_url': 'https://odysee.com/@gardeningincanada:b8be0e93b423dad221abe29545fbe8ec36e806bc',
'formats': 'mincount:3',
}
}, {
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
'only_matching': True,
@@ -156,10 +183,18 @@ class LBRYIE(LBRYBaseIE):
streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
info = self._parse_stream(result, url)
urlh = self._request_webpage(
streaming_url, display_id, note='Downloading streaming redirect url info')
if determine_ext(urlh.geturl()) == 'm3u8':
info['formats'] = self._extract_m3u8_formats(
urlh.geturl(), display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(info['formats'])
else:
info['url'] = streaming_url
info.update({
'id': claim_id,
'title': title,
'url': streaming_url,
})
return info
@@ -181,17 +216,18 @@ class LBRYChannelIE(LBRYBaseIE):
}]
_PAGE_SIZE = 50
def _fetch_page(self, claim_id, url, page):
def _fetch_page(self, claim_id, url, params, page):
page += 1
page_params = {
'channel_ids': [claim_id],
'claim_type': 'stream',
'no_totals': True,
'page': page,
'page_size': self._PAGE_SIZE,
}
page_params.update(params)
result = self._call_api_proxy(
'claim_search', claim_id, {
'channel_ids': [claim_id],
'claim_type': 'stream',
'no_totals': True,
'page': page,
'page_size': self._PAGE_SIZE,
'stream_types': self._SUPPORTED_STREAM_TYPES,
}, 'page %d' % page)
'claim_search', claim_id, page_params, 'page %d' % page)
for item in (result.get('items') or []):
stream_claim_name = item.get('name')
stream_claim_id = item.get('claim_id')
@@ -212,8 +248,31 @@ class LBRYChannelIE(LBRYBaseIE):
result = self._resolve_url(
'lbry://' + display_id, display_id, 'channel')
claim_id = result['claim_id']
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
content = qs.get('content', [None])[0]
params = {
'fee_amount': qs.get('fee_amount', ['>=0'])[0],
'order_by': {
'new': ['release_time'],
'top': ['effective_amount'],
'trending': ['trending_group', 'trending_mixed'],
}[qs.get('order', ['new'])[0]],
'stream_types': [content] if content in ['audio', 'video'] else self._SUPPORTED_STREAM_TYPES,
}
duration = qs.get('duration', [None])[0]
if duration:
params['duration'] = {
'long': '>=1200',
'short': '<=240',
}[duration]
language = qs.get('language', ['all'])[0]
if language != 'all':
languages = [language]
if language == 'en':
languages.append('none')
params['any_languages'] = languages
entries = OnDemandPagedList(
functools.partial(self._fetch_page, claim_id, url),
functools.partial(self._fetch_page, claim_id, url, params),
self._PAGE_SIZE)
result_value = result.get('value') or {}
return self.playlist_result(
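
Reviewer note: the new block in `LBRYChannelIE._real_extract` maps the channel URL's query string onto `claim_search` parameters. Below is a standalone sketch of that mapping, assuming Python 3's `urllib.parse` in place of the `compat_*` wrappers. The function name `channel_search_params` is illustrative, and `DEFAULT_STREAM_TYPES` is a placeholder for `_SUPPORTED_STREAM_TYPES`, whose value is not shown in this diff.

```python
from urllib.parse import parse_qs, urlparse

ORDER_BY = {
    'new': ['release_time'],
    'top': ['effective_amount'],
    'trending': ['trending_group', 'trending_mixed'],
}
DURATION = {'long': '>=1200', 'short': '<=240'}
# Assumption: placeholder for _SUPPORTED_STREAM_TYPES (not shown in the diff)
DEFAULT_STREAM_TYPES = ['video', 'audio']


def channel_search_params(url):
    qs = parse_qs(urlparse(url).query)
    content = qs.get('content', [None])[0]
    params = {
        'fee_amount': qs.get('fee_amount', ['>=0'])[0],
        'order_by': ORDER_BY[qs.get('order', ['new'])[0]],
        'stream_types': [content] if content in ('audio', 'video') else DEFAULT_STREAM_TYPES,
    }
    duration = qs.get('duration', [None])[0]
    if duration:
        params['duration'] = DURATION[duration]
    language = qs.get('language', ['all'])[0]
    if language != 'all':
        languages = [language]
        if language == 'en':
            # As in the diff, English also matches claims with no language set
            languages.append('none')
        params['any_languages'] = languages
    return params


if __name__ == '__main__':
    print(channel_search_params(
        'https://odysee.com/@example:1?order=top&duration=long&language=en'))
```

As in the diff, an `order` or `duration` value outside the known keys raises `KeyError` in this sketch. Whether the extractor should tolerate unknown values instead is a judgment call left to the author.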

@@ -4,7 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
str_or_none,
)
class LineTVIE(InfoExtractor):
@@ -88,3 +94,137 @@ class LineTVIE(InfoExtractor):
for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
'view_count': video_info.get('meta', {}).get('count'),
}
class LineLiveBaseIE(InfoExtractor):
_API_BASE_URL = 'https://live-api.line-apps.com/web/v4.0/channel/'
def _parse_broadcast_item(self, item):
broadcast_id = compat_str(item['id'])
title = item['title']
is_live = item.get('isBroadcastingNow')
thumbnails = []
for thumbnail_id, thumbnail_url in (item.get('thumbnailURLs') or {}).items():
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
})
channel = item.get('channel') or {}
channel_id = str_or_none(channel.get('id'))
return {
'id': broadcast_id,
'title': self._live_title(title) if is_live else title,
'thumbnails': thumbnails,
'timestamp': int_or_none(item.get('createdAt')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': 'https://live.line.me/channels/' + channel_id if channel_id else None,
'duration': int_or_none(item.get('archiveDuration')),
'view_count': int_or_none(item.get('viewerCount')),
'comment_count': int_or_none(item.get('chatCount')),
'is_live': is_live,
}
class LineLiveIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<channel_id>\d+)/broadcast/(?P<id>\d+)'
_TESTS = [{
'url': 'https://live.line.me/channels/4867368/broadcast/16331360',
'md5': 'bc931f26bf1d4f971e3b0982b3fab4a3',
'info_dict': {
'id': '16331360',
'title': '振りコピ講座😙😙😙',
'ext': 'mp4',
'timestamp': 1617095132,
'upload_date': '20210330',
'channel': '白川ゆめか',
'channel_id': '4867368',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}, {
# archiveStatus == 'DELETED'
'url': 'https://live.line.me/channels/4778159/broadcast/16378488',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id, broadcast_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
self._API_BASE_URL + '%s/broadcast/%s' % (channel_id, broadcast_id),
broadcast_id)
item = broadcast['item']
info = self._parse_broadcast_item(item)
protocol = 'm3u8' if info['is_live'] else 'm3u8_native'
formats = []
for k, v in (broadcast.get(('live' if info['is_live'] else 'archived') + 'HLSURLs') or {}).items():
if not v:
continue
if k == 'abr':
formats.extend(self._extract_m3u8_formats(
v, broadcast_id, 'mp4', protocol,
m3u8_id='hls', fatal=False))
continue
f = {
'ext': 'mp4',
'format_id': 'hls-' + k,
'protocol': protocol,
'url': v,
}
if not k.isdigit():
f['vcodec'] = 'none'
formats.append(f)
if not formats:
archive_status = item.get('archiveStatus')
if archive_status != 'ARCHIVED':
raise ExtractorError('this video has been ' + archive_status.lower(), expected=True)
self._sort_formats(formats)
info['formats'] = formats
return info
class LineLiveChannelIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<id>\d+)(?!/broadcast/\d+)(?:[/?&#]|$)'
_TEST = {
'url': 'https://live.line.me/channels/5893542',
'info_dict': {
'id': '5893542',
'title': 'いくらちゃん',
'description': 'md5:c3a4af801f43b2fac0b02294976580be',
},
'playlist_mincount': 29
}
def _archived_broadcasts_entries(self, archived_broadcasts, channel_id):
while True:
for row in (archived_broadcasts.get('rows') or []):
share_url = str_or_none(row.get('shareURL'))
if not share_url:
continue
info = self._parse_broadcast_item(row)
info.update({
'_type': 'url',
'url': share_url,
'ie_key': LineLiveIE.ie_key(),
})
yield info
if not archived_broadcasts.get('hasNextPage'):
return
archived_broadcasts = self._download_json(
self._API_BASE_URL + channel_id + '/archived_broadcasts',
channel_id, query={
'lastId': info['id'],
})
def _real_extract(self, url):
channel_id = self._match_id(url)
channel = self._download_json(self._API_BASE_URL + channel_id, channel_id)
return self.playlist_result(
self._archived_broadcasts_entries(channel.get('archivedBroadcasts') or {}, channel_id),
channel_id, channel.get('title'), channel.get('information'))
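
Reviewer note: `_archived_broadcasts_entries` is a cursor-style paginator. It yields the rows of the current page, then requests the next page using the ID of the last entry, and repeats until `hasNextPage` is false. The sketch below shows the same loop against an in-memory stand-in for the API. `iter_rows` and `fetch_next` are hypothetical names. Unlike the diff, the sketch tracks the last row seen, not the last row yielded, and stops if a page has no rows. That is a variation, not the author's behaviour.

```python
def iter_rows(first_page, fetch_next):
    """Yield rows with a share URL from cursor-paginated pages.

    fetch_next(last_id) is a caller-supplied callable (hypothetical)
    returning the page that follows the row with the given id.
    """
    page = first_page
    while True:
        last_id = None
        for row in (page.get('rows') or []):
            last_id = row.get('id')
            if not row.get('shareURL'):
                continue
            yield row
        if not page.get('hasNextPage') or last_id is None:
            return
        page = fetch_next(last_id)


if __name__ == '__main__':
    # Made-up pages keyed by the cursor that requests them
    pages = {2: {'rows': [{'id': 3, 'shareURL': 'c'}], 'hasNextPage': False}}
    first = {
        'rows': [{'id': 1, 'shareURL': 'a'}, {'id': 2, 'shareURL': None}],
        'hasNextPage': True,
    }
    print([row['id'] for row in iter_rows(first, pages.__getitem__)])
```

Writing the paginator as a generator means later pages are requested only if the consumer keeps iterating, which fits options such as `playlistend`.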

@@ -1,191 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?liveleak\.com/view\?.*?\b[it]=(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': {
'id': '757_1364311680',
'ext': 'mp4',
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
'url': 'http://www.liveleak.com/view?i=f93_1390833151',
'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf',
'info_dict': {
'id': 'f93_1390833151',
'ext': 'mp4',
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Prochan embed
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
'md5': '42c6d97d54f1db107958760788c5f48f',
'info_dict': {
'id': '4f7_1392687779',
'ext': 'mp4',
'description': "The guy with the cigarette seems amazingly nonchalant about the whole thing... I really hope my friends' reactions would be a bit stronger.\r\n\r\nAction-go to 0:55.",
'uploader': 'CapObveus',
'title': 'Man is Fatally Struck by Reckless Car While Packing up a Moving Truck',
'age_limit': 18,
},
'skip': 'Video is dead',
}, {
# Covers https://github.com/ytdl-org/youtube-dl/pull/5983
# Multiple resolutions
'url': 'http://www.liveleak.com/view?i=801_1409392012',
'md5': 'c3a449dbaca5c0d1825caecd52a57d7b',
'info_dict': {
'id': '801_1409392012',
'ext': 'mp4',
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Covers https://github.com/ytdl-org/youtube-dl/pull/10664#issuecomment-247439521
'url': 'http://m.liveleak.com/view?i=763_1473349649',
'add_ie': ['Youtube'],
'info_dict': {
'id': '763_1473349649',
'ext': 'mp4',
'title': 'Reporters and public officials ignore epidemic of black on asian violence in Sacramento | Colin Flaherty',
'description': 'Colin being the warrior he is and showing the injustice Asians in Sacramento are being subjected to.',
'uploader': 'Ziz',
'upload_date': '20160908',
'uploader_id': 'UCEbta5E_jqlZmEJsriTEtnw'
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.liveleak.com/view?i=677_1439397581',
'info_dict': {
'id': '677_1439397581',
'title': 'Fuel Depot in China Explosion caught on video',
},
'playlist_count': 3,
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}, {
# No original video
'url': 'https://www.liveleak.com/view?t=C26ZZ_1558612804',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[ift]=[\w_]+[^"]+)"',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
video_thumbnail = self._og_search_thumbnail(webpage)
entries = self._parse_html5_media_entries(url, webpage, video_id)
if not entries:
# Maybe an embed?
embed_url = self._search_regex(
r'<iframe[^>]+src="((?:https?:)?//(?:www\.)?(?:prochan|youtube)\.com/embed[^"]+)"',
webpage, 'embed URL')
return {
'_type': 'url_transparent',
'url': embed_url,
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
}
for idx, info_dict in enumerate(entries):
formats = []
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
formats.append(a_format)
# Removing '.*.mp4' gives the raw video, which is essentially
# the same video without the LiveLeak logo at the top (see
# https://github.com/ytdl-org/youtube-dl/pull/4768)
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
format_id = 'original' + ('-' + format_id if format_id else '')
if self._is_valid_url(orig_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats
# Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1:
info_dict['id'] = '%s_%s' % (video_id, idx + 1)
else:
info_dict['id'] = video_id
info_dict.update({
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
})
return self.playlist_result(entries, video_id, video_title)
class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[ift])=(?P<id>[\w_]+)'
# See generic.py for actual test cases
_TESTS = [{
'url': 'https://www.liveleak.com/ll_embed?i=874_1459135191',
'only_matching': True,
}, {
'url': 'https://www.liveleak.com/ll_embed?f=ab065df993c1',
'only_matching': True,
}]
def _real_extract(self, url):
kind, video_id = re.match(self._VALID_URL, url).groups()
if kind == 'f':
webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex(
r'(?:logourl\s*:\s*|window\.open\()(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url')
else:
liveleak_url = 'http://www.liveleak.com/view?%s=%s' % (kind, video_id)
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())

@@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MaoriTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.maoritelevision.com/shows/korero-mai/S01E054/korero-mai-series-1-episode-54',
'md5': '5ade8ef53851b6a132c051b1cd858899',
'info_dict': {
'id': '4774724855001',
'ext': 'mp4',
'title': 'Kōrero Mai, Series 1 Episode 54',
'upload_date': '20160226',
'timestamp': 1456455018,
'description': 'md5:59bde32fd066d637a1a55794c56d8dcb',
'uploader_id': '1614493167001',
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1614493167001/HJlhIQhQf_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(
r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'BrightcoveNew', brightcove_id)
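
Reviewer note: the new `MaoriTVIE` does no media extraction itself. It pulls a Brightcove video ID out of the page and hands a player URL to the Brightcove extractor. Here is a standalone sketch of the ID lookup and URL construction, using `re.search` in place of `_search_regex`. The function name `brightcove_url` and the sample HTML are illustrative only.

```python
import re

# Template copied from the diff above
BRIGHTCOVE_URL_TEMPLATE = (
    'http://players.brightcove.net/1614493167001/'
    'HJlhIQhQf_default/index.html?videoId=%s')


def brightcove_url(webpage):
    # Same pattern as in the diff; returns None where the extractor
    # would raise because the ID is required there.
    mobj = re.search(r'data-main-video-id=["\'](\d+)', webpage)
    if mobj is None:
        return None
    return BRIGHTCOVE_URL_TEMPLATE % mobj.group(1)


if __name__ == '__main__':
    print(brightcove_url('<div data-main-video-id="4774724855001"></div>'))
```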

@@ -15,33 +15,39 @@ from ..utils import (
class MedalTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://medal.tv/clips/34934644/3Is9zyGMoBMr',
'url': 'https://medal.tv/clips/2mA60jWAGQCBH',
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
'info_dict': {
'id': '34934644',
'id': '2mA60jWAGQCBH',
'ext': 'mp4',
'title': 'Quad Cold',
'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'MowgliSB',
'timestamp': 1603165266,
'upload_date': '20201020',
'uploader_id': 10619174,
'uploader_id': '10619174',
}
}, {
'url': 'https://medal.tv/clips/36787208',
'url': 'https://medal.tv/clips/2um24TWdty0NA',
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
'info_dict': {
'id': '36787208',
'id': '2um24TWdty0NA',
'ext': 'mp4',
'title': 'u tk me i tk u bigger',
'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'Mimicc',
'timestamp': 1605580939,
'upload_date': '20201117',
'uploader_id': 5156321,
'uploader_id': '5156321',
}
}, {
'url': 'https://medal.tv/clips/37rMeFpryCC-9',
'only_matching': True,
}, {
'url': 'https://medal.tv/clips/2WRj40tpY_EU9',
'only_matching': True,
}]
def _real_extract(self, url):

@@ -2,268 +2,113 @@ from __future__ import unicode_literals
import re
from .gigya import GigyaBaseIE
from ..compat import compat_str
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_duration,
try_get,
unified_timestamp,
mimetype2ext,
parse_iso8601,
)
class MedialaanIE(GigyaBaseIE):
class MedialaanIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.|nieuws\.)?
(?:
(?P<site_id>vtm|q2|vtmkzoom)\.be/
(?:
video(?:/[^/]+/id/|/?\?.*?\baid=)|
(?:[^/]+/)*
)
(?:embed\.)?mychannels.video/embed/|
embed\.mychannels\.video/(?:s(?:dk|cript)/)?production/|
(?:www\.)?(?:
(?:
7sur7|
demorgen|
hln|
joe|
qmusic
)\.be|
(?:
[abe]d|
bndestem|
destentor|
gelderlander|
pzc|
tubantia|
volkskrant
)\.nl
)/video/(?:[^/]+/)*[^/?&#]+~p
)
(?P<id>[^/?#&]+)
(?P<id>\d+)
'''
_NETRC_MACHINE = 'medialaan'
_APIKEY = '3_HZ0FtkMW_gOyKlqQzW5_0FHRC7Nd5XpXJZcDdXY4pk5eES2ZWmejRW5egwVm4ug-'
_SITE_TO_APP_ID = {
'vtm': 'vtm_watch',
'q2': 'q2',
'vtmkzoom': 'vtmkzoom',
}
_TESTS = [{
# vod
'url': 'http://vtm.be/video/volledige-afleveringen/id/vtm_20170219_VM0678361_vtmwatch',
'url': 'https://www.bndestem.nl/video/de-terugkeer-van-ally-de-aap-en-wie-vertrekt-er-nog-bij-nac~p193993',
'info_dict': {
'id': 'vtm_20170219_VM0678361_vtmwatch',
'id': '193993',
'ext': 'mp4',
'title': 'Allemaal Chris afl. 6',
'description': 'md5:4be86427521e7b07e0adb0c9c554ddb2',
'timestamp': 1487533280,
'upload_date': '20170219',
'duration': 2562,
'series': 'Allemaal Chris',
'season': 'Allemaal Chris',
'season_number': 1,
'season_id': '256936078124527',
'episode': 'Allemaal Chris afl. 6',
'episode_number': 6,
'episode_id': '256936078591527',
'title': 'De terugkeer van Ally de Aap en wie vertrekt er nog bij NAC?',
'timestamp': 1611663540,
'upload_date': '20210126',
'duration': 238,
},
'params': {
'skip_download': True,
},
'skip': 'Requires account credentials',
}, {
# clip
'url': 'http://vtm.be/video?aid=168332',
'info_dict': {
'id': '168332',
'ext': 'mp4',
'title': '"Veronique liegt!"',
'description': 'md5:1385e2b743923afe54ba4adc38476155',
'timestamp': 1489002029,
'upload_date': '20170308',
'duration': 96,
},
}, {
# vod
'url': 'http://vtm.be/video/volledige-afleveringen/id/257107153551000',
'url': 'https://www.gelderlander.nl/video/kanalen/degelderlander~c320/series/snel-nieuws~s984/noodbevel-in-doetinchem-politie-stuurt-mensen-centrum-uit~p194093',
'only_matching': True,
}, {
# vod
'url': 'http://vtm.be/video?aid=163157',
'url': 'https://embed.mychannels.video/sdk/production/193993?options=TFTFF_default',
'only_matching': True,
}, {
# vod
'url': 'http://www.q2.be/video/volledige-afleveringen/id/2be_20170301_VM0684442_q2',
'url': 'https://embed.mychannels.video/script/production/193993',
'only_matching': True,
}, {
# clip
'url': 'http://vtmkzoom.be/k3-dansstudio/een-nieuw-seizoen-van-k3-dansstudio',
'url': 'https://embed.mychannels.video/production/193993',
'only_matching': True,
}, {
# http/s redirect
'url': 'https://vtmkzoom.be/video?aid=45724',
'info_dict': {
'id': '257136373657000',
'ext': 'mp4',
'title': 'K3 Dansstudio Ushuaia afl.6',
},
'params': {
'skip_download': True,
},
'skip': 'Requires account credentials',
'url': 'https://mychannels.video/embed/193993',
'only_matching': True,
}, {
# nieuws.vtm.be
'url': 'https://nieuws.vtm.be/stadion/stadion/genk-nog-moeilijk-programma',
'url': 'https://embed.mychannels.video/embed/193993',
'only_matching': True,
}]
def _real_initialize(self):
self._logged_in = False
def _login(self):
username, password = self._get_login_info()
if username is None:
self.raise_login_required()
auth_data = {
'APIKey': self._APIKEY,
'sdk': 'js_6.1',
'format': 'json',
'loginID': username,
'password': password,
}
auth_info = self._gigya_login(auth_data)
self._uid = auth_info['UID']
self._uid_signature = auth_info['UIDSignature']
self._signature_timestamp = auth_info['signatureTimestamp']
self._logged_in = True
@staticmethod
def _extract_urls(webpage):
entries = []
for element in re.findall(r'(<div[^>]+data-mychannels-type="video"[^>]*>)', webpage):
mychannels_id = extract_attributes(element).get('data-mychannels-id')
if mychannels_id:
entries.append('https://mychannels.video/embed/' + mychannels_id)
return entries
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, site_id = mobj.group('id', 'site_id')
production_id = self._match_id(url)
production = self._download_json(
'https://embed.mychannels.video/sdk/production/' + production_id,
production_id, query={'options': 'UUUU_default'})['productions'][0]
title = production['title']
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._search_regex(
r'videoJSConfig\s*=\s*JSON\.parse\(\'({.+?})\'\);',
webpage, 'config', default='{}'), video_id,
transform_source=lambda s: s.replace(
'\\\\', '\\').replace(r'\"', '"').replace(r"\'", "'"))
vod_id = config.get('vodId') or self._search_regex(
(r'\\"vodId\\"\s*:\s*\\"(.+?)\\"',
r'"vodId"\s*:\s*"(.+?)"',
r'<[^>]+id=["\']vod-(\d+)'),
webpage, 'video_id', default=None)
# clip, no authentication required
if not vod_id:
player = self._parse_json(
self._search_regex(
r'vmmaplayer\(({.+?})\);', webpage, 'vmma player',
default=''),
video_id, transform_source=lambda s: '[%s]' % s, fatal=False)
if player:
video = player[-1]
if video['videoUrl'] in ('http', 'https'):
return self.url_result(video['url'], MedialaanIE.ie_key())
info = {
'id': video_id,
'url': video['videoUrl'],
'title': video['title'],
'thumbnail': video.get('imageUrl'),
'timestamp': int_or_none(video.get('createdDate')),
'duration': int_or_none(video.get('duration')),
}
formats = []
for source in (production.get('sources') or []):
src = source.get('src')
if not src:
continue
ext = mimetype2ext(source.get('type'))
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, production_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
info = self._parse_html5_media_entries(
url, webpage, video_id, m3u8_id='hls')[0]
info.update({
'id': video_id,
'title': self._html_search_meta('description', webpage),
'duration': parse_duration(self._html_search_meta('duration', webpage)),
formats.append({
'ext': ext,
'url': src,
})
# vod, authentication required
else:
if not self._logged_in:
self._login()
self._sort_formats(formats)
settings = self._parse_json(
self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings', default='{}'),
video_id)
def get(container, item):
return try_get(
settings, lambda x: x[container][item],
compat_str) or self._search_regex(
r'"%s"\s*:\s*"([^"]+)' % item, webpage, item,
default=None)
app_id = get('vod', 'app_id') or self._SITE_TO_APP_ID.get(site_id, 'vtm_watch')
sso = get('vod', 'gigyaDatabase') or 'vtm-sso'
data = self._download_json(
'http://vod.medialaan.io/api/1.0/item/%s/video' % vod_id,
video_id, query={
'app_id': app_id,
'user_network': sso,
'UID': self._uid,
'UIDSignature': self._uid_signature,
'signatureTimestamp': self._signature_timestamp,
})
formats = self._extract_m3u8_formats(
data['response']['uri'], video_id, entry_protocol='m3u8_native',
ext='mp4', m3u8_id='hls')
self._sort_formats(formats)
info = {
'id': vod_id,
'formats': formats,
}
api_key = get('vod', 'apiKey')
channel = get('medialaanGigya', 'channel')
if api_key:
videos = self._download_json(
'http://vod.medialaan.io/vod/v2/videos', video_id, fatal=False,
query={
'channels': channel,
'ids': vod_id,
'limit': 1,
'apikey': api_key,
})
if videos:
video = try_get(
videos, lambda x: x['response']['videos'][0], dict)
if video:
def get(container, item, expected_type=None):
return try_get(
video, lambda x: x[container][item], expected_type)
def get_string(container, item):
return get(container, item, compat_str)
info.update({
'series': get_string('program', 'title'),
'season': get_string('season', 'title'),
'season_number': int_or_none(get('season', 'number')),
'season_id': get_string('season', 'id'),
'episode': get_string('episode', 'title'),
'episode_number': int_or_none(get('episode', 'number')),
'episode_id': get_string('episode', 'id'),
'duration': int_or_none(
video.get('duration')) or int_or_none(
video.get('durationMillis'), scale=1000),
'title': get_string('episode', 'title'),
'description': get_string('episode', 'text'),
'timestamp': unified_timestamp(get_string(
'publication', 'begin')),
})
if not info.get('title'):
info['title'] = try_get(
config, lambda x: x['videoConfig']['title'],
compat_str) or self._html_search_regex(
r'\\"title\\"\s*:\s*\\"(.+?)\\"', webpage, 'title',
default=None) or self._og_search_title(webpage)
if not info.get('description'):
info['description'] = self._html_search_regex(
r'<div[^>]+class="field-item\s+even">\s*<p>(.+?)</p>',
webpage, 'description', default=None)
return info
return {
'id': production_id,
'title': title,
'formats': formats,
'thumbnail': production.get('posterUrl'),
'timestamp': parse_iso8601(production.get('publicationDate'), ' '),
'duration': int_or_none(production.get('duration')) or None,
}

View File

@@ -24,7 +24,7 @@ class MediasetIE(ThePlatformBaseIE):
(?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
(?:
(?:video|on-demand|movie)/(?:[^/]+/)+[^/]+_|
player/index\.html\?.*?\bprogramGuid=
player(?:/v\d+)?/index\.html\?.*?\bprogramGuid=
)
)(?P<id>[0-9A-Z]{16,})
'''
@@ -73,6 +73,10 @@ class MediasetIE(ThePlatformBaseIE):
# iframe twitter (from http://www.wittytv.it/se-prima-mi-fidavo-zero/)
'url': 'https://static3.mediasetplay.mediaset.it/player/index.html?appKey=5ad3966b1de1c4000d5cec48&programGuid=FAFU000000665104&id=665104',
'only_matching': True,
}, {
# embedUrl (from https://www.wittytv.it/amici/est-ce-que-tu-maimes-gabriele-5-dicembre-copia/)
'url': 'https://static3.mediasetplay.mediaset.it/player/v2/index.html?partnerId=wittytv&configId=&programGuid=FD00000000153323&autoplay=true&purl=http://www.wittytv.it/amici/est-ce-que-tu-maimes-gabriele-5-dicembre-copia/',
'only_matching': True,
}, {
'url': 'mediaset:FAFU000000665924',
'only_matching': True,

View File

@@ -78,7 +78,7 @@ class MindsIE(MindsBaseIE):
else:
return self.url_result(entity['perma_url'])
else:
assert(entity['subtype'] == 'video')
assert (entity['subtype'] == 'video')
video_id = entity_id
# 1080p and webm formats available only on the sources array
video = self._call_api(

View File

@@ -1,15 +1,91 @@
from __future__ import unicode_literals
from .nhl import NHLBaseIE
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
)
class MLBIE(NHLBaseIE):
class MLBBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_video_data(display_id)
video_id = video['id']
title = video['title']
feed = self._get_feed(video)
formats = []
for playback in (feed.get('playbacks') or []):
playback_url = playback.get('url')
if not playback_url:
continue
name = playback.get('name')
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4',
'm3u8_native', m3u8_id=name, fatal=False))
else:
f = {
'format_id': name,
'url': playback_url,
}
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
if mobj:
f.update({
'height': int(mobj.group(3)),
'tbr': int(mobj.group(1)),
'width': int(mobj.group(2)),
})
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', playback_url)
if mobj:
f.update({
'fps': int(mobj.group(3)),
'height': int(mobj.group(2)),
'tbr': int(mobj.group(4)),
'width': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
thumbnails = []
for cut in (try_get(feed, lambda x: x['image']['cuts'], list) or []):
src = cut.get('src')
if not src:
continue
thumbnails.append({
'height': int_or_none(cut.get('height')),
'url': src,
'width': int_or_none(cut.get('width')),
})
language = (video.get('language') or 'EN').lower()
return {
'id': video_id,
'title': title,
'formats': formats,
'description': video.get('description'),
'duration': parse_duration(feed.get('duration')),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(video.get(self._TIMESTAMP_KEY)),
'subtitles': self._extract_mlb_subtitles(feed, language),
}
class MLBIE(MLBBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/
(?:[\da-z_-]+\.)*mlb\.com/
(?:
(?:
(?:[^/]+/)*c-|
(?:[^/]+/)*video/[^/]+/c-|
(?:
shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp|
@@ -18,7 +94,6 @@ class MLBIE(NHLBaseIE):
(?P<id>\d+)
)
'''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [
{
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
@@ -76,18 +151,6 @@ class MLBIE(NHLBaseIE):
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429139220,
'upload_date': '20150415',
}
},
{
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
'only_matching': True,
@@ -113,8 +176,92 @@ class MLBIE(NHLBaseIE):
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
'only_matching': True,
},
{
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True,
}
]
_TIMESTAMP_KEY = 'date'
@staticmethod
def _get_feed(video):
return video
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for keyword in (feed.get('keywordsAll') or []):
keyword_type = keyword.get('type')
if keyword_type and keyword_type.startswith('closed_captions_location_'):
cc_location = keyword.get('value')
if cc_location:
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
return self._download_json(
'http://content.mlb.com/mlb/item/id/v1/%s/details/web-v1.json' % display_id,
display_id)
class MLBVideoIE(MLBBaseIE):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/(?:[^/]+/)*video/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.mlb.com/mariners/video/ackley-s-spectacular-catch-c34698933',
'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': {
'id': 'c04a8863-f569-42e6-9f87-992393657614',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405995000,
'upload_date': '20140722',
'thumbnail': r're:^https?://.+',
},
}
_TIMESTAMP_KEY = 'timestamp'
@classmethod
def suitable(cls, url):
return False if MLBIE.suitable(url) else super(MLBVideoIE, cls).suitable(url)
@staticmethod
def _get_feed(video):
return video['feeds'][0]
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for cc_location in (feed.get('closedCaptions') or []):
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
# https://www.mlb.com/data-service/en/videos/[SLUG]
return self._download_json(
'https://fastball-gateway.mlb.com/graphql',
display_id, query={
'query': '''{
mediaPlayback(ids: "%s") {
description
feeds(types: CMS) {
closedCaptions
duration
image {
cuts {
width
height
src
}
}
playbacks {
name
url
}
}
id
timestamp
title
}
}''' % display_id,
})['data']['mediaPlayback'][0]
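Review note: the non-HLS branch of `MLBBaseIE._real_extract` reads bitrate and dimensions out of the playback name with `_(\d+)K_(\d+)X(\d+)`. The sketch below isolates that step. The helper name `parse_playback_name` and the sample name are illustrative assumptions, not taken from the MLB API. It also guards against a missing name, since `re.search` raises `TypeError` when passed `None`.

```python
import re


def parse_playback_name(name):
    """Return tbr/width/height parsed from a playback name, or {} if absent."""
    # `name or ''` avoids a TypeError when the playback has no name at all.
    mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name or '')
    if not mobj:
        return {}
    return {
        'tbr': int(mobj.group(1)),     # bitrate in kbit/s
        'width': int(mobj.group(2)),
        'height': int(mobj.group(3)),
    }


# Hypothetical playback name, used only to exercise the pattern.
print(parse_playback_name('FLASH_1200K_640X360'))
print(parse_playback_name(None))
```

Whether the upstream extractor needs the `None` guard depends on whether the feed can omit `name`, which the diff does not establish.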

View File

@@ -255,7 +255,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
@staticmethod
def _extract_child_with_type(parent, t):
return next(c for c in parent['children'] if c.get('type') == t)
for c in parent['children']:
if c.get('type') == t:
return c
def _extract_mgid(self, webpage):
try:
@@ -286,7 +288,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
data = self._parse_json(self._search_regex(
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
main_container = self._extract_child_with_type(data, 'MainContainer')
video_player = self._extract_child_with_type(main_container, 'VideoPlayer')
ab_testing = self._extract_child_with_type(main_container, 'ABTesting')
video_player = self._extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
mgid = video_player['props']['media']['video']['config']['uri']
return mgid
@@ -320,7 +323,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
if mobj:
return mobj.group('url')
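Review note: the change to `_extract_child_with_type` swaps `next(...)` on a generator for an explicit loop. The practical difference is that the old form raises `StopIteration` when no child matches, while the new form returns `None`. That is what allows the `ab_testing or main_container` fallback. A minimal sketch of both behaviours follows. The function names and the sample container are made up for illustration.

```python
def find_child_strict(parent, t):
    # Old form: next() without a default raises StopIteration on no match.
    return next(c for c in parent['children'] if c.get('type') == t)


def find_child_lenient(parent, t):
    # New form: falls off the end of the loop and implicitly returns None.
    for c in parent['children']:
        if c.get('type') == t:
            return c


# Hypothetical page data with no 'ABTesting' wrapper.
main_container = {'children': [{'type': 'VideoPlayer', 'props': {}}]}

ab_testing = find_child_lenient(main_container, 'ABTesting')
video_player = find_child_lenient(ab_testing or main_container, 'VideoPlayer')
print(ab_testing, video_player)
```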

View File

@@ -35,7 +35,9 @@ class MySpassIE(InfoExtractor):
title = xpath_text(metadata, 'title', fatal=True)
video_url = xpath_text(metadata, 'url_flv', 'download url', True)
video_id_int = int(video_id)
for group in re.search(r'/myspass2009/\d+/(\d+)/(\d+)/(\d+)/', video_url).groups():
grps = re.search(r'/myspass2009/\d+/(\d+)/(\d+)/(\d+)/', video_url)
for group in grps.groups() if grps else []:
group_int = int(group)
if group_int > video_id_int:
video_url = video_url.replace(
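Review note: the MySpass change guards against `re.search` returning `None` before calling `.groups()`. A small sketch of the same guard in isolation is given below. The helper name `path_groups` and both URLs are hypothetical. Note that inside a comprehension the conditional expression needs parentheses, whereas the `for` statement in the diff does not.

```python
import re


def path_groups(video_url):
    """Return the three numeric path components as ints, or [] if absent."""
    grps = re.search(r'/myspass2009/\d+/(\d+)/(\d+)/(\d+)/', video_url)
    # Without the `if grps else []` guard, a non-matching URL would raise
    # AttributeError: 'NoneType' object has no attribute 'groups'.
    return [int(g) for g in (grps.groups() if grps else [])]


# Hypothetical URLs, used only to exercise both branches.
print(path_groups('http://example.invalid/myspass2009/11/22/33/44/video.mp4'))
print(path_groups('http://example.invalid/other/path/video.mp4'))
```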

View File

@@ -4,8 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
merge_dicts,
parse_iso8601,
@@ -20,13 +22,13 @@ class NDRBaseIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
display_id = next(group for group in mobj.groups() if group)
webpage = self._download_webpage(url, display_id)
return self._extract_embed(webpage, display_id)
return self._extract_embed(webpage, display_id, url)
class NDRIE(NDRBaseIE):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
_VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_VALID_URL = r'https?://(?:\w+\.)*ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
@@ -38,13 +40,14 @@ class NDRIE(NDRBaseIE):
'title': 'Party, Pötte und Parade',
'description': 'md5:ad14f9d2f91d3040b6930c697e5f6b4c',
'uploader': 'ndrtv',
'timestamp': 1431108900,
'timestamp': 1431255671,
'upload_date': '20150510',
'duration': 3498,
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
# httpVideo, different content id
'url': 'http://www.ndr.de/sport/fussball/40-Osnabrueck-spielt-sich-in-einen-Rausch,osna270.html',
@@ -63,6 +66,7 @@ class NDRIE(NDRBaseIE):
'params': {
'skip_download': True,
},
'skip': 'No longer available',
}, {
# httpAudio, same content id
'url': 'http://www.ndr.de/info/La-Valette-entgeht-der-Hinrichtung,audio51535.html',
@@ -74,8 +78,8 @@ class NDRIE(NDRBaseIE):
'title': 'La Valette entgeht der Hinrichtung',
'description': 'md5:22f9541913a40fe50091d5cdd7c9f536',
'uploader': 'ndrinfo',
'timestamp': 1290626100,
'upload_date': '20140729',
'timestamp': 1631711863,
'upload_date': '20210915',
'duration': 884,
},
'params': {
@@ -89,9 +93,10 @@ class NDRIE(NDRBaseIE):
'display_id': 'extra-3-Satiremagazin-mit-Christian-Ehring',
'ext': 'mp4',
'title': 'Extra 3 vom 11.11.2020 mit Christian Ehring',
'description': 'md5:42ee53990a715eaaf4dc7f13a3bd56c6',
'description': 'md5:700f6de264010585012a72f97b0ac0c9',
'uploader': 'ndrtv',
'upload_date': '20201113',
'upload_date': '20201207',
'timestamp': 1614349457,
'duration': 1749,
'subtitles': {
'de': [{
@@ -109,19 +114,38 @@ class NDRIE(NDRBaseIE):
'only_matching': True,
}]
def _extract_embed(self, webpage, display_id):
embed_url = self._html_search_meta(
'embedURL', webpage, 'embed URL',
default=None) or self._search_regex(
r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'embed URL', group='url')
def _extract_embed(self, webpage, display_id, url):
embed_url = (
self._html_search_meta(
'embedURL', webpage, 'embed URL',
default=None)
or self._search_regex(
r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'embed URL', group='url', default=None)
or self._search_regex(
r'\bvar\s*sophoraID\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'embed URL', group='url', default=''))
# some more work needed if we only found sophoraID
if re.match(r'^[a-z]+\d+$', embed_url):
# get the initial part of the url path, e.g. /panorama/archiv/2022/
parsed_url = compat_urllib_parse_urlparse(url)
path = self._search_regex(r'(.+/)%s' % display_id, parsed_url.path or '', 'embed URL', default='')
# find tell-tale image with the actual ID
ndr_id = self._search_regex(r'%s([a-z]+\d+)(?!\.)\b' % (path, ), webpage, 'embed URL', default=None)
# or try to use special knowledge!
NDR_INFO_URL_TPL = 'https://www.ndr.de/info/%s-player.html'
embed_url = 'ndr:%s' % (ndr_id, ) if ndr_id else NDR_INFO_URL_TPL % (embed_url, )
if not embed_url:
raise ExtractorError('Unable to extract embedUrl')
description = self._search_regex(
r'<p[^>]+itemprop="description">([^<]+)</p>',
webpage, 'description', default=None) or self._og_search_description(webpage)
timestamp = parse_iso8601(
self._search_regex(
r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
webpage, 'upload date', default=None))
(r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="(?P<cont>[^"]+)"',
r'\bvar\s*pdt\s*=\s*(?P<q>["\'])(?P<cont>(?:(?!(?P=q)).)+)(?P=q)', ),
webpage, 'upload date', group='cont', default=None))
info = self._search_json_ld(webpage, display_id, default={})
return merge_dicts({
'_type': 'url_transparent',
@@ -153,19 +177,19 @@ class NJoyIE(NDRBaseIE):
'params': {
'skip_download': True,
},
'skip': 'No longer available',
}, {
# httpVideo, different content id
'url': 'http://www.n-joy.de/musik/Das-frueheste-DJ-Set-des-Nordens-live-mit-Felix-Jaehn-,felixjaehn168.html',
'md5': '417660fffa90e6df2fda19f1b40a64d8',
'info_dict': {
'id': 'dockville882',
'id': 'livestream283',
'display_id': 'Das-frueheste-DJ-Set-des-Nordens-live-mit-Felix-Jaehn-',
'ext': 'mp4',
'title': '"Ich hab noch nie" mit Felix Jaehn',
'description': 'md5:85dd312d53be1b99e1f998a16452a2f3',
'ext': 'mp3',
'title': 'Das frueheste DJ Set des Nordens live mit Felix Jaehn',
'description': 'md5:681698f527b8601e511e7b79edde7d2c',
'uploader': 'njoy',
'upload_date': '20150822',
'duration': 211,
'upload_date': '20210830',
},
'params': {
'skip_download': True,
@@ -175,18 +199,25 @@ class NJoyIE(NDRBaseIE):
'only_matching': True,
}]
def _extract_embed(self, webpage, display_id):
def _extract_embed(self, webpage, display_id, url=None):
# find tell-tale URL with the actual ID, or ...
video_id = self._search_regex(
r'<iframe[^>]+id="pp_([\da-z]+)"', webpage, 'embed id')
description = self._search_regex(
r'<div[^>]+class="subline"[^>]*>[^<]+</div>\s*<p>([^<]+)</p>',
webpage, 'description', fatal=False)
(r'''\bsrc\s*=\s*["']?(?:/\w+)+/([a-z]+\d+)(?!\.)\b''',
r'<iframe[^>]+id="pp_([\da-z]+)"', ),
webpage, 'NDR id', default=None)
description = (
self._html_search_meta('description', webpage)
or self._search_regex(
r'<div[^>]+class="subline"[^>]*>[^<]+</div>\s*<p>([^<]+)</p>',
webpage, 'description', fatal=False))
return {
'_type': 'url_transparent',
'ie_key': 'NDREmbedBase',
'url': 'ndr:%s' % video_id,
'display_id': display_id,
'description': description,
'title': display_id.replace('-', ' ').strip(),
}
@@ -291,7 +322,7 @@ class NDREmbedBaseIE(InfoExtractor):
class NDREmbedIE(NDREmbedBaseIE):
IE_NAME = 'ndr:embed'
_VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
_VALID_URL = r'https?://(?:\w+\.)*ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:(?:ard)?player|externalPlayer)\.html'
_TESTS = [{
'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
@@ -304,6 +335,7 @@ class NDREmbedIE(NDREmbedBaseIE):
'upload_date': '20150907',
'duration': 132,
},
'skip': 'No longer available',
}, {
'url': 'http://www.ndr.de/ndr2/events/soundcheck/soundcheck3366-player.html',
'md5': '002085c44bae38802d94ae5802a36e78',
@@ -319,6 +351,7 @@ class NDREmbedIE(NDREmbedBaseIE):
'params': {
'skip_download': True,
},
'skip': 'No longer available',
}, {
'url': 'http://www.ndr.de/info/audio51535-player.html',
'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
@@ -328,7 +361,7 @@ class NDREmbedIE(NDREmbedBaseIE):
'title': 'La Valette entgeht der Hinrichtung',
'is_live': False,
'uploader': 'ndrinfo',
'upload_date': '20140729',
'upload_date': '20210915',
'duration': 884,
},
'params': {
@@ -349,15 +382,17 @@ class NDREmbedIE(NDREmbedBaseIE):
'params': {
'skip_download': True,
},
'skip': 'No longer available',
}, {
# httpVideoLive
'url': 'http://www.ndr.de/fernsehen/livestream/livestream217-externalPlayer.html',
'info_dict': {
'id': 'livestream217',
'ext': 'flv',
'ext': 'mp4',
'title': r're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
'upload_date': '20150910',
'upload_date': '20210409',
'uploader': 'ndrtv',
},
'params': {
'skip_download': True,
@@ -395,9 +430,10 @@ class NJoyEmbedIE(NDREmbedBaseIE):
'ext': 'mp4',
'title': 'Zehn Jahre Reeperbahn Festival - die Doku',
'is_live': False,
'upload_date': '20150807',
'upload_date': '20200826',
'duration': 1011,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
# httpAudio
'url': 'http://www.n-joy.de/news_wissen/stefanrichter100-player_image-d5e938b1-f21a-4b9a-86b8-aaba8bca3a13_theme-n-joy.html',
@@ -414,6 +450,7 @@ class NJoyEmbedIE(NDREmbedBaseIE):
'params': {
'skip_download': True,
},
'skip': 'No longer available',
}, {
# httpAudioLive, no explicit ext
'url': 'http://www.n-joy.de/news_wissen/webradioweltweit100-player_image-3fec0484-2244-4565-8fb8-ed25fd28b173_theme-n-joy.html',
@@ -423,7 +460,7 @@ class NJoyEmbedIE(NDREmbedBaseIE):
'title': r're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
'uploader': 'njoy',
'upload_date': '20150810',
'upload_date': '20210830',
},
'params': {
'skip_download': True,
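Review note: the new `sophoraID` fallback in `NDRIE._extract_embed` uses a backreference so that the value ends at whichever quote character opened it. The result is then tested against `^[a-z]+\d+$` to decide whether it is a bare ID rather than a URL. The sketch below shows both checks on their own. The helper names and the page snippet are assumptions made for illustration.

```python
import re

# Same pattern shape as in the diff: group 1 captures the opening quote and
# (?!\1) stops the value at the matching closing quote.
SOPHORA_RE = r'\bvar\s*sophoraID\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'


def find_sophora_id(webpage):
    mobj = re.search(SOPHORA_RE, webpage)
    return mobj.group('url') if mobj else None


def looks_like_bare_id(value):
    # Lower-case letters followed by digits, with no URL parts.
    return bool(re.match(r'^[a-z]+\d+$', value))


page = 'var sophoraID = "panorama12345";'  # hypothetical page snippet
found = find_sophora_id(page)
print(found, looks_like_bare_id(found))
```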

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -7,7 +8,7 @@ from ..utils import urljoin
class NhkBaseIE(InfoExtractor):
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/%s/%s/%s/all%s.json'
_API_URL_TEMPLATE = 'https://nwapi.nhk.jp/nhkworld/%sod%slist/v7b/%s/%s/%s/all%s.json'
_BASE_URL_REGEX = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand'
_TYPE_REGEX = r'/(?P<type>video|audio)/'
@@ -23,7 +24,7 @@ class NhkBaseIE(InfoExtractor):
def _extract_episode_info(self, url, episode=None):
fetch_episode = episode is None
lang, m_type, episode_id = re.match(NhkVodIE._VALID_URL, url).groups()
if episode_id.isdigit():
if len(episode_id) == 7:
episode_id = episode_id[:4] + '-' + episode_id[4:]
is_video = m_type == 'video'
@@ -84,7 +85,8 @@ class NhkBaseIE(InfoExtractor):
class NhkVodIE(NhkBaseIE):
_VALID_URL = r'%s%s(?P<id>\d{7}|[^/]+?-\d{8}-[0-9a-z]+)' % (NhkBaseIE._BASE_URL_REGEX, NhkBaseIE._TYPE_REGEX)
# the 7-character IDs can have alphabetic chars too: assume [a-z] rather than just [a-f], eg
_VALID_URL = r'%s%s(?P<id>[0-9a-z]{7}|[^/]+?-\d{8}-[0-9a-z]+)' % (NhkBaseIE._BASE_URL_REGEX, NhkBaseIE._TYPE_REGEX)
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
@@ -124,6 +126,19 @@ class NhkVodIE(NhkBaseIE):
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
'only_matching': True,
}, {
# video, alphabetic character in ID #29670
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999a34/',
'only_matching': True,
'info_dict': {
'id': 'qfjay6cg',
'ext': 'mp4',
'title': 'DESIGN TALKS plus - Fishermens Finery',
'description': 'md5:8a8f958aaafb0d7cb59d38de53f1e448',
'thumbnail': r're:^https?:/(/[a-z0-9.-]+)+\.jpg\?w=1920&h=1080$',
'upload_date': '20210615',
'timestamp': 1623722008,
}
}]
def _real_extract(self, url):
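Review note: the NHK change replaces the `isdigit()` test with a length test, so that 7-character IDs containing letters are also hyphenated before the API call. A minimal sketch of the normalisation follows. The function name `normalize_episode_id` is an assumption, and the numeric ID is made up. The other two IDs come from the test URLs in the diff.

```python
def normalize_episode_id(episode_id):
    """Insert a hyphen after the fourth character of 7-character IDs."""
    if len(episode_id) == 7:
        episode_id = episode_id[:4] + '-' + episode_id[4:]
    return episode_id


print(normalize_episode_id('9999a34'))           # alphabetic character in ID
print(normalize_episode_id('2049126'))           # hypothetical numeric ID
print(normalize_episode_id('j_art-20150903-1'))  # long form is left untouched
```

With the old `isdigit()` check, `'9999a34'` would have been passed through without the hyphen.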

View File

@@ -2,25 +2,28 @@
from __future__ import unicode_literals
import datetime
import functools
import itertools
import json
import math
import re
from .common import InfoExtractor
from .common import InfoExtractor, SearchInfoExtractor
from ..postprocessor.ffmpeg import FFmpegPostProcessor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import (
determine_ext,
dict_get,
ExtractorError,
dict_get,
float_or_none,
InAdvancePagedList,
int_or_none,
OnDemandPagedList,
parse_duration,
parse_iso8601,
PostProcessingError,
remove_start,
str_or_none,
try_get,
unified_timestamp,
urlencode_postdata,
@@ -34,7 +37,7 @@ class NiconicoIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.nicovideo.jp/watch/sm22312215',
'md5': 'd1a75c0823e2f629128c43e1212760f9',
'md5': 'a5bad06f1347452102953f323c69da34',
'info_dict': {
'id': 'sm22312215',
'ext': 'mp4',
@@ -157,11 +160,34 @@ class NiconicoIE(InfoExtractor):
}, {
'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg',
'only_matching': True,
}, {
# DMC video with heartbeat
'url': 'https://www.nicovideo.jp/watch/sm34815188',
'md5': '9360c6e1f1519d7759e2fe8e1326ae83',
'info_dict': {
'id': 'sm34815188',
'ext': 'mp4',
'title': 'md5:aee93e9f3366db72f902f6cd5d389cb7',
'description': 'md5:7b9149fc7a00ab053cafaf5c19662704',
'thumbnail': r're:https?://.*',
'uploader': 'md5:2762e18fa74dbb40aa1ad27c6291ee32',
'uploader_id': '67449889',
'upload_date': '20190322',
'timestamp': int, # timestamp is unstable
'duration': 1082.0,
'view_count': int,
'comment_count': int,
},
}]
_VALID_URL = r'https?://(?:www\.|secure\.|sp\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
_NETRC_MACHINE = 'niconico'
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0'
}
def _real_initialize(self):
self._login()
@@ -191,37 +217,89 @@ class NiconicoIE(InfoExtractor):
self._downloader.report_warning('unable to log in: bad username or password')
return login_ok
def _extract_format_for_quality(self, api_data, video_id, audio_quality, video_quality):
def yesno(boolean):
return 'yes' if boolean else 'no'
def _get_heartbeat_info(self, info_dict):
session_api_data = api_data['video']['dmcInfo']['session_api']
session_api_endpoint = session_api_data['urls'][0]
video_id, video_src_id, audio_src_id = info_dict['url'].split(':')[1].split('/')
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
api_data = (
info_dict.get('_api_data')
or self._parse_json(
self._html_search_regex(
'data-api-data="([^"]+)"',
self._download_webpage('http://www.nicovideo.jp/watch/' + video_id, video_id),
'API data', default='{}'),
video_id))
session_api_data = try_get(api_data, lambda x: x['media']['delivery']['movie']['session'])
session_api_endpoint = try_get(session_api_data, lambda x: x['urls'][0])
def ping():
status = try_get(
self._download_json(
'https://nvapi.nicovideo.jp/v1/2ab0cbaa/watch', video_id,
query={'t': try_get(api_data, lambda x: x['media']['delivery']['trackingId'])},
note='Acquiring permission for downloading video',
headers=self._API_HEADERS),
lambda x: x['meta']['status'])
if status != 200:
self.report_warning('Failed to acquire permission for playing video. The video may not download.')
yesno = lambda x: 'yes' if x else 'no'
# m3u8 (encryption)
if try_get(api_data, lambda x: x['media']['delivery']['encryption']) is not None:
protocol = 'm3u8'
encryption = self._parse_json(session_api_data['token'], video_id)['hls_encryption']
session_api_http_parameters = {
'parameters': {
'hls_parameters': {
'encryption': {
encryption: {
'encrypted_key': try_get(api_data, lambda x: x['media']['delivery']['encryption']['encryptedKey']),
'key_uri': try_get(api_data, lambda x: x['media']['delivery']['encryption']['keyUri'])
}
},
'transfer_preset': '',
'use_ssl': yesno(session_api_endpoint['isSsl']),
'use_well_known_port': yesno(session_api_endpoint['isWellKnownPort']),
'segment_duration': 6000,
}
}
}
# http
else:
protocol = 'http'
session_api_http_parameters = {
'parameters': {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_endpoint['isSsl']),
'use_well_known_port': yesno(session_api_endpoint['isWellKnownPort']),
}
}
}
session_response = self._download_json(
session_api_endpoint['url'], video_id,
query={'_format': 'json'},
headers={'Content-Type': 'application/json'},
note='Downloading JSON metadata for %s' % format_id,
note='Downloading JSON metadata for %s' % info_dict['format_id'],
data=json.dumps({
'session': {
'client_info': {
'player_id': session_api_data['player_id'],
'player_id': session_api_data.get('playerId'),
},
'content_auth': {
'auth_type': session_api_data['auth_types'][session_api_data['protocols'][0]],
'content_key_timeout': session_api_data['content_key_timeout'],
'auth_type': try_get(session_api_data, lambda x: x['authTypes'][session_api_data['protocols'][0]]),
'content_key_timeout': session_api_data.get('contentKeyTimeout'),
'service_id': 'nicovideo',
'service_user_id': session_api_data['service_user_id']
'service_user_id': session_api_data.get('serviceUserId')
},
'content_id': session_api_data['content_id'],
'content_id': session_api_data.get('contentId'),
'content_src_id_sets': [{
'content_src_ids': [{
'src_id_to_mux': {
'audio_src_ids': [audio_quality['id']],
'video_src_ids': [video_quality['id']],
'audio_src_ids': [audio_src_id],
'video_src_ids': [video_src_id],
}
}]
}],
@@ -229,52 +307,81 @@ class NiconicoIE(InfoExtractor):
'content_uri': '',
'keep_method': {
'heartbeat': {
'lifetime': session_api_data['heartbeat_lifetime']
'lifetime': session_api_data.get('heartbeatLifetime')
}
},
'priority': session_api_data['priority'],
'priority': session_api_data.get('priority'),
'protocol': {
'name': 'http',
'parameters': {
'http_parameters': {
'parameters': {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_endpoint['is_ssl']),
'use_well_known_port': yesno(session_api_endpoint['is_well_known_port']),
}
}
}
'http_parameters': session_api_http_parameters
}
},
'recipe_id': session_api_data['recipe_id'],
'recipe_id': session_api_data.get('recipeId'),
'session_operation_auth': {
'session_operation_auth_by_signature': {
'signature': session_api_data['signature'],
'token': session_api_data['token'],
'signature': session_api_data.get('signature'),
'token': session_api_data.get('token'),
}
},
'timing_constraint': 'unlimited'
}
}).encode())
resolution = video_quality.get('resolution', {})
info_dict['url'] = session_response['data']['session']['content_uri']
info_dict['protocol'] = protocol
# get heartbeat info
heartbeat_info_dict = {
'url': session_api_endpoint['url'] + '/' + session_response['data']['session']['id'] + '?_format=json&_method=PUT',
'data': json.dumps(session_response['data']),
# interval: convert milliseconds to seconds, then divide by 3 to leave a buffer.
'interval': float_or_none(session_api_data.get('heartbeatLifetime'), scale=3000),
'ping': ping
}
return info_dict, heartbeat_info_dict
def _extract_format_for_quality(self, api_data, video_id, audio_quality, video_quality):
def parse_format_id(id_code):
mobj = re.match(r'''(?x)
(?:archive_)?
(?:(?P<codec>[^_]+)_)?
(?:(?P<br>[\d]+)kbps_)?
(?:(?P<res>[\d+]+)p_)?
''', '%s_' % id_code)
return mobj.groupdict() if mobj else {}
protocol = 'niconico_dmc'
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
vdict = parse_format_id(video_quality['id'])
adict = parse_format_id(audio_quality['id'])
resolution = try_get(video_quality, lambda x: x['metadata']['resolution'], dict) or {'height': vdict.get('res')}
vbr = try_get(video_quality, lambda x: x['metadata']['bitrate'], float)
return {
'url': session_response['data']['session']['content_uri'],
'url': '%s:%s/%s/%s' % (protocol, video_id, video_quality['id'], audio_quality['id']),
'format_id': format_id,
'format_note': 'DMC %s' % try_get(video_quality, lambda x: x['metadata']['label'], compat_str),
'ext': 'mp4', # The Session API is used by the HTML5 player, which always serves mp4
'abr': float_or_none(audio_quality.get('bitrate'), 1000),
'vbr': float_or_none(video_quality.get('bitrate'), 1000),
'height': resolution.get('height'),
'width': resolution.get('width'),
'vcodec': vdict.get('codec'),
'acodec': adict.get('codec'),
'vbr': float_or_none(vbr, 1000) or float_or_none(vdict.get('br')),
'abr': float_or_none(audio_quality.get('bitrate'), 1000) or float_or_none(adict.get('br')),
'height': int_or_none(resolution.get('height', vdict.get('res'))),
'width': int_or_none(resolution.get('width')),
'quality': -2 if 'low' in format_id else -1, # Default quality value is -1
'protocol': protocol,
'http_headers': {
'Origin': 'https://www.nicovideo.jp',
'Referer': 'https://www.nicovideo.jp/watch/' + video_id,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# Get video webpage. We are not actually interested in it for normal
# cases, but need the cookies in order to be able to download the
# info webpage
# Get video webpage for API data.
webpage, handle = self._download_webpage_handle(
'http://www.nicovideo.jp/watch/' + video_id, video_id)
if video_id.startswith('so'):
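Review note: `parse_format_id` in the hunk above recovers codec, bitrate and resolution from a DMC quality ID by appending a trailing `_` and matching a chain of optional fields. The sketch below runs the same pattern standalone. The two sample IDs are assumptions about what such IDs look like, not values confirmed by the diff.

```python
import re


def parse_format_id(id_code):
    """Split a quality id into codec, bitrate (kbps) and resolution strings."""
    # A trailing '_' is appended so every optional field can end in '_'.
    mobj = re.match(r'''(?x)
        (?:archive_)?
        (?:(?P<codec>[^_]+)_)?
        (?:(?P<br>[\d]+)kbps_)?
        (?:(?P<res>[\d+]+)p_)?
    ''', '%s_' % id_code)
    return mobj.groupdict() if mobj else {}


# Assumed sample ids, used only to exercise the pattern.
video = parse_format_id('archive_h264_600kbps_360p')
audio = parse_format_id('archive_aac_64kbps')
print(video)
print(audio)
```

Fields that are absent from the ID come back as `None`, which is why the caller in the diff wraps them in `float_or_none` and `int_or_none`.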
@@ -284,86 +391,136 @@ class NiconicoIE(InfoExtractor):
'data-api-data="([^"]+)"', webpage,
'API data', default='{}'), video_id)
def _format_id_from_url(video_url):
return 'economy' if video_real_url.endswith('low') else 'normal'
def get_video_info_web(items):
return dict_get(api_data['video'], items)
try:
video_real_url = api_data['video']['smileInfo']['url']
except KeyError: # Flash videos
# Get flv info
flv_info_webpage = self._download_webpage(
'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
video_id, 'Downloading flv info')
# Get video info
video_info_xml = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id,
video_id, note='Downloading video info page')
flv_info = compat_parse_qs(flv_info_webpage)
if 'url' not in flv_info:
if 'deleted' in flv_info:
raise ExtractorError('The video has been deleted.',
expected=True)
elif 'closed' in flv_info:
raise ExtractorError('Niconico videos now require logging in',
expected=True)
elif 'error' in flv_info:
raise ExtractorError('%s reports error: %s' % (
self.IE_NAME, flv_info['error'][0]), expected=True)
else:
raise ExtractorError('Unable to find video URL')
def get_video_info_xml(items):
if not isinstance(items, list):
items = [items]
for item in items:
ret = xpath_text(video_info_xml, './/' + item)
if ret:
return ret
video_info_xml = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id,
video_id, note='Downloading video info page')
if get_video_info_xml('error'):
error_code = get_video_info_xml('code')
def get_video_info(items):
if not isinstance(items, list):
items = [items]
for item in items:
ret = xpath_text(video_info_xml, './/' + item)
if ret:
return ret
if error_code == 'DELETED':
raise ExtractorError('The video has been deleted.',
expected=True)
elif error_code == 'NOT_FOUND':
raise ExtractorError('The video is not found.',
expected=True)
elif error_code == 'COMMUNITY':
self.to_screen('%s: The video is community members only.' % video_id)
else:
raise ExtractorError('%s reports error: %s' % (self.IE_NAME, error_code))
video_real_url = flv_info['url'][0]
# Start extracting video formats
formats = []
extension = get_video_info('movie_type')
if not extension:
extension = determine_ext(video_real_url)
# Get HTML5 videos info
quality_info = try_get(api_data, lambda x: x['media']['delivery']['movie'])
if not quality_info:
raise ExtractorError('The video can\'t be downloaded', expected=True)
formats = [{
'url': video_real_url,
'ext': extension,
'format_id': _format_id_from_url(video_real_url),
}]
else:
formats = []
for audio_quality in quality_info.get('audios') or {}:
for video_quality in quality_info.get('videos') or {}:
if not audio_quality.get('isAvailable') or not video_quality.get('isAvailable'):
continue
formats.append(self._extract_format_for_quality(
api_data, video_id, audio_quality, video_quality))
dmc_info = api_data['video'].get('dmcInfo')
if dmc_info: # "New" HTML5 videos
quality_info = dmc_info['quality']
for audio_quality in quality_info['audios']:
for video_quality in quality_info['videos']:
if not audio_quality['available'] or not video_quality['available']:
continue
formats.append(self._extract_format_for_quality(
api_data, video_id, audio_quality, video_quality))
# Get flv/swf info
timestamp = None
video_real_url = try_get(api_data, lambda x: x['video']['smileInfo']['url'])
if video_real_url:
is_economy = video_real_url.endswith('low')
self._sort_formats(formats)
else: # "Old" HTML5 videos
formats = [{
if is_economy:
self.report_warning('Site is currently in economy mode! You will only have access to lower quality streams')
# Invoking ffprobe to determine resolution
pp = FFmpegPostProcessor(self._downloader)
cookies = self._get_cookies('https://nicovideo.jp').output(header='', sep='; path=/; domain=nicovideo.jp;\n')
self.to_screen('%s: %s' % (video_id, 'Checking smile format with ffprobe'))
try:
metadata = pp.get_metadata_object(video_real_url, ['-cookies', cookies])
except PostProcessingError as err:
raise ExtractorError(err.msg, expected=True)
v_stream = a_stream = {}
# Some complex swf files don't have a video stream (e.g. nm4809023)
for stream in metadata['streams']:
if stream['codec_type'] == 'video':
v_stream = stream
elif stream['codec_type'] == 'audio':
a_stream = stream
# Community restricted videos seem to have issues with the thumb API not returning anything at all
filesize = int(
(get_video_info_xml('size_high') if not is_economy else get_video_info_xml('size_low'))
or metadata['format']['size']
)
extension = (
get_video_info_xml('movie_type')
or 'mp4' if 'mp4' in metadata['format']['format_name'] else metadata['format']['format_name']
)
# The 'creation_time' tag on the video stream of re-encoded SMILEVIDEO mp4 files is '1970-01-01T00:00:00.000000Z'.
timestamp = (
parse_iso8601(get_video_info_web('first_retrieve'))
or unified_timestamp(get_video_info_web('postedDateTime'))
)
metadata_timestamp = (
parse_iso8601(try_get(v_stream, lambda x: x['tags']['creation_time']))
or timestamp if extension != 'mp4' else 0
)
# According to compconf, smile videos from pre-2017 are always better quality than their DMC counterparts
smile_threshold_timestamp = parse_iso8601('2016-12-08T00:00:00+09:00')
is_source = timestamp < smile_threshold_timestamp or metadata_timestamp > 0
# If movie file size is unstable, old server movie is not source movie.
if filesize > 1:
formats.append({
'url': video_real_url,
'ext': 'mp4',
'format_id': _format_id_from_url(video_real_url),
}]
'format_id': 'smile' if not is_economy else 'smile_low',
'format_note': 'SMILEVIDEO source' if not is_economy else 'SMILEVIDEO low quality',
'ext': extension,
'container': extension,
'vcodec': v_stream.get('codec_name'),
'acodec': a_stream.get('codec_name'),
# Some complex swf files don't have total bit rate metadata (e.g. nm6049209)
'tbr': int_or_none(metadata['format'].get('bit_rate'), scale=1000),
'vbr': int_or_none(v_stream.get('bit_rate'), scale=1000),
'abr': int_or_none(a_stream.get('bit_rate'), scale=1000),
'height': int_or_none(v_stream.get('height')),
'width': int_or_none(v_stream.get('width')),
'source_preference': 5 if not is_economy else -2,
'quality': 5 if is_source and not is_economy else None,
'filesize': filesize
})
def get_video_info(items):
return dict_get(api_data['video'], items)
self._sort_formats(formats)
# Start extracting information
title = get_video_info('title')
if not title:
title = self._og_search_title(webpage, default=None)
if not title:
title = self._html_search_regex(
title = (
get_video_info_xml('title') # prefer to get the untranslated original title
or get_video_info_web(['originalTitle', 'title'])
or self._og_search_title(webpage, default=None)
or self._html_search_regex(
r'<span[^>]+class="videoHeaderTitle"[^>]*>([^<]+)</span>',
webpage, 'video title')
webpage, 'video title'))
watch_api_data_string = self._html_search_regex(
r'<div[^>]+id="watchAPIDataContainer"[^>]+>([^<]+)</div>',
@@ -372,14 +529,15 @@ class NiconicoIE(InfoExtractor):
video_detail = watch_api_data.get('videoDetail', {})
thumbnail = (
get_video_info(['thumbnail_url', 'thumbnailURL'])
self._html_search_regex(r'<meta property="og:image" content="([^"]+)">', webpage, 'thumbnail data', default=None)
or dict_get( # choose highest from 720p to 240p
get_video_info_web('thumbnail'),
['ogp', 'player', 'largeUrl', 'middleUrl', 'url'])
or self._html_search_meta('image', webpage, 'thumbnail', default=None)
or video_detail.get('thumbnail'))
description = get_video_info('description')
description = get_video_info_web('description')
timestamp = (parse_iso8601(get_video_info('first_retrieve'))
or unified_timestamp(get_video_info('postedDateTime')))
if not timestamp:
match = self._html_search_meta('datePublished', webpage, 'date published', default=None)
if match:
@@ -388,19 +546,25 @@ class NiconicoIE(InfoExtractor):
timestamp = parse_iso8601(
video_detail['postedAt'].replace('/', '-'),
delimiter=' ', timezone=datetime.timedelta(hours=9))
timestamp = timestamp or try_get(api_data, lambda x: parse_iso8601(x['video']['registeredAt']))
view_count = int_or_none(get_video_info(['view_counter', 'viewCount']))
view_count = int_or_none(get_video_info_web(['view_counter', 'viewCount']))
if not view_count:
match = self._html_search_regex(
r'>Views: <strong[^>]*>([^<]+)</strong>',
webpage, 'view count', default=None)
if match:
view_count = int_or_none(match.replace(',', ''))
view_count = view_count or video_detail.get('viewCount')
view_count = (
view_count
or video_detail.get('viewCount')
or try_get(api_data, lambda x: x['video']['count']['view']))
comment_count = (
int_or_none(get_video_info_web('comment_num'))
or video_detail.get('commentCount')
or try_get(api_data, lambda x: x['video']['count']['comment']))
comment_count = (int_or_none(get_video_info('comment_num'))
or video_detail.get('commentCount')
or try_get(api_data, lambda x: x['thread']['commentCount']))
if not comment_count:
match = self._html_search_regex(
r'>Comments: <strong[^>]*>([^<]+)</strong>',
@@ -409,22 +573,41 @@ class NiconicoIE(InfoExtractor):
comment_count = int_or_none(match.replace(',', ''))
duration = (parse_duration(
get_video_info('length')
get_video_info_web('length')
or self._html_search_meta(
'video:duration', webpage, 'video duration', default=None))
or video_detail.get('length')
or get_video_info('duration'))
or get_video_info_web('duration'))
webpage_url = get_video_info('watch_url') or url
webpage_url = get_video_info_web('watch_url') or url
# for channel movie and community movie
channel_id = try_get(
api_data,
(lambda x: x['channel']['globalId'],
lambda x: x['community']['globalId']))
channel = try_get(
api_data,
(lambda x: x['channel']['name'],
lambda x: x['community']['name']))
# Note: cannot use api_data.get('owner', {}) because owner may be set to "null"
# in the JSON, which will cause None to be returned instead of {}.
owner = try_get(api_data, lambda x: x.get('owner'), dict) or {}
uploader_id = get_video_info(['ch_id', 'user_id']) or owner.get('id')
uploader = get_video_info(['ch_name', 'user_nickname']) or owner.get('nickname')
uploader_id = str_or_none(
get_video_info_web(['ch_id', 'user_id'])
or owner.get('id')
or channel_id
)
uploader = (
get_video_info_web(['ch_name', 'user_nickname'])
or owner.get('nickname')
or channel
)
return {
'id': video_id,
'_api_data': api_data,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
@@ -432,6 +615,8 @@ class NiconicoIE(InfoExtractor):
'uploader': uploader,
'timestamp': timestamp,
'uploader_id': uploader_id,
'channel': channel,
'channel_id': channel_id,
'view_count': view_count,
'comment_count': comment_count,
'duration': duration,
@@ -440,7 +625,7 @@ class NiconicoIE(InfoExtractor):
class NiconicoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/)?mylist/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/|my/)?mylist/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.nicovideo.jp/mylist/27411728',
@@ -456,60 +641,185 @@ class NiconicoPlaylistIE(InfoExtractor):
'url': 'https://www.nicovideo.jp/user/805442/mylist/27411728',
'only_matching': True,
}]
_PAGE_SIZE = 100
def _call_api(self, list_id, resource, query):
return self._download_json(
'https://nvapi.nicovideo.jp/v2/mylists/' + list_id, list_id,
'Downloading %s JSON metadata' % resource, query=query,
headers={'X-Frontend-Id': 6})['data']['mylist']
def _parse_owner(self, item):
owner = item.get('owner') or {}
if owner:
return {
'uploader': owner.get('name'),
'uploader_id': owner.get('id'),
}
return {}
def _fetch_page(self, list_id, page):
page += 1
items = self._call_api(list_id, 'page %d' % page, {
'page': page,
'pageSize': self._PAGE_SIZE,
})['items']
for item in items:
video = item.get('video') or {}
video_id = video.get('id')
if not video_id:
continue
count = video.get('count') or {}
get_count = lambda x: int_or_none(count.get(x))
info = {
'_type': 'url',
'id': video_id,
'title': video.get('title'),
'url': 'https://www.nicovideo.jp/watch/' + video_id,
'description': video.get('shortDescription'),
'duration': int_or_none(video.get('duration')),
'view_count': get_count('view'),
'comment_count': get_count('comment'),
'ie_key': NiconicoIE.ie_key(),
}
info.update(self._parse_owner(video))
yield info
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0'
}
def _real_extract(self, url):
list_id = self._match_id(url)
mylist = self._call_api(list_id, 'list', {
'pageSize': 1,
})
entries = InAdvancePagedList(
functools.partial(self._fetch_page, list_id),
math.ceil(mylist['totalItemCount'] / self._PAGE_SIZE),
self._PAGE_SIZE)
result = self.playlist_result(
entries, list_id, mylist.get('name'), mylist.get('description'))
result.update(self._parse_owner(mylist))
return result
def get_page_data(pagenum, pagesize):
return self._download_json(
'http://nvapi.nicovideo.jp/v2/mylists/' + list_id, list_id,
query={'page': 1 + pagenum, 'pageSize': pagesize},
headers=self._API_HEADERS).get('data').get('mylist')
data = get_page_data(0, 1)
title = data.get('name')
description = data.get('description')
uploader = data.get('owner').get('name')
uploader_id = data.get('owner').get('id')
def pagefunc(pagenum):
data = get_page_data(pagenum, 25)
return ({
'_type': 'url',
'url': 'http://www.nicovideo.jp/watch/' + item.get('watchId'),
} for item in data.get('items'))
return {
'_type': 'playlist',
'id': list_id,
'title': title,
'description': description,
'uploader': uploader,
'uploader_id': uploader_id,
'entries': OnDemandPagedList(pagefunc, 25),
}
class NicovideoSearchBaseIE(InfoExtractor):
_MAX_RESULTS = float('inf')
def _entries(self, url, item_id, query=None, note='Downloading page %(page)s'):
query = query or {}
pages = [query['page']] if 'page' in query else itertools.count(1)
for page_num in pages:
query['page'] = str(page_num)
webpage = self._download_webpage(url, item_id, query=query, note=note % {'page': page_num})
results = re.findall(r'(?<=data-video-id=)["\']?(?P<videoid>.+?)(?=["\'])', webpage)
for item in results:
yield self.url_result('http://www.nicovideo.jp/watch/%s' % item, 'Niconico', item)
if not results:
break
def _get_n_results(self, query, n):
entries = self._entries(self._proto_relative_url('//www.nicovideo.jp/search/%s' % query), query)
if n < self._MAX_RESULTS:
entries = itertools.islice(entries, 0, n)
return self.playlist_result(entries, query, query)
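
The paging loop in `NicovideoSearchBaseIE._entries` can be tried outside the extractor with a minimal sketch like the one below. It reuses the `data-video-id` regex from the diff. `iter_video_ids`, `fetch_page` and `FAKE_PAGES` are hypothetical names introduced here for illustration, and the canned HTML only stands in for real search pages.

```python
import itertools
import re

# Same pattern as in the diff: capture the value of the data-video-id attribute.
VIDEO_ID_RE = r'(?<=data-video-id=)["\']?(?P<videoid>.+?)(?=["\'])'


def iter_video_ids(fetch_page, first_page=None):
    """Yield video ids page by page, stopping at the first page without results.

    fetch_page is a hypothetical callable: page number -> HTML string.
    """
    # A fixed page means "only that page"; otherwise count upwards from 1.
    pages = [first_page] if first_page is not None else itertools.count(1)
    for page_num in pages:
        results = re.findall(VIDEO_ID_RE, fetch_page(page_num))
        for video_id in results:
            yield video_id
        if not results:
            break


# Canned pages stand in for real HTTP responses (assumption for the demo).
FAKE_PAGES = {
    1: '<li data-video-id="sm9"></li><li data-video-id="sm10"></li>',
    2: '<li data-video-id="sm11"></li>',
}

ids = list(iter_video_ids(lambda n: FAKE_PAGES.get(n, '')))
```

Because the generator stops only on an empty page, one extra request is always made after the last populated page.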
class NicovideoSearchIE(NicovideoSearchBaseIE, SearchInfoExtractor):
IE_DESC = 'Nico video search'
IE_NAME = 'nicovideo:search'
_SEARCH_KEY = 'nicosearch'
def _search_results(self, query):
return self._entries(
self._proto_relative_url('//www.nicovideo.jp/search/%s' % query), query)
class NicovideoSearchURLIE(NicovideoSearchBaseIE):
IE_NAME = '%s_url' % NicovideoSearchIE.IE_NAME
IE_DESC = 'Nico video search URLs'
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/search/(?P<id>[^?#&]+)?'
_TESTS = [{
'url': 'http://www.nicovideo.jp/search/sm9',
'info_dict': {
'id': 'sm9',
'title': 'sm9'
},
'playlist_mincount': 40,
}, {
'url': 'https://www.nicovideo.jp/search/sm9?sort=h&order=d&end=2020-12-31&start=2020-01-01',
'info_dict': {
'id': 'sm9',
'title': 'sm9'
},
'playlist_count': 31,
}]
def _real_extract(self, url):
query = self._match_id(url)
return self.playlist_result(self._entries(url, query), query, query)
class NicovideoSearchDateIE(NicovideoSearchBaseIE, SearchInfoExtractor):
IE_DESC = 'Nico video search, newest first'
IE_NAME = '%s:date' % NicovideoSearchIE.IE_NAME
_SEARCH_KEY = 'nicosearchdate'
_TESTS = [{
'url': 'nicosearchdateall:a',
'info_dict': {
'id': 'a',
'title': 'a'
},
'playlist_mincount': 1610,
}]
_START_DATE = datetime.date(2007, 1, 1)
_RESULTS_PER_PAGE = 32
_MAX_PAGES = 50
def _entries(self, url, item_id, start_date=None, end_date=None):
start_date, end_date = start_date or self._START_DATE, end_date or datetime.datetime.now().date()
# If the last page has a full page of videos, we need to break down the query interval further
last_page_len = len(list(self._get_entries_for_date(
url, item_id, start_date, end_date, self._MAX_PAGES,
note='Checking number of videos from {0} to {1}'.format(start_date, end_date))))
if (last_page_len == self._RESULTS_PER_PAGE and start_date != end_date):
midpoint = start_date + ((end_date - start_date) // 2)
for entry in itertools.chain(
iter(self._entries(url, item_id, midpoint, end_date)),
iter(self._entries(url, item_id, start_date, midpoint))):
yield entry
else:
self.to_screen('{0}: Downloading results from {1} to {2}'.format(item_id, start_date, end_date))
for entry in iter(self._get_entries_for_date(
url, item_id, start_date, end_date, note=' Downloading page %(page)s')):
yield entry
def _get_entries_for_date(self, url, item_id, start_date, end_date=None, page_num=None, note=None):
query = {
'start': compat_str(start_date),
'end': compat_str(end_date or start_date),
'sort': 'f',
'order': 'd',
}
if page_num:
query['page'] = compat_str(page_num)
for entry in iter(super(NicovideoSearchDateIE, self)._entries(url, item_id, query=query, note=note)):
yield entry
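
The interval splitting in `NicovideoSearchDateIE._entries` can be illustrated with the sketch below. It is not the extractor's code: `split_ranges`, `count_in_range` and `videos_per_range` are hypothetical, and the sketch asks for a result count directly, whereas the diff checks whether page `_MAX_PAGES` is full. It also splits into non-overlapping halves, a choice made here; in the diff both halves include the midpoint date.

```python
import datetime


def split_ranges(count_in_range, start, end, capacity):
    """Yield inclusive (start, end) date ranges that each hold fewer than
    `capacity` results, newest range first.

    count_in_range is a hypothetical callable: (start, end) -> result count.
    """
    if count_in_range(start, end) >= capacity and start != end:
        # Integer day arithmetic keeps the midpoint strictly before `end`.
        midpoint = start + datetime.timedelta(days=(end - start).days // 2)
        next_day = midpoint + datetime.timedelta(days=1)
        # Newer half first, matching the descending sort order of the search.
        for date_range in split_ranges(count_in_range, next_day, end, capacity):
            yield date_range
        for date_range in split_ranges(count_in_range, start, midpoint, capacity):
            yield date_range
    else:
        # Small enough (or a single day that cannot be split further).
        yield (start, end)


def videos_per_range(start, end):
    # Assumption for the demo: exactly 10 videos are posted every day.
    return 10 * ((end - start).days + 1)


ranges = list(split_ranges(
    videos_per_range,
    datetime.date(2020, 1, 1), datetime.date(2020, 1, 16), capacity=100))
```

With 16 days at 10 videos per day and a capacity of 100, the range is split once into two 8-day halves.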
class NiconicoUserIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/user/(?P<id>\d+)/?(?:$|[#?])'
_TEST = {
'url': 'https://www.nicovideo.jp/user/419948',
'info_dict': {
'id': '419948',
},
'playlist_mincount': 101,
}
_API_URL = "https://nvapi.nicovideo.jp/v1/users/%s/videos?sortKey=registeredAt&sortOrder=desc&pageSize=%s&page=%s"
_PAGE_SIZE = 100
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0'
}
def _entries(self, list_id):
total_count = 1
count = page_num = 0
while count < total_count:
json_parsed = self._download_json(
self._API_URL % (list_id, self._PAGE_SIZE, page_num + 1), list_id,
headers=self._API_HEADERS,
note='Downloading JSON metadata%s' % (' page %d' % page_num if page_num else ''))
if not page_num:
total_count = int_or_none(json_parsed['data'].get('totalCount'))
for entry in json_parsed["data"]["items"]:
count += 1
yield self.url_result('https://www.nicovideo.jp/watch/%s' % entry['id'])
page_num += 1
def _real_extract(self, url):
list_id = self._match_id(url)
return self.playlist_result(self._entries(list_id), list_id)
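
The count-driven loop in `NiconicoUserIE._entries` can be sketched independently of the network as follows. `iter_user_videos`, `fetch_page` and `fake_fetch` are hypothetical stand-ins, and the response shape (`totalCount`, `items`) only mirrors the fields the diff reads from `json_parsed['data']`. The early exit on an empty page is an extra guard added in this sketch, not something the diff does.

```python
def iter_user_videos(fetch_page, page_size=100):
    """Yield video ids until `totalCount` items have been seen.

    fetch_page is a hypothetical callable: (page, page_size) -> dict with
    'totalCount' and 'items' keys.
    """
    total_count = 1  # placeholder so the loop runs until the first response
    count = 0
    page_num = 0
    while count < total_count:
        data = fetch_page(page_num + 1, page_size)  # API pages are 1-based
        if page_num == 0:
            total_count = int(data.get('totalCount') or 0)
        items = data.get('items') or []
        if not items:
            break  # added guard: avoid looping forever on an empty page
        for item in items:
            count += 1
            yield item['id']
        page_num += 1


ALL_IDS = ['sm1', 'sm2', 'sm3', 'sm4', 'sm5']


def fake_fetch(page, page_size):
    # Canned data in place of the real API (assumption for the demo).
    start = (page - 1) * page_size
    chunk = ALL_IDS[start:start + page_size]
    return {'totalCount': len(ALL_IDS), 'items': [{'id': i} for i in chunk]}


ids = list(iter_user_videos(fake_fetch, page_size=2))
```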

Some files were not shown because too many files have changed in this diff