Merge branch 'master' into mediathekviewweb

commit 47f1d70149 by ckaotik, 2021-05-09 13:50:20 +02:00
91 changed files with 3685 additions and 1638 deletions

View File

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2021.02.10
+[debug] youtube-dl version 2021.04.26
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -19,7 +19,7 @@ labels: 'site-support-request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -18,13 +18,13 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2021.02.10
+[debug] youtube-dl version 2021.04.26
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -19,13 +19,13 @@ labels: 'request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -49,11 +49,18 @@ jobs:
     - name: Install Jython
       if: ${{ matrix.python-impl == 'jython' }}
       run: |
-        wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
+        wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
         java -jar jython-installer.jar -s -d "$HOME/jython"
         echo "$HOME/jython/bin" >> $GITHUB_PATH
     - name: Install nose
+      if: ${{ matrix.python-impl != 'jython' }}
       run: pip install nose
+    - name: Install nose (Jython)
+      if: ${{ matrix.python-impl == 'jython' }}
+      # Working around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
+      run: |
+        wget https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl
+        pip install nose-1.3.7-py2-none-any.whl
     - name: Run tests
       continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
       env:
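The pinned wheel above is needed because PyPI's CDN has stopped serving clients without TLS SNI support, which includes Jython's networking stack (see the linked status.python.org incident). A quick stdlib probe for whether a given interpreter is affected (minimal sketch):

import ssl

# PyPI's CDN now requires SNI; interpreters whose SSL layer predates SNI
# support (Jython among them) report False here and must install wheels
# fetched from a direct files.pythonhosted.org URL instead
print(ssl.HAS_SNI)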

ChangeLog (199 changed lines)
View File

@@ -1,3 +1,202 @@
version 2021.04.26
Extractors
+ [xfileshare] Add support for wolfstream.tv (#28858)
* [francetvinfo] Improve video id extraction (#28792)
* [medaltv] Fix extraction (#28807)
* [tver] Redirect all downloads to Brightcove (#28849)
* [go] Improve video id extraction (#25207, #25216, #26058)
* [youtube] Fix lazy extractors (#28780)
+ [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774)
* [cbsnews] Fix extraction for python <3.6 (#23359)
version 2021.04.17
Core
+ [utils] Add support for experimental HTTP response status code
308 Permanent Redirect (#27877, #28768)
Extractors
+ [lbry] Add support for HLS videos (#27877, #28768)
* [youtube] Fix stretched ratio calculation
* [youtube] Improve stretch extraction (#28769)
* [youtube:tab] Improve grid extraction (#28725)
+ [youtube:tab] Detect series playlist on playlists page (#28723)
+ [youtube] Add more invidious instances (#28706)
* [pluralsight] Extend anti-throttling timeout (#28712)
* [youtube] Improve URL to extractor routing (#27572, #28335, #28742)
+ [maoritv] Add support for maoritelevision.com (#24552)
+ [youtube:tab] Pass innertube context and x-goog-visitor-id header along with
continuation requests (#28702)
* [mtv] Fix Viacom A/B Testing Video Player extraction (#28703)
+ [pornhub] Extract DASH and HLS formats from get_media end point (#28698)
* [cbssports] Fix extraction (#28682)
* [jamendo] Fix track extraction (#28686)
* [curiositystream] Fix format extraction (#26845, #28668)
version 2021.04.07
Core
* [extractor/common] Use compat_cookies_SimpleCookie for _get_cookies
+ [compat] Introduce compat_cookies_SimpleCookie
* [extractor/common] Improve JSON-LD author extraction
* [extractor/common] Fix _get_cookies on python 2 (#20673, #23256, #20326,
#28640)
Extractors
* [youtube] Fix extraction of videos with restricted location (#28685)
+ [line] Add support for live.line.me (#17205, #28658)
* [vimeo] Improve extraction (#28591)
* [youku] Update ccode (#17852, #28447, #28460, #28648)
* [youtube] Prefer direct entry metadata over entry metadata from playlist
(#28619, #28636)
* [screencastomatic] Fix extraction (#11976, #24489)
+ [palcomp3] Add support for palcomp3.com (#13120)
+ [arnes] Add support for video.arnes.si (#28483)
+ [youtube:tab] Add support for hashtags (#28308)
version 2021.04.01
Extractors
* [youtube] Setup CONSENT cookie when needed (#28604)
* [vimeo] Fix password protected review extraction (#27591)
* [youtube] Improve age-restricted video extraction (#28578)
version 2021.03.31
Extractors
* [vlive] Fix inkey request (#28589)
* [francetvinfo] Improve video id extraction (#28584)
+ [instagram] Extract duration (#28469)
* [instagram] Improve title extraction (#28469)
+ [sbs] Add support for ondemand watch URLs (#28566)
* [youtube] Fix video's channel extraction (#28562)
* [picarto] Fix live stream extraction (#28532)
* [vimeo] Fix unlisted video extraction (#28414)
* [youtube:tab] Fix playlist/community continuation items extraction (#28266)
* [ard] Improve clip id extraction (#22724, #28528)
version 2021.03.25
Extractors
+ [zoom] Add support for zoom.us (#16597, #27002, #28531)
* [bbc] Fix BBC IPlayer Episodes/Group extraction (#28360)
* [youtube] Fix default value for youtube_include_dash_manifest (#28523)
* [zingmp3] Fix extraction (#11589, #16409, #16968, #27205)
+ [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
+ [tiktok] Detect private videos (#28453)
* [vimeo:album] Fix extraction for albums with number of videos multiple
to page size (#28486)
* [vvvvid] Fix kenc format extraction (#28473)
* [mlb] Fix video extraction (#21241)
* [svtplay] Improve extraction (#28448)
* [applepodcasts] Fix extraction (#28445)
* [rtve] Improve extraction
+ Extract all formats
* Fix RTVE Infantil extraction (#24851)
+ Extract is_live and series
version 2021.03.14
Core
+ Introduce release_timestamp meta field (#28386)
Extractors
+ [southpark] Add support for southparkstudios.com (#28413)
* [southpark] Fix extraction (#26763, #28413)
* [sportdeutschland] Fix extraction (#21856, #28425)
* [pinterest] Reduce the number of HLS format requests
* [peertube] Improve thumbnail extraction (#28419)
* [tver] Improve title extraction (#28418)
* [fujitv] Fix HLS formats extension (#28416)
* [shahid] Fix format extraction (#28383)
+ [lbry] Add support for channel filters (#28385)
+ [bandcamp] Extract release timestamp
+ [lbry] Extract release timestamp (#28386)
* [pornhub] Detect flagged videos
+ [pornhub] Extract formats from get_media end point (#28395)
* [bilibili] Fix video info extraction (#28341)
+ [cbs] Add support for Paramount+ (#28342)
+ [trovo] Add Origin header to VOD formats (#28346)
* [voxmedia] Fix volume embed extraction (#28338)
version 2021.03.03
Extractors
* [youtube:tab] Switch continuation to browse API (#28289, #28327)
* [9c9media] Fix extraction for videos with multiple ContentPackages (#28309)
+ [bbc] Add support for BBC Reel videos (#21870, #23660, #28268)
version 2021.03.02
Extractors
* [zdf] Rework extractors (#11606, #13473, #17354, #21185, #26711, #27068,
#27930, #28198, #28199, #28274)
* Generalize cross-extractor video ids for zdf based extractors
* Improve extraction
* Fix 3sat and phoenix
* [stretchinternet] Fix extraction (#28297)
* [urplay] Fix episode data extraction (#28292)
+ [bandaichannel] Add support for b-ch.com (#21404)
* [srgssr] Improve extraction (#14717, #14725, #27231, #28238)
+ Extract subtitle
* Fix extraction for new videos
* Update srf download domains
* [vvvvid] Reduce season request payload size
+ [vvvvid] Extract series sublists playlist title (#27601, #27618)
+ [dplay] Extract Ad-Free uplynk URLs (#28160)
+ [wat] Detect DRM protected videos (#27958)
* [tf1] Improve extraction (#27980, #28040)
* [tmz] Fix and improve extraction (#24603, #24687, 28211)
+ [gedidigital] Add support for Gedi group sites (#7347, #26946)
* [youtube] Fix get_video_info request
version 2021.02.22
Core
+ [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase
(#28112)
Extractors
* [apa] Fix and improve extraction (#27750)
+ [youporn] Extract duration (#28019)
+ [peertube] Add support for canard.tube (#28190)
* [youtube] Fixup m4a_dash formats (#28165)
+ [samplefocus] Add support for samplefocus.com (#27763)
+ [vimeo] Add support for unlisted video source format extraction
* [viki] Improve extraction (#26522, #28203)
* Extract uploader URL and episode number
* Report login required error
+ Extract 480p formats
* Fix API v4 calls
* [ninegag] Unescape title (#28201)
* [youtube] Improve URL regular expression (#28193)
+ [youtube] Add support for redirect.invidious.io (#28193)
+ [dplay] Add support for de.hgtv.com (#28182)
+ [dplay] Add support for discoveryplus.com (#24698)
+ [simplecast] Add support for simplecast.com (#24107)
* [youtube] Fix uploader extraction in flat playlist mode (#28045)
* [yandexmusic:playlist] Request missing tracks in chunks (#27355, #28184)
+ [storyfire] Add support for storyfire.com (#25628, #26349)
+ [zhihu] Add support for zhihu.com (#28177)
* [youtube] Fix controversial videos when authenticated with cookies (#28174)
* [ccma] Fix timestamp parsing in python 2
+ [videopress] Add support for video.wordpress.com
* [kakao] Improve info extraction and detect geo restriction (#26577)
* [xboxclips] Fix extraction (#27151)
* [ard] Improve formats extraction (#28155)
+ [canvas] Add support for dagelijksekost.een.be (#28119)
version 2021.02.10
Extractors

View File

@@ -3,6 +3,7 @@
 - **20min**
 - **220.ro**
 - **23video**
+- **247sports**
 - **24video**
 - **3qsdn**: 3Q SDN
 - **3sat**
@@ -82,6 +83,7 @@
 - **awaan:video**
 - **AZMedien**: AZ Medien videos
 - **BaiduVideo**: 百度视频
+- **bandaichannel**
 - **Bandcamp**
 - **Bandcamp:album**
 - **Bandcamp:weekly**
@@ -89,7 +91,8 @@
 - **bbc**: BBC
 - **bbc.co.uk**: BBC iPlayer
 - **bbc.co.uk:article**: BBC articles
-- **bbc.co.uk:iplayer:playlist**
+- **bbc.co.uk:iplayer:episodes**
+- **bbc.co.uk:iplayer:group**
 - **bbc.co.uk:playlist**
 - **BBVTV**
 - **Beatport**
@@ -158,7 +161,8 @@
 - **cbsnews**: CBS News
 - **cbsnews:embed**
 - **cbsnews:livevideo**: CBS News Live Videos
-- **CBSSports**
+- **cbssports**
+- **cbssports:embed**
 - **CCMA**
 - **CCTV**: 央视网
 - **CDA**
@@ -212,6 +216,7 @@
 - **curiositystream**
 - **curiositystream:collection**
 - **CWTV**
+- **DagelijkseKost**: dagelijksekost.een.be
 - **DailyMail**
 - **dailymotion**
 - **dailymotion:playlist**
@@ -233,6 +238,7 @@
 - **DiscoveryGo**
 - **DiscoveryGoPlaylist**
 - **DiscoveryNetworksDe**
+- **DiscoveryPlus**
 - **DiscoveryVR**
 - **Disney**
 - **dlive:stream**
@@ -328,6 +334,7 @@
 - **Gaskrank**
 - **Gazeta**
 - **GDCVault**
+- **GediDigital**
 - **generic**: Generic downloader that works on some sites
 - **Gfycat**
 - **GiantBomb**
@@ -353,6 +360,7 @@
 - **HentaiStigma**
 - **hetklokhuis**
 - **hgtv.com:show**
+- **HGTVDe**
 - **HiDive**
 - **HistoricFilms**
 - **history:player**
@@ -457,6 +465,8 @@
 - **limelight**
 - **limelight:channel**
 - **limelight:channel_list**
+- **LineLive**
+- **LineLiveChannel**
 - **LineTV**
 - **linkedin:learning**
 - **linkedin:learning:course**
@@ -482,6 +492,7 @@
 - **mangomolo:live**
 - **mangomolo:video**
 - **ManyVids**
+- **MaoriTV**
 - **Markiza**
 - **MarkizaPage**
 - **massengeschmack.tv**
@@ -517,6 +528,7 @@
 - **mixcloud:playlist**
 - **mixcloud:user**
 - **MLB**
+- **MLBVideo**
 - **Mnet**
 - **MNetTV**
 - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
@@ -672,6 +684,9 @@
 - **OutsideTV**
 - **PacktPub**
 - **PacktPubCourse**
+- **PalcoMP3:artist**
+- **PalcoMP3:song**
+- **PalcoMP3:video**
 - **pandora.tv**: 판도라TV
 - **ParamountNetwork**
 - **parliamentlive.tv**: UK parliament videos
@@ -803,6 +818,7 @@
 - **safari:course**: safaribooksonline.com online courses
 - **SAKTV**
 - **SaltTV**
+- **SampleFocus**
 - **Sapo**: SAPO Vídeos
 - **savefrom.net**
 - **SBS**: sbs.com.au
@@ -825,6 +841,9 @@
 - **ShahidShow**
 - **Shared**: shared.sx
 - **ShowRoomLive**
+- **simplecast**
+- **simplecast:episode**
+- **simplecast:podcast**
 - **Sina**
 - **sky.it**
 - **sky:news**
@@ -877,6 +896,9 @@
 - **Steam**
 - **Stitcher**
 - **StitcherShow**
+- **StoryFire**
+- **StoryFireSeries**
+- **StoryFireUser**
 - **Streamable**
 - **streamcloud.eu**
 - **StreamCZ**
@@ -1045,6 +1067,7 @@
 - **Vidbit**
 - **Viddler**
 - **Videa**
+- **video.arnes.si**: Arnes Video
 - **video.google:search**: Google Video search
 - **video.sky.it**
 - **video.sky.it:live**
@@ -1139,7 +1162,7 @@
 - **WWE**
 - **XBef**
 - **XboxClips**
-- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
+- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, WolfStream, XVideoSharing
 - **XHamster**
 - **XHamsterEmbed**
 - **XHamsterUser**
@@ -1198,5 +1221,8 @@
 - **ZattooLive**
 - **ZDF**
 - **ZDFChannel**
+- **Zhihu**
 - **zingmp3**: mp3.zing.vn
+- **zingmp3:album**
+- **zoom**
 - **Zype**

View File

@@ -70,15 +70,6 @@ class TestAllURLsMatching(unittest.TestCase):
         # self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
         # self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])

-    def test_youtube_extract(self):
-        assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
-        assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
-        assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
-
     def test_facebook_matching(self):
         self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
         self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793'))

View File

@@ -39,6 +39,16 @@ class TestExecution(unittest.TestCase):
         _, stderr = p.communicate()
         self.assertFalse(stderr)

+    def test_lazy_extractors(self):
+        try:
+            subprocess.check_call([sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'], cwd=rootDir, stdout=_DEV_NULL)
+            subprocess.check_call([sys.executable, 'test/test_all_urls.py'], cwd=rootDir, stdout=_DEV_NULL)
+        finally:
+            try:
+                os.remove('youtube_dl/extractor/lazy_extractors.py')
+            except (IOError, OSError):
+                pass
+

 if __name__ == '__main__':
     unittest.main()

test/test_youtube_misc.py (new file, 26 lines)
View File

@@ -0,0 +1,26 @@
#!/usr/bin/env python
from __future__ import unicode_literals

# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from youtube_dl.extractor import YoutubeIE


class TestYoutubeMisc(unittest.TestCase):
    def test_youtube_extract(self):
        assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
        assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
        assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
        assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
        assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
        assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
        assertExtractId('BaW_jenozKc', 'BaW_jenozKc')


if __name__ == '__main__':
    unittest.main()

View File

@@ -773,11 +773,20 @@ class YoutubeDL(object):
     def extract_info(self, url, download=True, ie_key=None, extra_info={},
                      process=True, force_generic_extractor=False):
-        '''
-        Returns a list with a dictionary for each video we find.
-        If 'download', also downloads the videos.
-        extra_info is a dict containing the extra values to add to each result
-        '''
+        """
+        Return a list with a dictionary for each video extracted.
+
+        Arguments:
+        url -- URL to extract
+
+        Keyword arguments:
+        download -- whether to download videos during extraction
+        ie_key -- extractor key hint
+        extra_info -- dictionary containing the extra values to add to each result
+        process -- whether to resolve all unresolved references (URLs, playlist items),
+                   must be True for download to work.
+        force_generic_extractor -- force using the generic extractor
+        """

         if not ie_key and force_generic_extractor:
             ie_key = 'Generic'
@@ -1511,12 +1520,16 @@ class YoutubeDL(object):
         if 'display_id' not in info_dict and 'id' in info_dict:
             info_dict['display_id'] = info_dict['id']

-        if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
-            # Working around out-of-range timestamp values (e.g. negative ones on Windows,
-            # see http://bugs.python.org/issue1646728)
-            try:
-                upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
-                info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
-            except (ValueError, OverflowError, OSError):
-                pass
+        for ts_key, date_key in (
+                ('timestamp', 'upload_date'),
+                ('release_timestamp', 'release_date'),
+        ):
+            if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
+                # Working around out-of-range timestamp values (e.g. negative ones on Windows,
+                # see http://bugs.python.org/issue1646728)
+                try:
+                    upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
+                    info_dict[date_key] = upload_date.strftime('%Y%m%d')
+                except (ValueError, OverflowError, OSError):
+                    pass
View File

@@ -73,6 +73,15 @@ try:
 except ImportError:  # Python 2
     import Cookie as compat_cookies

+if sys.version_info[0] == 2:
+    class compat_cookies_SimpleCookie(compat_cookies.SimpleCookie):
+        def load(self, rawdata):
+            if isinstance(rawdata, compat_str):
+                rawdata = str(rawdata)
+            return super(compat_cookies_SimpleCookie, self).load(rawdata)
+else:
+    compat_cookies_SimpleCookie = compat_cookies.SimpleCookie
+
 try:
     import html.entities as compat_html_entities
 except ImportError:  # Python 2
@@ -3000,6 +3009,7 @@ __all__ = [
     'compat_cookiejar',
     'compat_cookiejar_Cookie',
     'compat_cookies',
+    'compat_cookies_SimpleCookie',
     'compat_ctypes_WINFUNCTYPE',
     'compat_etree_Element',
     'compat_etree_fromstring',
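The wrapper exists because Python 2's Cookie.SimpleCookie.load mishandles unicode input; coercing to str first sidesteps that. Usage is the same on both interpreters (a minimal sketch, assuming youtube_dl is importable):

from youtube_dl.compat import compat_cookies_SimpleCookie

cookie = compat_cookies_SimpleCookie()
cookie.load('CONSENT=YES+cb; Domain=.youtube.com')  # str or unicode both work
print(cookie['CONSENT'].value)      # YES+cb
print(cookie['CONSENT']['domain'])  # .youtube.com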

View File

@@ -6,25 +6,21 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     determine_ext,
-    js_to_json,
+    int_or_none,
     url_or_none,
 )


 class APAIE(InfoExtractor):
-    _VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _VALID_URL = r'(?P<base_url>https?://[^/]+\.apa\.at)/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
     _TESTS = [{
         'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
         'md5': '2b12292faeb0a7d930c778c7a5b4759b',
         'info_dict': {
-            'id': 'jjv85FdZ',
+            'id': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
             'ext': 'mp4',
-            'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
-            'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+            'title': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 254,
-            'timestamp': 1519211149,
-            'upload_date': '20180221',
         },
     }, {
         'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
@@ -46,9 +42,11 @@ class APAIE(InfoExtractor):
                 webpage)]

     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id, base_url = mobj.group('id', 'base_url')

-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(
+            '%s/player/%s' % (base_url, video_id), video_id)

         jwplatform_id = self._search_regex(
             r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
@@ -59,16 +57,18 @@ class APAIE(InfoExtractor):
                 'jwplatform:' + jwplatform_id, ie='JWPlatform',
                 video_id=video_id)

-        sources = self._parse_json(
-            self._search_regex(
-                r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
-            video_id, transform_source=js_to_json)
+        def extract(field, name=None):
+            return self._search_regex(
+                r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field,
+                webpage, name or field, default=None, group='value')
+
+        title = extract('title') or video_id
+        description = extract('description')
+        thumbnail = extract('poster', 'thumbnail')

         formats = []
-        for source in sources:
-            if not isinstance(source, dict):
-                continue
-            source_url = url_or_none(source.get('file'))
+        for format_id in ('hls', 'progressive'):
+            source_url = url_or_none(extract(format_id))
             if not source_url:
                 continue
             ext = determine_ext(source_url)
@@ -77,18 +77,19 @@ class APAIE(InfoExtractor):
                     source_url, video_id, 'mp4', entry_protocol='m3u8_native',
                     m3u8_id='hls', fatal=False))
             else:
+                height = int_or_none(self._search_regex(
+                    r'(\d+)\.mp4', source_url, 'height', default=None))
                 formats.append({
                     'url': source_url,
+                    'format_id': format_id,
+                    'height': height,
                 })
         self._sort_formats(formats)

-        thumbnail = self._search_regex(
-            r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
-            'thumbnail', fatal=False, group='url')
-
         return {
             'id': video_id,
-            'title': video_id,
+            'title': title,
+            'description': description,
             'thumbnail': thumbnail,
             'formats': formats,
         }
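The extract helper above pulls quoted values out of the player page's inline JavaScript with a backreference, so either quote style matches. A standalone sketch (the page fragment is made up):

import re

webpage = '{"title": \'Blue Man Group\', "poster": "https://example.at/p.jpg"}'

def extract(field):
    # (["\']) captures the opening quote; (?:(?!\1).)+ consumes up to its match
    m = re.search(
        r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field, webpage)
    return m.group('value') if m else None

print(extract('title'))   # Blue Man Group
print(extract('poster'))  # https://example.at/p.jpg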

View File

@@ -42,6 +42,7 @@ class ApplePodcastsIE(InfoExtractor):
         ember_data = self._parse_json(self._search_regex(
             r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
             webpage, 'ember data'), episode_id)
+        ember_data = ember_data.get(episode_id) or ember_data
         episode = ember_data['data']['attributes']
         description = episode.get('description') or {}
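The added line copes with both shapes of the shoebox store: newer pages nest the payload under the episode id, older ones keep it at the top level. In isolation (made-up data):

episode_id = '1000482637777'
new_style = {episode_id: {'data': {'attributes': {'name': 'Ep 1'}}}}
old_style = {'data': {'attributes': {'name': 'Ep 1'}}}

for ember_data in (new_style, old_style):
    # the fallback keeps old-style pages working unchanged
    ember_data = ember_data.get(episode_id) or ember_data
    print(ember_data['data']['attributes']['name'])  # Ep 1, both times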

View File

@@ -335,7 +335,7 @@ class ARDIE(InfoExtractor):

 class ARDBetaMediathekIE(ARDMediathekBaseIE):
-    _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
     _TESTS = [{
         'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
         'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
@@ -365,22 +365,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
     }, {
         'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
         'only_matching': True,
+    }, {
+        'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('video_id')
-        display_id = mobj.group('display_id')
-        if display_id:
-            display_id = display_id.rstrip('/')
-        if not display_id:
-            display_id = video_id
+        video_id = self._match_id(url)

         player_page = self._download_json(
             'https://api.ardmediathek.de/public-gateway',
-            display_id, data=json.dumps({
+            video_id, data=json.dumps({
                 'query': '''{
-  playerPage(client:"%s", clipId: "%s") {
+  playerPage(client: "ard", clipId: "%s") {
     blockedByFsk
     broadcastedOn
     maturityContentRating
@@ -410,7 +410,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
       }
     }
   }
-}''' % (mobj.group('client'), video_id),
+}''' % video_id,
             }).encode(), headers={
                 'Content-Type': 'application/json'
             })['data']['playerPage']
@@ -435,7 +435,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
             r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
         info.update({
             'age_limit': age_limit,
-            'display_id': display_id,
             'title': title,
             'description': description,
             'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
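A quick standalone check of the tightened URL pattern, which now keys on the base64-ish Y3JpZDovL ("crid://") id prefix instead of a client path segment:

import re

VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'

for url in (
        'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
        'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
):
    print(re.match(VALID_URL, url).group('id'))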

View File

@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import (
    compat_parse_qs,
    compat_urllib_parse_urlparse,
)
from ..utils import (
    float_or_none,
    int_or_none,
    parse_iso8601,
    remove_start,
)


class ArnesIE(InfoExtractor):
    IE_NAME = 'video.arnes.si'
    IE_DESC = 'Arnes Video'
    _VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
    _TESTS = [{
        'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
        'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
        'info_dict': {
            'id': 'a1qrWTOQfVoU',
            'ext': 'mp4',
            'title': 'Linearna neodvisnost, definicija',
            'description': 'Linearna neodvisnost, definicija',
            'license': 'PRIVATE',
            'creator': 'Polona Oblak',
            'timestamp': 1585063725,
            'upload_date': '20200324',
            'channel': 'Polona Oblak',
            'channel_id': 'q6pc04hw24cj',
            'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
            'duration': 596.75,
            'view_count': int,
            'tags': ['linearna_algebra'],
            'start_time': 10,
        }
    }, {
        'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
        'only_matching': True,
    }, {
        'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
        'only_matching': True,
    }, {
        'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
        'only_matching': True,
    }, {
        'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
        'only_matching': True,
    }, {
        'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
        'only_matching': True,
    }]
    _BASE_URL = 'https://video.arnes.si'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        video = self._download_json(
            self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
        title = video['title']

        formats = []
        for media in (video.get('media') or []):
            media_url = media.get('url')
            if not media_url:
                continue
            formats.append({
                'url': self._BASE_URL + media_url,
                'format_id': remove_start(media.get('format'), 'FORMAT_'),
                'format_note': media.get('formatTranslation'),
                'width': int_or_none(media.get('width')),
                'height': int_or_none(media.get('height')),
            })
        self._sort_formats(formats)

        channel = video.get('channel') or {}
        channel_id = channel.get('url')
        thumbnail = video.get('thumbnailUrl')

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'thumbnail': self._BASE_URL + thumbnail,
            'description': video.get('description'),
            'license': video.get('license'),
            'creator': video.get('author'),
            'timestamp': parse_iso8601(video.get('creationTime')),
            'channel': channel.get('name'),
            'channel_id': channel_id,
            'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
            'duration': float_or_none(video.get('duration'), 1000),
            'view_count': int_or_none(video.get('views')),
            'tags': video.get('hashtags'),
            'start_time': int_or_none(compat_parse_qs(
                compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
        }
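The start_time handling at the end mirrors the ?t=10 query parameter from the first test URL; the same logic with the stdlib equivalents of the compat shims (minimal sketch):

try:
    from urllib.parse import parse_qs, urlparse  # Python 3
except ImportError:
    from urlparse import parse_qs, urlparse  # Python 2

def start_time_from_url(url):
    t = parse_qs(urlparse(url).query).get('t', [None])[0]
    return int(t) if t is not None else None

print(start_time_from_url('https://video.arnes.si/watch/a1qrWTOQfVoU?t=10'))  # 10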

View File

@@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals

from .brightcove import BrightcoveNewIE
from ..utils import extract_attributes


class BandaiChannelIE(BrightcoveNewIE):
    IE_NAME = 'bandaichannel'
    _VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
    _TESTS = [{
        'url': 'https://www.b-ch.com/titles/514/001',
        'md5': 'a0f2d787baa5729bed71108257f613a4',
        'info_dict': {
            'id': '6128044564001',
            'ext': 'mp4',
            'title': 'メタルファイターMIKU 第1話',
            'timestamp': 1580354056,
            'uploader_id': '5797077852001',
            'upload_date': '20200130',
            'duration': 1387.733,
        },
        'params': {
            'format': 'bestvideo',
            'skip_download': True,
        },
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        attrs = extract_attributes(self._search_regex(
            r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
        bc = self._download_json(
            'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
            video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
        return self._parse_brightcove_metadata(bc, bc['id'])
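extract_attributes is an existing youtube_dl.utils helper; the player-tag step above in isolation (the <video-js> snippet is made up but mirrors the attributes the code reads):

from youtube_dl.utils import extract_attributes

tag = '<video-js id="bcplayer" data-info="514001" data-auth=" SECRET-KEY ">'
attrs = extract_attributes(tag)
print(attrs['data-info'])          # 514001
print(attrs['data-auth'].strip())  # SECRET-KEY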

View File

@@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
             'uploader': 'Ben Prunty',
             'timestamp': 1396508491,
             'upload_date': '20140403',
+            'release_timestamp': 1396483200,
             'release_date': '20140403',
             'duration': 260.877,
             'track': 'Lanius (Battle)',
@@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
             'uploader': 'Mastodon',
             'timestamp': 1322005399,
             'upload_date': '20111122',
+            'release_timestamp': 1076112000,
             'release_date': '20040207',
             'duration': 120.79,
             'track': 'Hail to Fire',
@@ -197,7 +199,7 @@ class BandcampIE(InfoExtractor):
             'thumbnail': thumbnail,
             'uploader': artist,
             'timestamp': timestamp,
-            'release_date': unified_strdate(tralbum.get('album_release_date')),
+            'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
             'duration': duration,
             'track': track,
             'track_number': track_number,
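The switch matters because the extractor now hands YoutubeDL a numeric release_timestamp and lets the core derive release_date (see the YoutubeDL.py hunk above). A sketch, assuming youtube_dl is installed and Bandcamp's usual album_release_date string format:

from youtube_dl.utils import unified_strdate, unified_timestamp

date_str = '07 Feb 2004 00:00:00 GMT'
print(unified_strdate(date_str))    # '20040207' (what the old code stored)
print(unified_timestamp(date_str))  # 1076112000 (what the new code stores)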

View File

@@ -1,31 +1,39 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import functools
 import itertools
+import json
 import re

 from .common import InfoExtractor
+from ..compat import (
+    compat_etree_Element,
+    compat_HTTPError,
+    compat_parse_qs,
+    compat_str,
+    compat_urllib_parse_urlparse,
+    compat_urlparse,
+)
 from ..utils import (
+    ExtractorError,
+    OnDemandPagedList,
     clean_html,
     dict_get,
-    ExtractorError,
     float_or_none,
     get_element_by_class,
     int_or_none,
     js_to_json,
     parse_duration,
     parse_iso8601,
+    strip_or_none,
     try_get,
     unescapeHTML,
+    unified_timestamp,
     url_or_none,
     urlencode_postdata,
     urljoin,
 )
-from ..compat import (
-    compat_etree_Element,
-    compat_HTTPError,
-    compat_urlparse,
-)


 class BBCCoUkIE(InfoExtractor):
@@ -756,8 +764,17 @@ class BBCIE(BBCCoUkIE):
         'only_matching': True,
     }, {
         # custom redirection to www.bbc.com
+        # also, video with window.__INITIAL_DATA__
         'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
-        'only_matching': True,
+        'info_dict': {
+            'id': 'p02xzws1',
+            'ext': 'mp4',
+            'title': "Pluto may have 'nitrogen glaciers'",
+            'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
+            'thumbnail': r're:https?://.+/.+\.jpg',
+            'timestamp': 1437785037,
+            'upload_date': '20150725',
+        },
     }, {
         # single video article embedded with data-media-vpid
         'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
@@ -793,11 +810,25 @@ class BBCIE(BBCCoUkIE):
             'description': 'Learn English words and phrases from this story',
         },
         'add_ie': [BBCCoUkIE.ie_key()],
+    }, {
+        # BBC Reel
+        'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
+        'info_dict': {
+            'id': 'p07c6sb9',
+            'ext': 'mp4',
+            'title': 'How positive thinking is harming your happiness',
+            'alt_title': 'The downsides of positive thinking',
+            'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
+            'duration': 235,
+            'thumbnail': r're:https?://.+/p07c9dsr.jpg',
+            'upload_date': '20190604',
+            'categories': ['Psychology'],
+        },
     }]

     @classmethod
     def suitable(cls, url):
-        EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
+        EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
         return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
                 else super(BBCIE, cls).suitable(url))
@@ -929,7 +960,7 @@ class BBCIE(BBCCoUkIE):
                 else:
                     entry['title'] = info['title']
                     entry['formats'].extend(info['formats'])
-            except Exception as e:
+            except ExtractorError as e:
                 # Some playlist URL may fail with 500, at the same time
                 # the other one may work fine (e.g.
                 # http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
@@ -980,6 +1011,37 @@ class BBCIE(BBCCoUkIE):
                 'subtitles': subtitles,
             }

+        # bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
+        initial_data = self._parse_json(self._html_search_regex(
+            r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
+            webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
+        if initial_data:
+            init_data = try_get(
+                initial_data, lambda x: x['initData']['items'][0], dict) or {}
+            smp_data = init_data.get('smpData') or {}
+            clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
+            version_id = clip_data.get('versionID')
+            if version_id:
+                title = smp_data['title']
+                formats, subtitles = self._download_media_selector(version_id)
+                self._sort_formats(formats)
+                image_url = smp_data.get('holdingImageURL')
+                display_date = init_data.get('displayDate')
+                topic_title = init_data.get('topicTitle')
+                return {
+                    'id': version_id,
+                    'title': title,
+                    'formats': formats,
+                    'alt_title': init_data.get('shortTitle'),
+                    'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
+                    'description': smp_data.get('summary') or init_data.get('shortSummary'),
+                    'upload_date': display_date.replace('-', '') if display_date else None,
+                    'subtitles': subtitles,
+                    'duration': int_or_none(clip_data.get('duration')),
+                    'categories': [topic_title] if topic_title else None,
+                }
+
         # Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
         # There are several setPayload calls may be present but the video
         # seems to be always related to the first one
@@ -1041,7 +1103,7 @@ class BBCIE(BBCCoUkIE):
             thumbnail = None
             image_url = current_programme.get('image_url')
             if image_url:
-                thumbnail = image_url.replace('{recipe}', '1920x1920')
+                thumbnail = image_url.replace('{recipe}', 'raw')
             return {
                 'id': programme_id,
                 'title': title,
@@ -1114,12 +1176,29 @@ class BBCIE(BBCCoUkIE):
                         continue
                     formats, subtitles = self._download_media_selector(item_id)
                     self._sort_formats(formats)
+                    item_desc = None
+                    blocks = try_get(media, lambda x: x['summary']['blocks'], list)
+                    if blocks:
+                        summary = []
+                        for block in blocks:
+                            text = try_get(block, lambda x: x['model']['text'], compat_str)
+                            if text:
+                                summary.append(text)
+                        if summary:
+                            item_desc = '\n\n'.join(summary)
+                    item_time = None
+                    for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
+                        if try_get(meta, lambda x: x['label']) == 'Published':
+                            item_time = unified_timestamp(meta.get('timestamp'))
+                            break
                     entries.append({
                         'id': item_id,
                         'title': item_title,
                         'thumbnail': item.get('holdingImageUrl'),
                         'formats': formats,
                         'subtitles': subtitles,
+                        'timestamp': item_time,
+                        'description': strip_or_none(item_desc),
                     })
             for resp in (initial_data.get('data') or {}).values():
                 name = resp.get('name')
@ -1293,21 +1372,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
playlist_id, title, description) playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE): class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
IE_NAME = 'bbc.co.uk:iplayer:playlist' _VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s' @staticmethod
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)' def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
_TESTS = [{ _TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v', 'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': { 'info_dict': {
'id': 'b05rcz9v', 'id': 'b05rcz9v',
'title': 'The Disappearance', 'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.', 'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
}, },
'playlist_mincount': 6, 'playlist_mincount': 8,
'skip': 'This programme is not currently available on BBC iPlayer',
}, { }, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
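# Sketch of the request shape _call_api builds above (illustration only, not
# meant to be sent): a persisted GraphQL query -- a fixed query hash plus
# variables -- POSTed as JSON to the ibl endpoint.
import json
from urllib import request

def build_ibl_request(pid, page, per_page, series_id=None):
    variables = {'id': pid, 'page': page, 'perPage': per_page}
    if series_id:
        variables['sliceId'] = series_id
    body = json.dumps({
        'id': '5692d93d5aac8d796a0305e895e61551',  # query hash from above
        'variables': variables,
    }).encode('utf-8')
    return request.Request('https://graph.ibl.api.bbc.co.uk/', data=body,
                           headers={'Content-Type': 'application/json'})

req = build_ibl_request('b05rcz9v', 1, 100)
print(req.get_method())  # POST, since data is set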
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@@ -1316,14 +1523,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
- def _extract_title_and_description(self, webpage):
- title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
- description = self._search_regex(
- r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
- webpage, 'description', fatal=False, group='value')
- return title, description
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):

View File

@@ -156,6 +156,7 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid']
headers = {
'Accept': 'application/json',
'Referer': url
}
headers.update(self.geo_verification_headers())

View File

@@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}

View File

@@ -27,7 +27,7 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE):
- _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -52,6 +52,9 @@ class CBSIE(CBSBaseIE):
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):

View File

@@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
- -zlib.MAX_WBITS), None)['video']['items'][0]
-zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
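# Sketch of the payload round-trip behind the line above (data is
# hypothetical): the embed id is a URL-quoted, base64-encoded, raw-deflate
# JSON blob, hence base64 decode, zlib.decompress with -zlib.MAX_WBITS (no
# zlib header), and a final .decode('utf-8') so the JSON parser receives text
# rather than bytes.
import base64, json, zlib
from urllib.parse import quote, unquote

payload = json.dumps({'video': {'items': [{'mpxRefId': 'example'}]}}).encode('utf-8')
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
token = quote(base64.b64encode(compressor.compress(payload) + compressor.flush()))

item = json.loads(zlib.decompress(
    base64.b64decode(unquote(token)),
    -zlib.MAX_WBITS).decode('utf-8'))['video']['items'][0]
assert item['mpxRefId'] == 'example'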

View File

@@ -1,38 +1,113 @@
from __future__ import unicode_literals
- from .cbs import CBSBaseIE
import re
# from .cbs import CBSBaseIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
)
- class CBSSportsIE(CBSBaseIE):
# class CBSSportsEmbedIE(CBSBaseIE):
- _VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
class CBSSportsEmbedIE(InfoExtractor):
IE_NAME = 'cbssports:embed'
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
(?:
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
pcid%3D(?P<pcid>\d+)
)'''
_TESTS = [{
- 'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
- 'info_dict': {
- 'id': '1214315075735',
- 'ext': 'mp4',
- 'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
- 'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
- 'timestamp': 1524111457,
- 'upload_date': '20180419',
- 'uploader': 'CBSI-NEW',
- },
- 'params': {
- # m3u8 download
- 'skip_download': True,
- }
'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
'only_matching': True,
}, {
- 'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
'only_matching': True,
}]
- def _extract_video_info(self, filter_query, video_id):
# def _extract_video_info(self, filter_query, video_id):
- return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
# return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
uuid, pcid = re.match(self._VALID_URL, url).groups()
query = {'id': uuid} if uuid else {'pcid': pcid}
video = self._download_json(
'https://www.cbssports.com/api/content/video/',
uuid or pcid, query=query)[0]
video_id = video['id']
title = video['title']
metadata = video.get('metaData') or {}
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
formats = self._extract_m3u8_formats(
metadata['files'][0]['url'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
self._sort_formats(formats)
image = video.get('image')
thumbnails = None
if image:
image_path = image.get('path')
if image_path:
thumbnails = [{
'url': image_path,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
'filesize': int_or_none(image.get('size')),
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video.get('description'),
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
'duration': int_or_none(metadata.get('duration')),
}
class CBSSportsBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
- video_id = self._search_regex(
- [r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
- webpage, 'video id')
- return self._extract_video_info('byId=%s' % video_id, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
webpage, 'embed url')
return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
class CBSSportsIE(CBSSportsBaseIE):
IE_NAME = 'cbssports'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
'info_dict': {
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
'ext': 'mp4',
'title': 'Cover 3: Stanford Spring Gleaning',
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
'timestamp': 1617218398,
'upload_date': '20210331',
'duration': 502,
},
}]
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
IE_NAME = '247sports'
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
'info_dict': {
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
'ext': 'mp4',
'title': '2021 QB Jake Garcia senior highlights through five games',
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
'timestamp': 1607114223,
'upload_date': '20201204',
'duration': 208,
},
}]

View File

@@ -133,6 +133,8 @@ class CDAIE(InfoExtractor):
'age_limit': 18 if need_confirm_age else 0,
}
info = self._search_json_ld(webpage, video_id, default={})
# Source: https://www.cda.pl/js/player.js?t=1606154898
def decrypt_file(a):
for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
@@ -197,7 +199,7 @@ class CDAIE(InfoExtractor):
handler = self._download_webpage
webpage = handler(
- self._BASE_URL + href, video_id,
urljoin(self._BASE_URL, href), video_id,
'Downloading %s version information' % resolution, fatal=False)
if not webpage:
# Manually report warning because empty page is returned when
@@ -209,6 +211,4 @@ class CDAIE(InfoExtractor):
self._sort_formats(formats)
- info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts(info_dict, info)

View File

@@ -17,7 +17,7 @@ import math
from ..compat import (
compat_cookiejar_Cookie,
- compat_cookies,
compat_cookies_SimpleCookie,
compat_etree_Element,
compat_etree_fromstring,
compat_getpass,
@@ -230,8 +230,10 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
- timestamp: UNIX timestamp of the moment the video became available.
timestamp: UNIX timestamp of the moment the video became available
(uploaded).
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
@@ -1273,6 +1275,7 @@ class InfoExtractor(object):
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
author = e.get('author')
info.update({
'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
@@ -1280,7 +1283,11 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
- 'uploader': str_or_none(e.get('author')),
# author can be an instance of 'Organization' or 'Person' types.
# both types can have 'name' property(inherited from 'Thing' type). [1]
# however some websites are using 'Text' type instead.
# 1. https://schema.org/VideoObject
'uploader': author.get('name') if isinstance(author, dict) else author if isinstance(author, compat_str) else None,
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
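# Condensed, self-contained sketch of the uploader normalization above
# (plain str stands in for compat_str):
def author_name(author):
    if isinstance(author, dict):   # schema.org Organization/Person object
        return author.get('name')  # 'name' is inherited from Thing
    if isinstance(author, str):    # bare Text type used by some sites
        return author
    return None

assert author_name({'@type': 'Person', 'name': 'Jane Doe'}) == 'Jane Doe'
assert author_name('Jane Doe') == 'Jane Doe'
assert author_name(None) is None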
@@ -2894,10 +2901,10 @@ class InfoExtractor(object):
self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):
- """ Return a compat_cookies.SimpleCookie with the cookies for the url """
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
- return compat_cookies.SimpleCookie(req.get_header('Cookie'))
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""

View File

@@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
- def _call_api(self, path, video_id):
def _call_api(self, path, video_id, query=None):
headers = {}
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(
- self._API_BASE_URL + path, video_id, headers=headers)
self._API_BASE_URL + path, video_id, headers=headers, query=query)
self._handle_errors(result)
return result['data']
@@ -52,27 +52,38 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
- 'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
- }
},
'params': {
'format': 'bestvideo',
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
- media = self._call_api('media/' + video_id, video_id)
- title = media['title']
formats = []
for encoding_format in ('m3u8', 'mpd'):
media = self._call_api('media/' + video_id, video_id, query={
'encodingsNew': 'true',
'encodingsFormat': encoding_format,
})
for encoding in media.get('encodings', []):
- m3u8_url = encoding.get('master_playlist_url')
playlist_url = encoding.get('master_playlist_url')
- if m3u8_url:
if encoding_format == 'm3u8':
# use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
formats.extend(self._extract_m3u8_formats(
- m3u8_url, video_id, 'mp4', 'm3u8_native',
playlist_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif encoding_format == 'mpd':
formats.extend(self._extract_mpd_formats(
playlist_url, video_id, mpd_id='dash', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
@@ -108,6 +119,8 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
formats.append(fmt)
self._sort_formats(formats)
title = media['title']
subtitles = {}
for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file')
@@ -140,7 +153,7 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
- 'playlist_mincount': 17,
'playlist_mincount': 16,
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
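# Sketch (not part of the extractor) of the two-pass fetch introduced above:
# the same media endpoint is queried once per manifest flavour, so each
# response carries encodings already shaped for HLS or DASH.
def format_queries(video_id):
    return [('media/' + video_id, {
        'encodingsNew': 'true',
        'encodingsFormat': encoding_format,
    }) for encoding_format in ('m3u8', 'mpd')]

print(format_queries('2'))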

View File

@@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True,
}, {
# From https://gdcvault.com/play/1016624, empty speakerVideo
'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
'info_dict': {
'id': '201210-822101_1349794556671DDDD',
'ext': 'flv',
'title': 'Pre-launch - Preparing to Take the Plunge',
},
}, {
# From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
@@ -84,25 +96,19 @@ class DigitallySpeakingIE(InfoExtractor):
'vcodec': 'none',
'format_id': audio.get('code'),
})
- slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
for video_key, format_id, preference in (
('slide', 'slides', -2), ('speaker', 'speaker', -1)):
video_path = xpath_text(metadata, './%sVideo' % video_key)
if not video_path:
continue
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
- 'play_path': remove_end(slide_video_path, '.flv'),
'play_path': remove_end(video_path, '.flv'),
'ext': 'flv',
- 'format_note': 'slide deck video',
'format_note': '%s video' % video_key,
- 'quality': -2,
'quality': preference,
- 'preference': -2,
'preference': preference,
- 'format_id': 'slides',
'format_id': format_id,
- })
- speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
- formats.append({
- 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
- 'play_path': remove_end(speaker_video_path, '.flv'),
- 'ext': 'flv',
- 'format_note': 'speaker video',
- 'quality': -1,
- 'preference': -1,
- 'format_id': 'speaker',
})
return formats
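# Minimal stand-alone sketch of the table-driven loop above, which replaces
# the two duplicated slide/speaker blocks (metadata faked as a plain dict):
def build_rtmp_formats(akamai_url, video_paths):
    formats = []
    for video_key, format_id, preference in (
            ('slide', 'slides', -2), ('speaker', 'speaker', -1)):
        video_path = video_paths.get(video_key)  # xpath_text stand-in
        if not video_path:
            continue  # an empty slideVideo/speakerVideo no longer aborts
        formats.append({
            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
            'format_note': '%s video' % video_key,
            'quality': preference,
            'format_id': format_id,
        })
    return formats

print(build_rtmp_formats('host.example', {'speaker': 'talk.flv'}))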

View File

@@ -330,6 +330,7 @@ class DiscoveryPlusIE(DPlayIE):
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': 'dplus_us',
},
}).encode('utf-8'))['data']['attributes']['streaming']

View File

@@ -1,193 +1,43 @@
from __future__ import unicode_literals
- import re
from .zdf import ZDFIE
- from .common import InfoExtractor
- from ..utils import (
- int_or_none,
- unified_strdate,
- xpath_text,
- determine_ext,
- float_or_none,
- ExtractorError,
- )
- class DreiSatIE(InfoExtractor):
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
- _GEO_COUNTRIES = ['DE']
- _VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
- _TESTS = [
- {
- 'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
- 'md5': 'be37228896d30a88f315b638900a026e',
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
'info_dict': {
- 'id': '45918',
'id': '141007_ab18_10wochensommer_film',
'ext': 'mp4',
'title': 'Ab 18! - 10 Wochen Sommer',
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
'duration': 2660,
'timestamp': 1608604200,
'upload_date': '20201222',
},
}, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
'info_dict': {
'id': '140913_sendung_schweizweit',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
- 'uploader': 'SCHWEIZWEIT',
- 'uploader_id': '100000210',
'timestamp': 1410623100,
'upload_date': '20140913'
},
'params': {
- 'skip_download': True, # m3u8 downloads
'skip_download': True,
}
- },
- {
- 'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
}, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
'only_matching': True,
- },
- ]
}, {
# Same as https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
'only_matching': True,
}]
- def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
- param_groups = {}
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
group_id = param_group.get(self._xpath_ns(
'id', 'http://www.w3.org/XML/1998/namespace'))
params = {}
for param in param_group:
params[param.get('name')] = param.get('value')
param_groups[group_id] = params
formats = []
for video in smil.findall(self._xpath_ns('.//video', namespace)):
src = video.get('src')
if not src:
continue
bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
group_id = video.get('paramGroup')
param_group = param_groups[group_id]
for proto in param_group['protocols'].split(','):
formats.append({
'url': '%s://%s' % (proto, param_group['host']),
'app': param_group['app'],
'play_path': src,
'ext': 'flv',
'format_id': '%s-%d' % (proto, bitrate),
'tbr': bitrate,
})
self._sort_formats(formats)
return formats
def extract_from_xml_url(self, video_id, xml_url):
doc = self._download_xml(
xml_url, video_id,
note='Downloading video info',
errnote='Failed to download video info')
status_code = xpath_text(doc, './status/statuscode')
if status_code and status_code != 'ok':
if status_code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id
else:
message = '%s returned error: %s' % (self.IE_NAME, status_code)
raise ExtractorError(message, expected=True)
title = xpath_text(doc, './/information/title', 'title', True)
urls = []
formats = []
for fnode in doc.findall('.//formitaeten/formitaet'):
video_url = xpath_text(fnode, 'url')
if not video_url or video_url in urls:
continue
urls.append(video_url)
is_available = 'http://www.metafilegenerator' not in video_url
geoloced = 'static_geoloced_online' in video_url
if not is_available or geoloced:
continue
format_id = fnode.attrib['basetype']
format_m = re.match(r'''(?x)
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
''', format_id)
ext = determine_ext(video_url, None) or format_m.group('container')
if ext == 'meta':
continue
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
video_url, video_id, fatal=False))
elif ext == 'm3u8':
# the certificates are misconfigured (see
# https://github.com/ytdl-org/youtube-dl/issues/8665)
if video_url.startswith('https://'):
continue
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id, fatal=False))
else:
quality = xpath_text(fnode, './quality')
if quality:
format_id += '-' + quality
abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
tbr = int_or_none(self._search_regex(
r'_(\d+)k', video_url, 'bitrate', None))
if tbr and vbr and not abr:
abr = tbr - vbr
formats.append({
'format_id': format_id,
'url': video_url,
'ext': ext,
'acodec': format_m.group('acodec'),
'vcodec': format_m.group('vcodec'),
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(xpath_text(fnode, './width')),
'height': int_or_none(xpath_text(fnode, './height')),
'filesize': int_or_none(xpath_text(fnode, './filesize')),
'protocol': format_m.group('proto').lower(),
})
geolocation = xpath_text(doc, './/details/geolocation')
if not formats and geolocation and geolocation != 'none':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
thumbnails = []
for node in doc.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
return {
'id': video_id,
'title': title,
'description': xpath_text(doc, './/information/detail'),
'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
'thumbnails': thumbnails,
'uploader': xpath_text(doc, './/details/originChannelTitle'),
'uploader_id': xpath_text(doc, './/details/originChannelId'),
'upload_date': upload_date,
'formats': formats,
}
def _real_extract(self, url):
video_id = self._match_id(url)
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
return self.extract_from_xml_url(video_id, details_url)

View File

@@ -72,6 +72,7 @@ from .arte import (
ArteTVEmbedIE,
ArteTVPlaylistIE,
)
from .arnes import ArnesIE
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
@@ -90,11 +91,13 @@ from .awaan import (
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bandaichannel import BandaiChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
- BBCCoUkIPlayerPlaylistIE,
BBCCoUkIPlayerEpisodesIE,
BBCCoUkIPlayerGroupIE,
BBCCoUkPlaylistIE,
BBCIE,
)
@@ -129,7 +132,6 @@ from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
- from .blinkx import BlinkxIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE
@@ -188,7 +190,11 @@ from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
- from .cbssports import CBSSportsIE
from .cbssports import (
CBSSportsEmbedIE,
CBSSportsIE,
TwentyFourSevenSportsIE,
)
from .ccc import (
CCCIE,
CCCPlaylistIE,
@@ -421,6 +427,7 @@ from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE
from .gedidigital import GediDigitalIE
from .generic import GenericIE
from .gfycat import GfycatIE
from .giantbomb import GiantBombIE
@@ -591,7 +598,11 @@ from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
)
- from .line import LineTVIE
from .line import (
LineTVIE,
LineLiveIE,
LineLiveChannelIE,
)
from .linkedin import (
LinkedInLearningIE,
LinkedInLearningCourseIE,
@@ -628,6 +639,7 @@ from .mangomolo import (
MangomoloLiveIE,
)
from .manyvids import ManyVidsIE
from .maoritv import MaoriTVIE
from .markiza import (
MarkizaIE,
MarkizaPageIE,
@@ -675,7 +687,10 @@ from .mixcloud import (
MixcloudUserIE,
MixcloudPlaylistIE,
)
- from .mlb import MLBIE
from .mlb import (
MLBIE,
MLBVideoIE,
)
from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import (
@@ -876,6 +891,11 @@ from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .palcomp3 import (
PalcoMP3IE,
PalcoMP3ArtistIE,
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
@@ -1623,5 +1643,9 @@ from .zattoo import (
)
from .zdf import ZDFIE, ZDFChannelIE
from .zhihu import ZhihuIE
- from .zingmp3 import ZingMp3IE
from .zingmp3 import (
ZingMp3IE,
ZingMp3AlbumIE,
)
from .zoom import ZoomIE
from .zype import ZypeIE

View File

@@ -383,6 +383,10 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
}, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True,
}, {
# "<figure id=" pattern (#28792)
'url': 'https://www.francetvinfo.fr/culture/patrimoine/incendie-de-notre-dame-de-paris/notre-dame-de-paris-de-l-incendie-de-la-cathedrale-a-sa-reconstruction_4372291.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -399,7 +403,8 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
- r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
return self._make_url_result(video_id)

View File

@@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
- self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh:

View File

@@ -16,7 +16,7 @@ from ..utils import (
class FunimationIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation'
_TOKEN = None
@@ -51,6 +51,10 @@ class FunimationIE(InfoExtractor):
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}]
def _login(self):

View File

@@ -6,6 +6,7 @@ from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
HEADRequest,
remove_start,
sanitized_Request,
smuggle_url,
urlencode_postdata,
@@ -102,6 +103,26 @@ class GDCVaultIE(InfoExtractor):
'format': 'mp4-408',
},
},
{
# Kaltura embed, whitespace between quote and embedded URL in iframe's src
'url': 'https://www.gdcvault.com/play/1025699',
'info_dict': {
'id': '0_zagynv0a',
'ext': 'mp4',
'title': 'Tech Toolbox',
'upload_date': '20190408',
'uploader_id': 'joe@blazestreaming.com',
'timestamp': 1554764629,
},
'params': {
'skip_download': True,
},
},
{
# HTML5 video
'url': 'http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru',
'only_matching': True,
},
]
def _login(self, webpage_url, display_id):
@@ -175,7 +196,18 @@ class GDCVaultIE(InfoExtractor):
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
- start_page, 'xml filename')
start_page, 'xml filename', default=None)
if not xml_name:
info = self._parse_html5_media_entries(url, start_page, video_id)[0]
info.update({
'title': remove_start(self._search_regex(
r'>Session Name:\s*<.*?>\s*<td>(.+?)</td>', start_page,
'title', default=None) or self._og_search_title(
start_page, default=None), 'GDC Vault - '),
'id': video_id,
'display_id': display_id,
})
return info
embed_url = '%s/xml/%s' % (xml_root, xml_name)
ie_key = 'DigitallySpeaking'

View File

@@ -0,0 +1,161 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
)
class GediDigitalIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://video\.
(?:
(?:
(?:espresso\.)?repubblica
|lastampa
|ilsecoloxix
)|
(?:
iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)\.gelocal
)\.it(?:/[^/]+){2,3}?/(?P<id>\d+)(?:[/?&#]|$)'''
_TESTS = [{
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
'md5': '84658d7fb9e55a6e57ecc77b73137494',
'info_dict': {
'id': '121559',
'ext': 'mp4',
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-full-.+?\.jpg$',
'duration': 125,
},
}, {
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
'only_matching': True,
}, {
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
'only_matching': True,
}, {
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
'only_matching': True,
}, {
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
'only_matching': True,
}, {
'url': 'https://video.messaggeroveneto.gelocal.it/locale/maria-giovanna-elmi-covid-vaccino/138155/139268',
'only_matching': True,
}, {
'url': 'https://video.ilpiccolo.gelocal.it/dossier/big-john/dinosauro-big-john-al-via-le-visite-guidate-a-trieste/135226/135751',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimantova.gelocal.it/locale/dal-ponte-visconteo-di-valeggio-l-and-8217sos-dei-ristoratori-aprire-anche-a-cena/137310/137818',
'only_matching': True,
}, {
'url': 'https://video.mattinopadova.gelocal.it/dossier/coronavirus-in-veneto/covid-a-vo-un-anno-dopo-un-cuore-tricolore-per-non-dimenticare/138402/138964',
'only_matching': True,
}, {
'url': 'https://video.laprovinciapavese.gelocal.it/locale/mede-zona-rossa-via-alle-vaccinazioni-per-gli-over-80/137545/138120',
'only_matching': True,
}, {
'url': 'https://video.tribunatreviso.gelocal.it/dossier/coronavirus-in-veneto/ecco-le-prima-vaccinazioni-di-massa-nella-marca/134485/135024',
'only_matching': True,
}, {
'url': 'https://video.nuovavenezia.gelocal.it/locale/camion-troppo-alto-per-il-ponte-ferroviario-perde-il-carico/135734/136266',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimodena.gelocal.it/locale/modena-scoperta-la-proteina-che-predice-il-livello-di-gravita-del-covid/139109/139796',
'only_matching': True,
}, {
'url': 'https://video.lanuovaferrara.gelocal.it/locale/due-bombole-di-gpl-aperte-e-abbandonate-i-vigili-bruciano-il-gas/134391/134957',
'only_matching': True,
}, {
'url': 'https://video.corrierealpi.gelocal.it/dossier/cortina-2021-i-mondiali-di-sci-alpino/mondiali-di-sci-il-timelapse-sulla-splendida-olympia/133760/134331',
'only_matching': True,
}, {
'url': 'https://video.lasentinella.gelocal.it/locale/vestigne-centra-un-auto-e-si-ribalta/138931/139466',
'only_matching': True,
}, {
'url': 'https://video.espresso.repubblica.it/tutti-i-video/01-ted-villa/14772',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
['twitter:title', 'og:title'], webpage, fatal=True)
player_data = re.findall(
r"PlayerFactory\.setParam\('(?P<type>format|param)',\s*'(?P<name>[^']+)',\s*'(?P<val>[^']+)'\);",
webpage)
formats = []
duration = thumb = None
for t, n, v in player_data:
if t == 'format':
if n in ('video-hds-vod-ec', 'video-hls-vod-ec', 'video-viralize', 'video-youtube-pfp'):
continue
elif n.endswith('-vod-ak'):
formats.extend(self._extract_akamai_formats(
v, video_id, {'http': 'media.gedidigital.it'}))
else:
ext = determine_ext(v)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
v, video_id, 'mp4', 'm3u8_native', m3u8_id=n, fatal=False))
continue
f = {
'format_id': n,
'url': v,
}
if ext == 'mp3':
abr = int_or_none(self._search_regex(
r'-mp3-audio-(\d+)', v, 'abr', default=None))
f.update({
'abr': abr,
'tbr': abr,
'vcodec': 'none'
})
else:
mobj = re.match(r'^video-rrtv-(\d+)(?:-(\d+))?$', n)
if mobj:
f.update({
'height': int(mobj.group(1)),
'vbr': int_or_none(mobj.group(2)),
})
if not f.get('vbr'):
f['vbr'] = int_or_none(self._search_regex(
r'-video-rrtv-(\d+)', v, 'abr', default=None))
formats.append(f)
elif t == 'param':
if n in ['image_full', 'image']:
thumb = v
elif n == 'videoDuration':
duration = int_or_none(v)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(
['twitter:description', 'og:description', 'description'], webpage),
'thumbnail': thumb or self._og_search_thumbnail(webpage),
'formats': formats,
'duration': duration,
}
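# Sketch of how the PlayerFactory regex above yields (type, name, value)
# triples; the sample markup below is hypothetical:
import re

sample = """
PlayerFactory.setParam('format','video-rrtv-720','https://example.com/720.mp4');
PlayerFactory.setParam('param','videoDuration','125');
"""
triples = re.findall(
    r"PlayerFactory\.setParam\('(?P<type>format|param)',\s*'(?P<name>[^']+)',\s*'(?P<val>[^']+)'\);",
    sample)
assert ('param', 'videoDuration', '125') in triples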

View File

@@ -2953,7 +2953,7 @@ class GenericIE(InfoExtractor):
webpage)
if not mobj:
mobj = re.search(
- r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)',
r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')

View File

@@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .adobepass import AdobePassIE
from ..compat import compat_str
from ..utils import (
int_or_none,
determine_ext,
parse_age_limit,
try_get,
urlencode_postdata,
ExtractorError,
)
@@ -116,6 +118,18 @@ class GoIE(AdobePassIE):
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot',
'info_dict': {
'id': 'VDKA22600213',
'ext': 'mp4',
'title': 'Pilot',
'description': 'md5:74306df917cfc199d76d061d66bebdb4',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True,
@@ -149,11 +163,27 @@ class GoIE(AdobePassIE):
brand = site_info.get('brand')
if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id)
data = self._parse_json(
self._search_regex(
r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage,
'data', default='{}'),
display_id or video_id, fatal=False)
# https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot
layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict)
video_id = None
if layout:
video_id = try_get(
layout,
(lambda x: x['videoid'], lambda x: x['video']['id']),
compat_str)
if not video_id:
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# page.analytics.videoIdCode
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
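# Stand-alone sketch of the layered id lookup above: structured page state
# first (the try_get chain over the parsed JSON), looser regexes only
# afterwards. The data below is hypothetical.
def first_hit(getters):
    for get in getters:
        try:
            value = get()
        except (KeyError, TypeError):
            value = None
        if value:
            return value
    return None

data = {'page': {'content': {'video': {'layout': {'videoid': 'VDKA22600213'}}}}}
print(first_hit([
    lambda: data['page']['content']['video']['layout']['videoid'],
    lambda: data['page']['content']['video']['layout']['video']['id'],
]))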

View File

@@ -12,6 +12,7 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
float_or_none,
get_element_by_attribute,
int_or_none,
lowercase_escape,
@@ -32,6 +33,7 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1371748545,
'upload_date': '20130620',
'uploader_id': 'naomipq',
@@ -48,6 +50,7 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4',
'title': 'Video by britneyspears',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1453760977,
'upload_date': '20160125',
'uploader_id': 'britneyspears',
@@ -86,6 +89,24 @@ class InstagramIE(InfoExtractor):
'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
},
}, {
# IGTV
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
'info_dict': {
'id': 'BkfuX9UB-eK',
'ext': 'mp4',
'title': 'Fingerboarding Tricks with @cass.fb',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 53.83,
'timestamp': 1530032919,
'upload_date': '20180626',
'uploader_id': 'instagram',
'uploader': 'Instagram',
'like_count': int,
'comment_count': int,
'comments': list,
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
}
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
@@ -159,7 +180,9 @@ class InstagramIE(InfoExtractor):
description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
title = media.get('title')
thumbnail = media.get('display_src') or media.get('display_url')
duration = float_or_none(media.get('video_duration'))
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
@@ -200,9 +223,10 @@ class InstagramIE(InfoExtractor):
continue
entries.append({
'id': node.get('shortcode') or node['id'],
- 'title': 'Video %d' % edge_num,
'title': node.get('title') or 'Video %d' % edge_num,
'url': node_video_url,
'thumbnail': node.get('display_url'),
'duration': float_or_none(node.get('video_duration')),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')),
@@ -239,8 +263,9 @@ class InstagramIE(InfoExtractor):
'id': video_id,
'formats': formats,
'ext': 'mp4',
- 'title': 'Video by %s' % uploader_id,
'title': title or 'Video by %s' % uploader_id,
'description': description,
'duration': duration,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader_id': uploader_id,

View File

@@ -29,34 +29,51 @@ class JamendoIE(InfoExtractor):
'id': '196219',
'display_id': 'stories-from-emona-i',
'ext': 'flac',
- 'title': 'Maya Filipič - Stories from Emona I',
- 'artist': 'Maya Filipič',
# 'title': 'Maya Filipič - Stories from Emona I',
'title': 'Stories from Emona I',
# 'artist': 'Maya Filipič',
'track': 'Stories from Emona I',
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1217438117,
'upload_date': '20080730',
'license': 'by-nc-nd',
'view_count': int,
'like_count': int,
'average_rating': int,
'tags': ['piano', 'peaceful', 'newage', 'strings', 'upbeat'],
}
}, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True,
}]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
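# Condensed sketch of the X-Jam-Call signing above: a SHA-1 over the API path
# plus a random nonce, with the nonce echoed back inside the header value:
import hashlib
import random

def jam_call_header(path):
    rand = str(random.random())
    digest = hashlib.sha1((path + rand).encode()).hexdigest()
    return {'X-Jam-Call': '$%s*%s~' % (digest, rand)}

print(jam_call_header('/api/tracks'))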
def _real_extract(self, url):
track_id, display_id = self._VALID_URL_RE.match(url).groups()
- webpage = self._download_webpage(
- 'https://www.jamendo.com/track/' + track_id, track_id)
- models = self._parse_json(self._html_search_regex(
- r"data-bundled-models='([^']+)",
- webpage, 'bundled models'), track_id)
- track = models['track']['models'][0]
# webpage = self._download_webpage(
# 'https://www.jamendo.com/track/' + track_id, track_id)
# models = self._parse_json(self._html_search_regex(
# r"data-bundled-models='([^']+)",
# webpage, 'bundled models'), track_id)
# track = models['track']['models'][0]
track = self._call_api('track', track_id)
title = track_name = track['name']
- get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
- artist = get_model('artist')
- artist_name = artist.get('name')
- if artist_name:
- title = '%s - %s' % (artist_name, title)
- album = get_model('album')
# get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
# artist = get_model('artist')
# artist_name = artist.get('name')
# if artist_name:
# title = '%s - %s' % (artist_name, title)
# album = get_model('album')
formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@@ -74,7 +91,7 @@ class JamendoIE(InfoExtractor):
urls = []
thumbnails = []
- for _, covers in track.get('cover', {}).items():
for covers in (track.get('cover') or {}).values():
for cover_id, cover_url in covers.items():
if not cover_url or cover_url in urls:
continue
@@ -88,13 +105,14 @@ class JamendoIE(InfoExtractor):
})
tags = []
- for tag in track.get('tags', []):
for tag in (track.get('tags') or []):
tag_name = tag.get('name')
if not tag_name:
continue
tags.append(tag_name)
stats = track.get('stats') or {}
license = track.get('licenseCC') or []
return {
'id': track_id,
@@ -103,11 +121,11 @@ class JamendoIE(InfoExtractor):
'title': title,
'description': track.get('description'),
'duration': int_or_none(track.get('duration')),
- 'artist': artist_name,
# 'artist': artist_name,
'track': track_name,
- 'album': album.get('name'),
# 'album': album.get('name'),
'formats': formats,
- 'license': '-'.join(track.get('licenseCC', [])) or None,
'license': '-'.join(license) if license else None,
'timestamp': int_or_none(track.get('dateCreated')),
'view_count': int_or_none(stats.get('listenedAll')),
'like_count': int_or_none(stats.get('favorited')),
@@ -116,9 +134,9 @@ class JamendoIE(InfoExtractor):
}
- class JamendoAlbumIE(InfoExtractor):
class JamendoAlbumIE(JamendoIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
- _TEST = {
_TESTS = [{
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': {
'id': '121486',
@@ -151,17 +169,7 @@ class JamendoAlbumIE(InfoExtractor):
'params': {
'playlistend': 2
}
- }
}]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
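The helper removed above (now inherited from `JamendoIE`) signs each request with an `X-Jam-Call` header built from the API path and a random nonce. A minimal standalone sketch of that signature; the exact server-side check is an assumption inferred from the removed code:

```python
# Sketch: reproducing the X-Jam-Call header computed above (assumption:
# the server verifies sha1(path + rand) against the embedded nonce).
import hashlib
import random

def jam_call_header(path):
    rand = str(random.random())
    digest = hashlib.sha1((path + rand).encode()).hexdigest()
    return {'X-Jam-Call': '$%s*%s~' % (digest, rand)}

print(jam_call_header('/api/tracks'))
```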
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) album_id = self._match_id(url)
@ -169,7 +177,7 @@ class JamendoAlbumIE(InfoExtractor):
album_name = album.get('name') album_name = album.get('name')
entries = [] entries = []
for track in album.get('tracks', []): for track in (album.get('tracks') or []):
track_id = track.get('id') track_id = track.get('id')
if not track_id: if not track_id:
continue continue

View File

@ -120,7 +120,7 @@ class KalturaIE(InfoExtractor):
def _extract_urls(webpage): def _extract_urls(webpage):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
finditer = ( finditer = (
re.finditer( list(re.finditer(
r"""(?xs) r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\( kWidget\.(?:thumb)?[Ee]mbed\(
\{.*? \{.*?
@ -128,8 +128,8 @@ class KalturaIE(InfoExtractor):
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*? (?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s* (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\}) (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage) """, webpage))
or re.finditer( or list(re.finditer(
r'''(?xs) r'''(?xs)
(?P<q1>["']) (?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)* (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@ -142,16 +142,16 @@ class KalturaIE(InfoExtractor):
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s* \[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
) )
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3) (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage) ''', webpage))
or re.finditer( or list(re.finditer(
r'''(?xs) r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["']) <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])\s*
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+) (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)* (?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+) [?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)* (?:(?!(?P=q1)).)*
(?P=q1) (?P=q1)
''', webpage) ''', webpage))
) )
urls = [] urls = []
for mobj in finditer: for mobj in finditer:
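The `list(...)` wrappers above matter because `re.finditer` returns an iterator object, which is truthy even when there are no matches, so the `or` fallbacks were never reached. A two-line demonstration:

```python
import re

# An iterator with no matches is still a truthy object, so `or` never falls through:
no_match_iter = re.finditer(r'\d+', 'no digits here')
print(bool(no_match_iter))        # True - the iterator object itself is truthy
print(bool(list(no_match_iter)))  # False - an empty list is falsy, so `or` can fall back
```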

View File

@ -6,8 +6,10 @@ import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs,
compat_str, compat_str,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
@ -60,6 +62,7 @@ class LBRYBaseIE(InfoExtractor):
'description': stream_value.get('description'), 'description': stream_value.get('description'),
'license': stream_value.get('license'), 'license': stream_value.get('license'),
'timestamp': int_or_none(stream.get('timestamp')), 'timestamp': int_or_none(stream.get('timestamp')),
'release_timestamp': int_or_none(stream_value.get('release_time')),
'tags': stream_value.get('tags'), 'tags': stream_value.get('tags'),
'duration': int_or_none(media.get('duration')), 'duration': int_or_none(media.get('duration')),
'channel': try_get(signing_channel, lambda x: x['value']['title']), 'channel': try_get(signing_channel, lambda x: x['value']['title']),
@ -92,6 +95,8 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51', 'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
'timestamp': 1595694354, 'timestamp': 1595694354,
'upload_date': '20200725', 'upload_date': '20200725',
'release_timestamp': 1595340697,
'release_date': '20200721',
'width': 1280, 'width': 1280,
'height': 720, 'height': 720,
} }
@ -106,6 +111,8 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:661ac4f1db09f31728931d7b88807a61', 'description': 'md5:661ac4f1db09f31728931d7b88807a61',
'timestamp': 1591312601, 'timestamp': 1591312601,
'upload_date': '20200604', 'upload_date': '20200604',
'release_timestamp': 1591312421,
'release_date': '20200604',
'tags': list, 'tags': list,
'duration': 2570, 'duration': 2570,
'channel': 'The LBRY Foundation', 'channel': 'The LBRY Foundation',
@ -113,6 +120,26 @@ class LBRYIE(LBRYBaseIE):
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212', 'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
'vcodec': 'none', 'vcodec': 'none',
} }
}, {
# HLS
'url': 'https://odysee.com/@gardeningincanada:b/plants-i-will-never-grow-again.-the:e',
'md5': 'fc82f45ea54915b1495dd7cb5cc1289f',
'info_dict': {
'id': 'e51671357333fe22ae88aad320bde2f6f96b1410',
'ext': 'mp4',
'title': 'PLANTS I WILL NEVER GROW AGAIN. THE BLACK LIST PLANTS FOR A CANADIAN GARDEN | Gardening in Canada 🍁',
'description': 'md5:9c539c6a03fb843956de61a4d5288d5e',
'timestamp': 1618254123,
'upload_date': '20210412',
'release_timestamp': 1618254002,
'release_date': '20210412',
'tags': list,
'duration': 554,
'channel': 'Gardening In Canada',
'channel_id': 'b8be0e93b423dad221abe29545fbe8ec36e806bc',
'channel_url': 'https://odysee.com/@gardeningincanada:b8be0e93b423dad221abe29545fbe8ec36e806bc',
'formats': 'mincount:3',
}
}, { }, {
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e', 'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
'only_matching': True, 'only_matching': True,
@ -156,10 +183,18 @@ class LBRYIE(LBRYBaseIE):
streaming_url = self._call_api_proxy( streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url'] 'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
info = self._parse_stream(result, url) info = self._parse_stream(result, url)
urlh = self._request_webpage(
streaming_url, display_id, note='Downloading streaming redirect url info')
if determine_ext(urlh.geturl()) == 'm3u8':
info['formats'] = self._extract_m3u8_formats(
urlh.geturl(), display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(info['formats'])
else:
info['url'] = streaming_url
info.update({ info.update({
'id': claim_id, 'id': claim_id,
'title': title, 'title': title,
'url': streaming_url,
}) })
return info return info
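The new logic requests the streaming URL and inspects the URL it finally lands on: if the redirect target ends in `.m3u8`, the stream is HLS and gets the manifest treatment. A standalone sketch of the same redirect-sniffing idea, using plain `urllib` in place of the extractor's `_request_webpage().geturl()`:

```python
# Minimal sketch, assuming the URL is reachable; note urlopen issues a GET,
# which is fine for illustration.
import urllib.request

def resolves_to_hls(streaming_url):
    # Follow any redirects, then inspect the extension of the final URL.
    with urllib.request.urlopen(streaming_url) as resp:
        return resp.geturl().split('?')[0].endswith('.m3u8')
```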
@ -181,17 +216,18 @@ class LBRYChannelIE(LBRYBaseIE):
}] }]
_PAGE_SIZE = 50 _PAGE_SIZE = 50
def _fetch_page(self, claim_id, url, page): def _fetch_page(self, claim_id, url, params, page):
page += 1 page += 1
result = self._call_api_proxy( page_params = {
'claim_search', claim_id, {
'channel_ids': [claim_id], 'channel_ids': [claim_id],
'claim_type': 'stream', 'claim_type': 'stream',
'no_totals': True, 'no_totals': True,
'page': page, 'page': page,
'page_size': self._PAGE_SIZE, 'page_size': self._PAGE_SIZE,
'stream_types': self._SUPPORTED_STREAM_TYPES, }
}, 'page %d' % page) page_params.update(params)
result = self._call_api_proxy(
'claim_search', claim_id, page_params, 'page %d' % page)
for item in (result.get('items') or []): for item in (result.get('items') or []):
stream_claim_name = item.get('name') stream_claim_name = item.get('name')
stream_claim_id = item.get('claim_id') stream_claim_id = item.get('claim_id')
@ -212,8 +248,31 @@ class LBRYChannelIE(LBRYBaseIE):
result = self._resolve_url( result = self._resolve_url(
'lbry://' + display_id, display_id, 'channel') 'lbry://' + display_id, display_id, 'channel')
claim_id = result['claim_id'] claim_id = result['claim_id']
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
content = qs.get('content', [None])[0]
params = {
'fee_amount': qs.get('fee_amount', ['>=0'])[0],
'order_by': {
'new': ['release_time'],
'top': ['effective_amount'],
'trending': ['trending_group', 'trending_mixed'],
}[qs.get('order', ['new'])[0]],
'stream_types': [content] if content in ['audio', 'video'] else self._SUPPORTED_STREAM_TYPES,
}
duration = qs.get('duration', [None])[0]
if duration:
params['duration'] = {
'long': '>=1200',
'short': '<=240',
}[duration]
language = qs.get('language', ['all'])[0]
if language != 'all':
languages = [language]
if language == 'en':
languages.append('none')
params['any_languages'] = languages
entries = OnDemandPagedList( entries = OnDemandPagedList(
functools.partial(self._fetch_page, claim_id, url), functools.partial(self._fetch_page, claim_id, url, params),
self._PAGE_SIZE) self._PAGE_SIZE)
result_value = result.get('value') or {} result_value = result.get('value') or {}
return self.playlist_result( return self.playlist_result(

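The channel extractor now forwards the page URL's query string into the `claim_search` call. A compact sketch of just that mapping, with the lookup tables copied from the hunk above (treat them as an illustration of the shape, not a spec of the API):

```python
from urllib.parse import parse_qs, urlparse

def channel_search_params(url):
    qs = parse_qs(urlparse(url).query)
    order = qs.get('order', ['new'])[0]
    params = {
        'order_by': {
            'new': ['release_time'],
            'top': ['effective_amount'],
            'trending': ['trending_group', 'trending_mixed'],
        }[order],
    }
    duration = qs.get('duration', [None])[0]
    if duration:
        params['duration'] = {'long': '>=1200', 'short': '<=240'}[duration]
    return params

print(channel_search_params('https://odysee.com/@chan?order=top&duration=short'))
# {'order_by': ['effective_amount'], 'duration': '<=240'}
```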
View File

@ -4,7 +4,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import js_to_json from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
str_or_none,
)
class LineTVIE(InfoExtractor): class LineTVIE(InfoExtractor):
@ -88,3 +94,137 @@ class LineTVIE(InfoExtractor):
for thumbnail in video_info.get('thumbnails', {}).get('list', [])], for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
'view_count': video_info.get('meta', {}).get('count'), 'view_count': video_info.get('meta', {}).get('count'),
} }
class LineLiveBaseIE(InfoExtractor):
_API_BASE_URL = 'https://live-api.line-apps.com/web/v4.0/channel/'
def _parse_broadcast_item(self, item):
broadcast_id = compat_str(item['id'])
title = item['title']
is_live = item.get('isBroadcastingNow')
thumbnails = []
for thumbnail_id, thumbnail_url in (item.get('thumbnailURLs') or {}).items():
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
})
channel = item.get('channel') or {}
channel_id = str_or_none(channel.get('id'))
return {
'id': broadcast_id,
'title': self._live_title(title) if is_live else title,
'thumbnails': thumbnails,
'timestamp': int_or_none(item.get('createdAt')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': 'https://live.line.me/channels/' + channel_id if channel_id else None,
'duration': int_or_none(item.get('archiveDuration')),
'view_count': int_or_none(item.get('viewerCount')),
'comment_count': int_or_none(item.get('chatCount')),
'is_live': is_live,
}
class LineLiveIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<channel_id>\d+)/broadcast/(?P<id>\d+)'
_TESTS = [{
'url': 'https://live.line.me/channels/4867368/broadcast/16331360',
'md5': 'bc931f26bf1d4f971e3b0982b3fab4a3',
'info_dict': {
'id': '16331360',
'title': '振りコピ講座😙😙😙',
'ext': 'mp4',
'timestamp': 1617095132,
'upload_date': '20210330',
'channel': '白川ゆめか',
'channel_id': '4867368',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}, {
# archiveStatus == 'DELETED'
'url': 'https://live.line.me/channels/4778159/broadcast/16378488',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id, broadcast_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
self._API_BASE_URL + '%s/broadcast/%s' % (channel_id, broadcast_id),
broadcast_id)
item = broadcast['item']
info = self._parse_broadcast_item(item)
protocol = 'm3u8' if info['is_live'] else 'm3u8_native'
formats = []
for k, v in (broadcast.get(('live' if info['is_live'] else 'archived') + 'HLSURLs') or {}).items():
if not v:
continue
if k == 'abr':
formats.extend(self._extract_m3u8_formats(
v, broadcast_id, 'mp4', protocol,
m3u8_id='hls', fatal=False))
continue
f = {
'ext': 'mp4',
'format_id': 'hls-' + k,
'protocol': protocol,
'url': v,
}
if not k.isdigit():
f['vcodec'] = 'none'
formats.append(f)
if not formats:
archive_status = item.get('archiveStatus')
if archive_status != 'ARCHIVED':
raise ExtractorError('this video has been ' + archive_status.lower(), expected=True)
self._sort_formats(formats)
info['formats'] = formats
return info
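The HLS URL map is keyed by rendition: `abr` is the adaptive master playlist, digit keys are fixed video heights, and anything else is treated as audio-only (`vcodec: 'none'`). A tiny classifier mirroring that branching (the key names other than `abr` are inferred from the hunk; `aac` is a hypothetical example):

```python
def classify_hls_key(key):
    if key == 'abr':
        return 'adaptive master playlist'
    if key.isdigit():
        return '%sp video rendition' % key
    return 'audio-only rendition'

for k in ('abr', '720', 'aac'):
    print(k, '->', classify_hls_key(k))
```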
class LineLiveChannelIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<id>\d+)(?!/broadcast/\d+)(?:[/?&#]|$)'
_TEST = {
'url': 'https://live.line.me/channels/5893542',
'info_dict': {
'id': '5893542',
'title': 'いくらちゃん',
'description': 'md5:c3a4af801f43b2fac0b02294976580be',
},
'playlist_mincount': 29
}
def _archived_broadcasts_entries(self, archived_broadcasts, channel_id):
while True:
for row in (archived_broadcasts.get('rows') or []):
share_url = str_or_none(row.get('shareURL'))
if not share_url:
continue
info = self._parse_broadcast_item(row)
info.update({
'_type': 'url',
'url': share_url,
'ie_key': LineLiveIE.ie_key(),
})
yield info
if not archived_broadcasts.get('hasNextPage'):
return
archived_broadcasts = self._download_json(
self._API_BASE_URL + channel_id + '/archived_broadcasts',
channel_id, query={
'lastId': info['id'],
})
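`_archived_broadcasts_entries` is cursor-style pagination: each response carries `hasNextPage`, and the id of the last yielded row seeds the `lastId` parameter of the next request. A generic sketch of the same shape, where `fetch_page` is a hypothetical callable returning `{'rows': [...], 'hasNextPage': bool}` and rows are assumed to carry an `id`:

```python
def paginate(fetch_page):
    last_id = None
    while True:
        page = fetch_page(last_id)
        for row in page.get('rows') or []:
            last_id = row['id']
            yield row
        if not page.get('hasNextPage'):
            return
```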
def _real_extract(self, url):
channel_id = self._match_id(url)
channel = self._download_json(self._API_BASE_URL + channel_id, channel_id)
return self.playlist_result(
self._archived_broadcasts_entries(channel.get('archivedBroadcasts') or {}, channel_id),
channel_id, channel.get('title'), channel.get('information'))

View File

@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MaoriTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.maoritelevision.com/shows/korero-mai/S01E054/korero-mai-series-1-episode-54',
'md5': '5ade8ef53851b6a132c051b1cd858899',
'info_dict': {
'id': '4774724855001',
'ext': 'mp4',
'title': 'Kōrero Mai, Series 1 Episode 54',
'upload_date': '20160226',
'timestamp': 1456455018,
'description': 'md5:59bde32fd066d637a1a55794c56d8dcb',
'uploader_id': '1614493167001',
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1614493167001/HJlhIQhQf_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(
r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'BrightcoveNew', brightcove_id)

View File

@ -15,33 +15,39 @@ from ..utils import (
class MedalTVIE(InfoExtractor): class MedalTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://medal.tv/clips/34934644/3Is9zyGMoBMr', 'url': 'https://medal.tv/clips/2mA60jWAGQCBH',
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa', 'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
'info_dict': { 'info_dict': {
'id': '34934644', 'id': '2mA60jWAGQCBH',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Quad Cold', 'title': 'Quad Cold',
'description': 'Medal,https://medal.tv/desktop/', 'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'MowgliSB', 'uploader': 'MowgliSB',
'timestamp': 1603165266, 'timestamp': 1603165266,
'upload_date': '20201020', 'upload_date': '20201020',
'uploader_id': 10619174, 'uploader_id': '10619174',
} }
}, { }, {
'url': 'https://medal.tv/clips/36787208', 'url': 'https://medal.tv/clips/2um24TWdty0NA',
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148', 'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
'info_dict': { 'info_dict': {
'id': '36787208', 'id': '2um24TWdty0NA',
'ext': 'mp4', 'ext': 'mp4',
'title': 'u tk me i tk u bigger', 'title': 'u tk me i tk u bigger',
'description': 'Medal,https://medal.tv/desktop/', 'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'Mimicc', 'uploader': 'Mimicc',
'timestamp': 1605580939, 'timestamp': 1605580939,
'upload_date': '20201117', 'upload_date': '20201117',
'uploader_id': 5156321, 'uploader_id': '5156321',
} }
}, {
'url': 'https://medal.tv/clips/37rMeFpryCC-9',
'only_matching': True,
}, {
'url': 'https://medal.tv/clips/2WRj40tpY_EU9',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,15 +1,91 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .nhl import NHLBaseIE import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
)
class MLBIE(NHLBaseIE): class MLBBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_video_data(display_id)
video_id = video['id']
title = video['title']
feed = self._get_feed(video)
formats = []
for playback in (feed.get('playbacks') or []):
playback_url = playback.get('url')
if not playback_url:
continue
name = playback.get('name')
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4',
'm3u8_native', m3u8_id=name, fatal=False))
else:
f = {
'format_id': name,
'url': playback_url,
}
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
if mobj:
f.update({
'height': int(mobj.group(3)),
'tbr': int(mobj.group(1)),
'width': int(mobj.group(2)),
})
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', playback_url)
if mobj:
f.update({
'fps': int(mobj.group(3)),
'height': int(mobj.group(2)),
'tbr': int(mobj.group(4)),
'width': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
thumbnails = []
for cut in (try_get(feed, lambda x: x['image']['cuts'], list) or []):
src = cut.get('src')
if not src:
continue
thumbnails.append({
'height': int_or_none(cut.get('height')),
'url': src,
'width': int_or_none(cut.get('width')),
})
language = (video.get('language') or 'EN').lower()
return {
'id': video_id,
'title': title,
'formats': formats,
'description': video.get('description'),
'duration': parse_duration(feed.get('duration')),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(video.get(self._TIMESTAMP_KEY)),
'subtitles': self._extract_mlb_subtitles(feed, language),
}
class MLBIE(MLBBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/ (?:[\da-z_-]+\.)*mlb\.com/
(?: (?:
(?: (?:
(?:[^/]+/)*c-| (?:[^/]+/)*video/[^/]+/c-|
(?: (?:
shared/video/embed/(?:embed|m-internal-embed)\.html| shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp| (?:[^/]+/)+(?:play|index)\.jsp|
@ -18,7 +94,6 @@ class MLBIE(NHLBaseIE):
(?P<id>\d+) (?P<id>\d+)
) )
''' '''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [ _TESTS = [
{ {
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933', 'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
@ -76,18 +151,6 @@ class MLBIE(NHLBaseIE):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429139220,
'upload_date': '20150415',
}
},
{ {
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694', 'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
'only_matching': True, 'only_matching': True,
@ -113,8 +176,92 @@ class MLBIE(NHLBaseIE):
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb', 'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
'only_matching': True, 'only_matching': True,
}, },
{
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True,
}
] ]
_TIMESTAMP_KEY = 'date'
@staticmethod
def _get_feed(video):
return video
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for keyword in (feed.get('keywordsAll') or []):
keyword_type = keyword.get('type')
if keyword_type and keyword_type.startswith('closed_captions_location_'):
cc_location = keyword.get('value')
if cc_location:
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
return self._download_json(
'http://content.mlb.com/mlb/item/id/v1/%s/details/web-v1.json' % display_id,
display_id)
class MLBVideoIE(MLBBaseIE):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/(?:[^/]+/)*video/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.mlb.com/mariners/video/ackley-s-spectacular-catch-c34698933',
'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': {
'id': 'c04a8863-f569-42e6-9f87-992393657614',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405995000,
'upload_date': '20140722',
'thumbnail': r're:^https?://.+',
},
}
_TIMESTAMP_KEY = 'timestamp'
@classmethod
def suitable(cls, url):
return False if MLBIE.suitable(url) else super(MLBVideoIE, cls).suitable(url)
@staticmethod
def _get_feed(video):
return video['feeds'][0]
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for cc_location in (feed.get('closedCaptions') or []):
subtitles.setdefault(language, []).append({
'url': cc_location,
})
})
return subtitles
def _download_video_data(self, display_id):
# https://www.mlb.com/data-service/en/videos/[SLUG]
return self._download_json(
'https://fastball-gateway.mlb.com/graphql',
display_id, query={
'query': '''{
mediaPlayback(ids: "%s") {
description
feeds(types: CMS) {
closedCaptions
duration
image {
cuts {
width
height
src
}
}
playbacks {
name
url
}
}
id
timestamp
title
}
}''' % display_id,
})['data']['mediaPlayback'][0]
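The new data source is a GraphQL gateway queried over a plain GET with the query passed as a URL parameter. A standalone sketch of the same lookup; the endpoint is taken from the hunk, but the response shape beyond `data.mediaPlayback` is an assumption:

```python
import json
import urllib.parse
import urllib.request

def fetch_media_playback(slug):
    query = '{ mediaPlayback(ids: "%s") { id title timestamp } }' % slug
    url = ('https://fastball-gateway.mlb.com/graphql?'
           + urllib.parse.urlencode({'query': query}))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)['data']['mediaPlayback'][0]
```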

View File

@ -255,7 +255,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
@staticmethod @staticmethod
def _extract_child_with_type(parent, t): def _extract_child_with_type(parent, t):
return next(c for c in parent['children'] if c.get('type') == t) for c in parent['children']:
if c.get('type') == t:
return c
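The rewrite changes the failure mode: `next()` without a default raises `StopIteration` when no child matches, while the loop simply returns `None`, which the caller below can guard with `ab_testing or main_container`. In one runnable comparison:

```python
children = [{'type': 'MainContainer'}]

try:
    next(c for c in children if c.get('type') == 'ABTesting')
except StopIteration:
    print('next() raised StopIteration')

def find_child(children, t):
    for c in children:
        if c.get('type') == t:
            return c

print(find_child(children, 'ABTesting'))  # None - safe to use with an `or` fallback
```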
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
try: try:
@ -286,7 +288,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
data = self._parse_json(self._search_regex( data = self._parse_json(self._search_regex(
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None) r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
main_container = self._extract_child_with_type(data, 'MainContainer') main_container = self._extract_child_with_type(data, 'MainContainer')
video_player = self._extract_child_with_type(main_container, 'VideoPlayer') ab_testing = self._extract_child_with_type(main_container, 'ABTesting')
video_player = self._extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
mgid = video_player['props']['media']['video']['config']['uri'] mgid = video_player['props']['media']['video']['config']['uri']
return mgid return mgid
@ -320,7 +323,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')

View File

@ -23,11 +23,9 @@ class NineCNineMediaIE(InfoExtractor):
destination_code, content_id = re.match(self._VALID_URL, url).groups() destination_code, content_id = re.match(self._VALID_URL, url).groups()
api_base_url = self._API_BASE_TEMPLATE % (destination_code, content_id) api_base_url = self._API_BASE_TEMPLATE % (destination_code, content_id)
content = self._download_json(api_base_url, content_id, query={ content = self._download_json(api_base_url, content_id, query={
'$include': '[Media,Season,ContentPackages]', '$include': '[Media.Name,Season,ContentPackages.Duration,ContentPackages.Id]',
}) })
title = content['Name'] title = content['Name']
if len(content['ContentPackages']) > 1:
raise ExtractorError('multiple content packages')
content_package = content['ContentPackages'][0] content_package = content['ContentPackages'][0]
package_id = content_package['Id'] package_id = content_package['Id']
content_package_url = api_base_url + 'contentpackages/%s/' % package_id content_package_url = api_base_url + 'contentpackages/%s/' % package_id

View File

@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
)
class PalcoMP3BaseIE(InfoExtractor):
_GQL_QUERY_TMPL = '''{
artist(slug: "%s") {
%s
}
}'''
_ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
%s
}'''
_MUSIC_FIELDS = '''duration
hls
mp3File
musicID
plays
title'''
def _call_api(self, artist_slug, artist_fields):
return self._download_json(
'https://www.palcomp3.com.br/graphql/', artist_slug, query={
'query': self._GQL_QUERY_TMPL % (artist_slug, artist_fields),
})['data']
def _parse_music(self, music):
music_id = compat_str(music['musicID'])
title = music['title']
formats = []
hls_url = music.get('hls')
if hls_url:
formats.append({
'url': hls_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
mp3_file = music.get('mp3File')
if mp3_file:
formats.append({
'url': mp3_file,
})
return {
'id': music_id,
'title': title,
'formats': formats,
'duration': int_or_none(music.get('duration')),
'view_count': int_or_none(music.get('plays')),
}
def _real_initialize(self):
self._ARTIST_FIELDS_TMPL = self._ARTIST_FIELDS_TMPL % self._MUSIC_FIELDS
def _real_extract(self, url):
artist_slug, music_slug = re.match(self._VALID_URL, url).groups()
artist_fields = self._ARTIST_FIELDS_TMPL % music_slug
music = self._call_api(artist_slug, artist_fields)['artist']['music']
return self._parse_music(music)
class PalcoMP3IE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:song'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/nossas-composicoes-cuida-bem-dela/',
'md5': '99fd6405b2d8fd589670f6db1ba3b358',
'info_dict': {
'id': '3162927',
'ext': 'mp3',
'title': 'Nossas Composições - CUIDA BEM DELA',
'duration': 210,
'view_count': int,
}
}]
@classmethod
def suitable(cls, url):
return False if PalcoMP3VideoIE.suitable(url) else super(PalcoMP3IE, cls).suitable(url)
class PalcoMP3ArtistIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:artist'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com.br/condedoforro/',
'info_dict': {
'id': '358396',
'title': 'Conde do Forró',
},
'playlist_mincount': 188,
}]
_ARTIST_FIELDS_TMPL = '''artistID
musics {
nodes {
%s
}
}
name'''
@classmethod
def suitable(cls, url):
return False if re.match(PalcoMP3IE._VALID_URL, url) else super(PalcoMP3ArtistIE, cls).suitable(url)
def _real_extract(self, url):
artist_slug = self._match_id(url)
artist = self._call_api(artist_slug, self._ARTIST_FIELDS_TMPL)['artist']
def entries():
for music in (try_get(artist, lambda x: x['musics']['nodes'], list) or []):
yield self._parse_music(music)
return self.playlist_result(
entries(), str_or_none(artist.get('artistID')), artist.get('name'))
class PalcoMP3VideoIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:video'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)/?#clipe'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/maiara-e-maraisa-voce-faz-falta-aqui-ao-vivo-em-vicosa-mg/#clipe',
'add_ie': ['Youtube'],
'info_dict': {
'id': '_pD1nR2qqPg',
'ext': 'mp4',
'title': 'Maiara e Maraisa - Você Faz Falta Aqui - DVD Ao Vivo Em Campo Grande',
'description': 'md5:7043342c09a224598e93546e98e49282',
'upload_date': '20161107',
'uploader_id': 'maiaramaraisaoficial',
'uploader': 'Maiara e Maraisa',
}
}]
_MUSIC_FIELDS = 'youtubeID'
def _parse_music(self, music):
youtube_id = music['youtubeID']
return self.url_result(youtube_id, 'Youtube', youtube_id)

View File

@ -413,7 +413,8 @@ class PeerTubeIE(InfoExtractor):
peertube3\.cpy\.re| peertube3\.cpy\.re|
peertube2\.cpy\.re| peertube2\.cpy\.re|
videos\.tcit\.fr| videos\.tcit\.fr|
peertube\.cpy\.re peertube\.cpy\.re|
canard\.tube
)''' )'''
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}' _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
_API_BASE = 'https://%s/api/v1/videos/%s/%s' _API_BASE = 'https://%s/api/v1/videos/%s/%s'
@ -598,11 +599,13 @@ class PeerTubeIE(InfoExtractor):
else: else:
age_limit = None age_limit = None
webpage_url = 'https://%s/videos/watch/%s' % (host, video_id)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': urljoin(url, video.get('thumbnailPath')), 'thumbnail': urljoin(webpage_url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')), 'timestamp': unified_timestamp(video.get('publishedAt')),
'uploader': account_data('displayName', compat_str), 'uploader': account_data('displayName', compat_str),
'uploader_id': str_or_none(account_data('id', int)), 'uploader_id': str_or_none(account_data('id', int)),
@ -620,5 +623,6 @@ class PeerTubeIE(InfoExtractor):
'tags': try_get(video, lambda x: x['tags'], list), 'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories, 'categories': categories,
'formats': formats, 'formats': formats,
'subtitles': subtitles 'subtitles': subtitles,
'webpage_url': webpage_url,
} }

View File

@ -1,45 +1,128 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .dreisat import DreiSatIE import re
from .youtube import YoutubeIE
from .zdf import ZDFBaseIE
from ..compat import compat_str
from ..utils import (
int_or_none,
merge_dicts,
unified_timestamp,
xpath_text,
)
class PhoenixIE(DreiSatIE): class PhoenixIE(ZDFBaseIE):
IE_NAME = 'phoenix.de' IE_NAME = 'phoenix.de'
_VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/ _VALID_URL = r'https?://(?:www\.)?phoenix\.de/(?:[^/]+/)*[^/?#&]*-a-(?P<id>\d+)\.html'
(?: _TESTS = [{
phoenix/die_sendungen/(?:[^/]+/)? # Same as https://www.zdf.de/politik/phoenix-sendungen/wohin-fuehrt-der-protest-in-der-pandemie-100.html
)? 'url': 'https://www.phoenix.de/sendungen/ereignisse/corona-nachgehakt/wohin-fuehrt-der-protest-in-der-pandemie-a-2050630.html',
(?P<id>[0-9]+)''' 'md5': '34ec321e7eb34231fd88616c65c92db0',
_TESTS = [
{
'url': 'http://www.phoenix.de/content/884301',
'md5': 'ed249f045256150c92e72dbb70eadec6',
'info_dict': { 'info_dict': {
'id': '884301', 'id': '210222_phx_nachgehakt_corona_protest',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Michael Krons mit Hans-Werner Sinn', 'title': 'Wohin führt der Protest in der Pandemie?',
'description': 'Im Dialog - Sa. 25.10.14, 00.00 - 00.35 Uhr', 'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
'upload_date': '20141025', 'duration': 1691,
'uploader': 'Im Dialog', 'timestamp': 1613906100,
} 'upload_date': '20210221',
'uploader': 'Phoenix',
'channel': 'corona nachgehakt',
}, },
{ }, {
'url': 'http://www.phoenix.de/content/phoenix/die_sendungen/869815', # Youtube embed
'url': 'https://www.phoenix.de/sendungen/gespraeche/phoenix-streitgut-brennglas-corona-a-1965505.html',
'info_dict': {
'id': 'hMQtqFYjomk',
'ext': 'mp4',
'title': 'phoenix streitgut: Brennglas Corona - Wie gerecht ist unsere Gesellschaft?',
'description': 'md5:ac7a02e2eb3cb17600bc372e4ab28fdd',
'duration': 3509,
'upload_date': '20201219',
'uploader': 'phoenix',
'uploader_id': 'phoenix',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.phoenix.de/entwicklungen-in-russland-a-2044720.html',
'only_matching': True, 'only_matching': True,
}, }, {
{ # no media
'url': 'http://www.phoenix.de/content/phoenix/die_sendungen/diskussionen/928234', 'url': 'https://www.phoenix.de/sendungen/dokumentationen/mit-dem-jumbo-durch-die-nacht-a-89625.html',
'only_matching': True, 'only_matching': True,
}, }, {
] # Same as https://www.zdf.de/politik/phoenix-sendungen/die-gesten-der-maechtigen-100.html
'url': 'https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) article_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
internal_id = self._search_regex( article = self._download_json(
r'<div class="phx_vod" id="phx_vod_([0-9]+)"', 'https://www.phoenix.de/response/id/%s' % article_id, article_id,
webpage, 'internal video ID') 'Downloading article JSON')
api_url = 'http://www.phoenix.de/php/mediaplayer/data/beitrags_details.php?ak=web&id=%s' % internal_id video = article['absaetze'][0]
return self.extract_from_xml_url(video_id, api_url) title = video.get('titel') or article.get('subtitel')
if video.get('typ') == 'video-youtube':
video_id = video['id']
return self.url_result(
video_id, ie=YoutubeIE.ie_key(), video_id=video_id,
video_title=title)
video_id = compat_str(video.get('basename') or video.get('content'))
details = self._download_xml(
'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php',
video_id, 'Downloading details XML', query={
'ak': 'web',
'ptmd': 'true',
'id': video_id,
'profile': 'player2',
})
title = title or xpath_text(
details, './/information/title', 'title', fatal=True)
content_id = xpath_text(
details, './/video/details/basename', 'content id', fatal=True)
info = self._extract_ptmd(
'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id,
content_id, None, url)
timestamp = unified_timestamp(xpath_text(details, './/details/airtime'))
thumbnails = []
for node in details.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
return merge_dicts(info, {
'id': content_id,
'title': title,
'description': xpath_text(details, './/information/detail'),
'duration': int_or_none(xpath_text(details, './/details/lengthSec')),
'thumbnails': thumbnails,
'timestamp': timestamp,
'uploader': xpath_text(details, './/details/channel'),
'uploader_id': xpath_text(details, './/details/originChannelId'),
'channel': xpath_text(details, './/details/originChannelTitle'),
})
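The thumbnail loop above parses dimensions out of the teaserimage `key` attribute, which encodes them as `WIDTHxHEIGHT`. A quick check of that regex (the example value is illustrative):

```python
import re

m = re.match('^([0-9]+)x([0-9]+)$', '1280x720')
print(int(m.group(1)), int(m.group(2)))  # 1280 720
```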

View File

@ -1,22 +1,15 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
try_get,
update_url_query,
urlencode_postdata,
) )
class PicartoIE(InfoExtractor): class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?' _VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
_TEST = { _TEST = {
'url': 'https://picarto.tv/Setz', 'url': 'https://picarto.tv/Setz',
'info_dict': { 'info_dict': {
@ -34,65 +27,46 @@ class PicartoIE(InfoExtractor):
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url) return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) channel_id = self._match_id(url)
channel_id = mobj.group('id')
metadata = self._download_json( data = self._download_json(
'https://api.picarto.tv/v1/channel/name/' + channel_id, 'https://ptvintern.picarto.tv/ptvapi', channel_id, query={
channel_id) 'query': '''{
channel(name: "%s") {
adult
id
online
stream_name
title
}
getLoadBalancerUrl(channel_name: "%s") {
url
}
}''' % (channel_id, channel_id),
})['data']
metadata = data['channel']
if metadata.get('online') is False: if metadata.get('online') == 0:
raise ExtractorError('Stream is offline', expected=True) raise ExtractorError('Stream is offline', expected=True)
title = metadata['title']
cdn_data = self._download_json( cdn_data = self._download_json(
'https://picarto.tv/process/channel', channel_id, data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
data=urlencode_postdata({'loadbalancinginfo': channel_id}), channel_id, 'Downloading load balancing info')
note='Downloading load balancing info')
token = mobj.group('token') or 'public'
params = {
'con': int(time.time() * 1000),
'token': token,
}
prefered_edge = cdn_data.get('preferedEdge')
formats = [] formats = []
for source in (cdn_data.get('source') or []):
for edge in cdn_data['edges']: source_url = source.get('url')
edge_ep = edge.get('ep') if not source_url:
if not edge_ep or not isinstance(edge_ep, compat_str):
continue continue
edge_id = edge.get('id') source_type = source.get('type')
for tech in cdn_data['techs']: if source_type == 'html5/application/vnd.apple.mpegurl':
tech_label = tech.get('label')
tech_type = tech.get('type')
preference = 0
if edge_id == prefered_edge:
preference += 1
format_id = []
if edge_id:
format_id.append(edge_id)
if tech_type == 'application/x-mpegurl' or tech_label == 'HLS':
format_id.append('hls')
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
update_url_query( source_url, channel_id, 'mp4', m3u8_id='hls', fatal=False))
'https://%s/hls/%s/index.m3u8' elif source_type == 'html5/video/mp4':
% (edge_ep, channel_id), params),
channel_id, 'mp4', preference=preference,
m3u8_id='-'.join(format_id), fatal=False))
continue
elif tech_type == 'video/mp4' or tech_label == 'MP4':
format_id.append('mp4')
formats.append({ formats.append({
'url': update_url_query( 'url': source_url,
'https://%s/mp4/%s.mp4' % (edge_ep, channel_id),
params),
'format_id': '-'.join(format_id),
'preference': preference,
}) })
else:
# rtmp format does not seem to work
continue
self._sort_formats(formats) self._sort_formats(formats)
mature = metadata.get('adult') mature = metadata.get('adult')
@ -103,10 +77,10 @@ class PicartoIE(InfoExtractor):
return { return {
'id': channel_id, 'id': channel_id,
'title': self._live_title(metadata.get('title') or channel_id), 'title': self._live_title(title.strip()),
'is_live': True, 'is_live': True,
'thumbnail': try_get(metadata, lambda x: x['thumbnails']['web']),
'channel': channel_id, 'channel': channel_id,
'channel_id': metadata.get('id'),
'channel_url': 'https://picarto.tv/%s' % channel_id, 'channel_url': 'https://picarto.tv/%s' % channel_id,
'age_limit': age_limit, 'age_limit': age_limit,
'formats': formats, 'formats': formats,

View File

@ -31,6 +31,7 @@ class PinterestBaseIE(InfoExtractor):
title = (data.get('title') or data.get('grid_title') or video_id).strip() title = (data.get('title') or data.get('grid_title') or video_id).strip()
urls = []
formats = [] formats = []
duration = None duration = None
if extract_formats: if extract_formats:
@ -38,8 +39,9 @@ class PinterestBaseIE(InfoExtractor):
if not isinstance(format_dict, dict): if not isinstance(format_dict, dict):
continue continue
format_url = url_or_none(format_dict.get('url')) format_url = url_or_none(format_dict.get('url'))
if not format_url: if not format_url or format_url in urls:
continue continue
urls.append(format_url)
duration = float_or_none(format_dict.get('duration'), scale=1000) duration = float_or_none(format_dict.get('duration'), scale=1000)
ext = determine_ext(format_url) ext = determine_ext(format_url)
if 'hls' in format_id.lower() or ext == 'm3u8': if 'hls' in format_id.lower() or ext == 'm3u8':

View File

@ -393,7 +393,7 @@ query viewClip {
# To somewhat reduce the probability of these consequences # To somewhat reduce the probability of these consequences
# we will sleep a random amount of time before each call to ViewClip. # we will sleep a random amount of time before each call to ViewClip.
self._sleep( self._sleep(
random.randint(2, 5), display_id, random.randint(5, 10), display_id,
'%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling') '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
if not viewclip: if not viewclip:

View File

@ -167,6 +167,7 @@ class PornHubIE(PornHubBaseIE):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video has been flagged for verification in accordance with our trust and safety policy',
}, { }, {
# subtitles # subtitles
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5af5fef7c2aa7', 'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5af5fef7c2aa7',
@ -265,7 +266,8 @@ class PornHubIE(PornHubBaseIE):
webpage = dl_webpage('pc') webpage = dl_webpage('pc')
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>', (r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
r'(?s)<section[^>]+class=["\']noVideo["\'][^>]*>(?P<error>.+?)</section>'),
webpage, 'error message', default=None, group='error') webpage, 'error message', default=None, group='error')
if error_msg: if error_msg:
error_msg = re.sub(r'\s+', ' ', error_msg) error_msg = re.sub(r'\s+', ' ', error_msg)
@ -394,34 +396,50 @@ class PornHubIE(PornHubBaseIE):
upload_date = None upload_date = None
formats = [] formats = []
def add_format(format_url, height=None):
ext = determine_ext(format_url)
if ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
return
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
return
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url)
if mobj:
if not height:
height = int(mobj.group('height'))
tbr = int(mobj.group('tbr'))
formats.append({
'url': format_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
for video_url, height in video_urls: for video_url, height in video_urls:
if not upload_date: if not upload_date:
upload_date = self._search_regex( upload_date = self._search_regex(
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None) r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
if upload_date: if upload_date:
upload_date = upload_date.replace('/', '') upload_date = upload_date.replace('/', '')
ext = determine_ext(video_url) if '/video/get_media' in video_url:
if ext == 'mpd': medias = self._download_json(video_url, video_id, fatal=False)
formats.extend(self._extract_mpd_formats( if isinstance(medias, list):
video_url, video_id, mpd_id='dash', fatal=False)) for media in medias:
if not isinstance(media, dict):
continue continue
elif ext == 'm3u8': video_url = url_or_none(media.get('videoUrl'))
formats.extend(self._extract_m3u8_formats( if not video_url:
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue continue
tbr = None height = int_or_none(media.get('quality'))
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url) add_format(video_url, height)
if mobj: continue
if not height: add_format(video_url)
height = int(mobj.group('height'))
tbr = int(mobj.group('tbr'))
formats.append({
'url': video_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
self._sort_formats(formats) self._sort_formats(formats)
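The refactor hoists the manifest/progressive branching into a local `add_format` closure so the direct video URLs and the entries returned by the `get_media` JSON endpoint share one code path. The height/bitrate regex it applies expects CDN filenames like `720P_4000K` (the example URL below is illustrative):

```python
import re

url = 'https://ev.phncdn.com/videos/201801/12/123/720P_4000K_123.mp4'
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', url)
print(int(mobj.group('height')), int(mobj.group('tbr')))  # 720 4000
```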
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(

View File

@ -15,17 +15,17 @@ class RDSIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<id>[^/]+)-\d+\.\d+' _VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<id>[^/]+)-\d+\.\d+'
_TESTS = [{ _TESTS = [{
'url': 'http://www.rds.ca/videos/football/nfl/fowler-jr-prend-la-direction-de-jacksonville-3.1132799', # has two 9c9media ContentPackages, the web player selects the first ContentPackage
'url': 'https://www.rds.ca/videos/Hockey/NationalHockeyLeague/teams/9/forum-du-5-a-7-jesperi-kotkaniemi-de-retour-de-finlande-3.1377606',
'info_dict': { 'info_dict': {
'id': '604333', 'id': '2083309',
'display_id': 'fowler-jr-prend-la-direction-de-jacksonville', 'display_id': 'forum-du-5-a-7-jesperi-kotkaniemi-de-retour-de-finlande',
'ext': 'flv', 'ext': 'flv',
'title': 'Fowler Jr. prend la direction de Jacksonville', 'title': 'Forum du 5 à 7 : Kotkaniemi de retour de Finlande',
'description': 'Dante Fowler Jr. est le troisième choix du repêchage 2015 de la NFL. ', 'description': 'md5:83fa38ecc4a79b19e433433254077f25',
'timestamp': 1430397346, 'timestamp': 1606129030,
'upload_date': '20150430', 'upload_date': '20201123',
'duration': 154.354, 'duration': 773.039,
'age_limit': 0,
} }
}, { }, {
'url': 'http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934', 'url': 'http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934',

View File

@ -6,11 +6,12 @@ import re
from .srgssr import SRGSSRIE from .srgssr import SRGSSRIE
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
unescapeHTML, unescapeHTML,
determine_ext, urljoin,
) )
@ -21,7 +22,7 @@ class RTSIE(SRGSSRIE):
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html', 'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
'md5': 'ff7f8450a90cf58dacb64e29707b4a8e', 'md5': '753b877968ad8afaeddccc374d4256a5',
'info_dict': { 'info_dict': {
'id': '3449373', 'id': '3449373',
'display_id': 'les-enfants-terribles', 'display_id': 'les-enfants-terribles',
@ -35,6 +36,7 @@ class RTSIE(SRGSSRIE):
'thumbnail': r're:^https?://.*\.image', 'thumbnail': r're:^https?://.*\.image',
'view_count': int, 'view_count': int,
}, },
'expected_warnings': ['Unable to download f4m manifest', 'Failed to download m3u8 information'],
}, },
{ {
'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html', 'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
@ -63,11 +65,12 @@ class RTSIE(SRGSSRIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Unable to download f4m manifest', 'Failed to download m3u8 information'],
'skip': 'Blocked outside Switzerland', 'skip': 'Blocked outside Switzerland',
}, },
{ {
'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html', 'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
'md5': '1bae984fe7b1f78e94abc74e802ed99f', 'md5': '9bb06503773c07ce83d3cbd793cebb91',
'info_dict': { 'info_dict': {
'id': '5745356', 'id': '5745356',
'display_id': 'londres-cachee-par-un-epais-smog', 'display_id': 'londres-cachee-par-un-epais-smog',
@ -81,6 +84,7 @@ class RTSIE(SRGSSRIE):
'thumbnail': r're:^https?://.*\.image', 'thumbnail': r're:^https?://.*\.image',
'view_count': int, 'view_count': int,
}, },
'expected_warnings': ['Unable to download f4m manifest', 'Failed to download m3u8 information'],
}, },
{ {
'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html', 'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
@ -160,7 +164,7 @@ class RTSIE(SRGSSRIE):
media_type = 'video' if 'video' in all_info else 'audio' media_type = 'video' if 'video' in all_info else 'audio'
# check for errors # check for errors
self.get_media_data('rts', media_type, media_id) self._get_media_data('rts', media_type, media_id)
info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio'] info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
@ -194,6 +198,7 @@ class RTSIE(SRGSSRIE):
'tbr': extract_bitrate(format_url), 'tbr': extract_bitrate(format_url),
}) })
download_base = 'http://rtsww%s-d.rts.ch/' % ('-a' if media_type == 'audio' else '')
for media in info.get('media', []): for media in info.get('media', []):
media_url = media.get('url') media_url = media.get('url')
if not media_url or re.match(r'https?://', media_url): if not media_url or re.match(r'https?://', media_url):
@ -205,7 +210,7 @@ class RTSIE(SRGSSRIE):
format_id += '-%dk' % rate format_id += '-%dk' % rate
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'url': 'http://download-video.rts.ch/' + media_url, 'url': urljoin(download_base, media_url),
'tbr': rate or extract_bitrate(media_url), 'tbr': rate or extract_bitrate(media_url),
}) })

View File

@ -2,8 +2,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64 import base64
import io
import re import re
import time import sys
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
@ -14,56 +15,13 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
qualities,
remove_end, remove_end,
remove_start, remove_start,
sanitized_Request,
std_headers, std_headers,
) )
_bytes_to_chr = (lambda x: x) if sys.version_info[0] == 2 else (lambda x: map(chr, x))
def _decrypt_url(png):
encrypted_data = compat_b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = compat_struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8 + length])
data = [chr(b) for b in data if b != 0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index + 1:]
if url_data[0] == 'H' and url_data[3] == '%':
# remove useless HQ%% at the start
url_data = url_data[4:]
alphabet = []
e = 0
d = 0
for l in alphabet_data:
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data:
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
class RTVEALaCartaIE(InfoExtractor): class RTVEALaCartaIE(InfoExtractor):
@ -79,28 +37,31 @@ class RTVEALaCartaIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia', 'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
'duration': 5024.566, 'duration': 5024.566,
'series': 'Balonmano',
}, },
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, { }, {
'note': 'Live stream', 'note': 'Live stream',
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/', 'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
'info_dict': { 'info_dict': {
'id': '1694255', 'id': '1694255',
'ext': 'flv', 'ext': 'mp4',
'title': 'TODO', 'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
'skip_download': 'live stream',
}, },
'skip': 'The f4m manifest can\'t be used yet',
}, { }, {
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/', 'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
'md5': 'e55e162379ad587e9640eda4f7353c0f', 'md5': 'd850f3c8731ea53952ebab489cf81cbf',
'info_dict': { 'info_dict': {
'id': '4236788', 'id': '4236788',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Servir y proteger - Capítulo 104', 'title': 'Servir y proteger - Capítulo 104',
'duration': 3222.0, 'duration': 3222.0,
}, },
'params': { 'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve', 'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
'only_matching': True, 'only_matching': True,
@ -111,58 +72,102 @@ class RTVEALaCartaIE(InfoExtractor):
def _real_initialize(self): def _real_initialize(self):
user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8') user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8')
manager_info = self._download_json( self._manager = self._download_json(
'http://www.rtve.es/odin/loki/' + user_agent_b64, 'http://www.rtve.es/odin/loki/' + user_agent_b64,
None, 'Fetching manager info') None, 'Fetching manager info')['manager']
self._manager = manager_info['manager']
@staticmethod
def _decrypt_url(png):
encrypted_data = io.BytesIO(compat_b64decode(png)[8:])
while True:
length = compat_struct_unpack('!I', encrypted_data.read(4))[0]
chunk_type = encrypted_data.read(4)
if chunk_type == b'IEND':
break
data = encrypted_data.read(length)
if chunk_type == b'tEXt':
alphabet_data, text = data.split(b'\0')
quality, url_data = text.split(b'%%')
alphabet = []
e = 0
d = 0
for l in _bytes_to_chr(alphabet_data):
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in _bytes_to_chr(url_data):
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
yield quality.decode(), url
encrypted_data.read(4) # CRC
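The rewritten `_decrypt_url` walks the PNG chunk layout directly: after the 8-byte file signature, each chunk is a 4-byte big-endian length, a 4-byte type, the payload, and a 4-byte CRC, with the obfuscated URLs living in the `tEXt` chunk. A minimal chunk walker matching that loop, assuming a well-formed PNG byte string:

```python
import io
import struct

def iter_png_chunks(png_bytes):
    buf = io.BytesIO(png_bytes[8:])  # skip the 8-byte PNG signature
    while True:
        length = struct.unpack('!I', buf.read(4))[0]
        chunk_type = buf.read(4)
        data = buf.read(length)
        buf.read(4)  # CRC
        yield chunk_type, data
        if chunk_type == b'IEND':
            return
```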
def _extract_png_formats(self, video_id):
png = self._download_webpage(
'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id),
video_id, 'Downloading url information', query={'q': 'v2'})
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
formats = []
for quality, video_url in self._decrypt_url(png):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, 'dash', fatal=False))
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': video_url,
})
self._sort_formats(formats)
return formats
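`qualities()` here is youtube-dl's own utility: given an ordered list of quality ids, it returns a function mapping an id to its index (higher is better), with unknown ids ranked lowest. A quick demonstration of the preference function built above:

```python
from youtube_dl.utils import qualities

q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
print(q('HD_FULL'), q('Media'), q('unknown'))  # 4 0 -1
```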
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
         info = self._download_json(
             'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
             video_id)['page']['items'][0]
         if info['state'] == 'DESPU':
             raise ExtractorError('The video is no longer available', expected=True)
-        title = info['title']
-        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
-        png_request = sanitized_Request(png_url)
-        png_request.add_header('Referer', url)
-        png = self._download_webpage(png_request, video_id, 'Downloading url information')
-        video_url = _decrypt_url(png)
-        ext = determine_ext(video_url)
-        formats = []
-        if not video_url.endswith('.f4m') and ext != 'm3u8':
-            if '?' not in video_url:
-                video_url = video_url.replace('resources/', 'auth/resources/')
-            video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
-        if ext == 'm3u8':
-            formats.extend(self._extract_m3u8_formats(
-                video_url, video_id, ext='mp4', entry_protocol='m3u8_native',
-                m3u8_id='hls', fatal=False))
-        elif ext == 'f4m':
-            formats.extend(self._extract_f4m_formats(
-                video_url, video_id, f4m_id='hds', fatal=False))
-        else:
-            formats.append({
-                'url': video_url,
-            })
-        self._sort_formats(formats)
+        title = info['title'].strip()
+        formats = self._extract_png_formats(video_id)

         subtitles = None
-        if info.get('sbtFile') is not None:
-            subtitles = self.extract_subtitles(video_id, info['sbtFile'])
+        sbt_file = info.get('sbtFile')
+        if sbt_file:
+            subtitles = self.extract_subtitles(video_id, sbt_file)
+
+        is_live = info.get('live') is True

         return {
             'id': video_id,
-            'title': title,
+            'title': self._live_title(title) if is_live else title,
             'formats': formats,
             'thumbnail': info.get('image'),
-            'page_url': url,
             'subtitles': subtitles,
-            'duration': float_or_none(info.get('duration'), scale=1000),
+            'duration': float_or_none(info.get('duration'), 1000),
+            'is_live': is_live,
+            'series': info.get('programTitle'),
         }
     def _get_subtitles(self, video_id, sub_file):
@@ -174,48 +179,26 @@ class RTVEALaCartaIE(InfoExtractor):
             for s in subs)
-class RTVEInfantilIE(InfoExtractor):
+class RTVEInfantilIE(RTVEALaCartaIE):
     IE_NAME = 'rtve.es:infantil'
     IE_DESC = 'RTVE infantil'
-    _VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/(?P<show>[^/]*)/video/(?P<short_title>[^/]*)/(?P<id>[0-9]+)/'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/]+/video/[^/]+/(?P<id>[0-9]+)/'

     _TESTS = [{
         'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
-        'md5': '915319587b33720b8e0357caaa6617e6',
+        'md5': '5747454717aedf9f9fdf212d1bcfc48d',
         'info_dict': {
             'id': '3040283',
             'ext': 'mp4',
             'title': 'Maneras de vivir',
-            'thumbnail': 'http://www.rtve.es/resources/jpg/6/5/1426182947956.JPG',
+            'thumbnail': r're:https?://.+/1426182947956\.JPG',
             'duration': 357.958,
         },
+        'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
     }]
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        info = self._download_json(
-            'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
-            video_id)['page']['items'][0]
-
-        webpage = self._download_webpage(url, video_id)
-        vidplayer_id = self._search_regex(
-            r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
-
-        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
-        png = self._download_webpage(png_url, video_id, 'Downloading url information')
-        video_url = _decrypt_url(png)
-
-        return {
-            'id': video_id,
-            'ext': 'mp4',
-            'title': info['title'],
-            'url': video_url,
-            'thumbnail': info.get('image'),
-            'duration': float_or_none(info.get('duration'), scale=1000),
-        }
-
-
-class RTVELiveIE(InfoExtractor):
+class RTVELiveIE(RTVEALaCartaIE):
     IE_NAME = 'rtve.es:live'
     IE_DESC = 'RTVE.es live streams'
     _VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
@@ -225,7 +208,7 @@ class RTVELiveIE(InfoExtractor):
         'info_dict': {
             'id': 'la-1',
             'ext': 'mp4',
-            'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
+            'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
         },
         'params': {
             'skip_download': 'live stream',
@@ -234,29 +217,22 @@ class RTVELiveIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        start_time = time.gmtime()
         video_id = mobj.group('id')

         webpage = self._download_webpage(url, video_id)
         title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
         title = remove_start(title, 'Estoy viendo ')
-        title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)

         vidplayer_id = self._search_regex(
             (r'playerId=player([0-9]+)',
              r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
              r'data-id=["\'](\d+)'),
             webpage, 'internal video ID')
-        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
-        png = self._download_webpage(png_url, video_id, 'Downloading url information')
-        m3u8_url = _decrypt_url(png)
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
-        self._sort_formats(formats)

         return {
             'id': video_id,
-            'title': title,
-            'formats': formats,
+            'title': self._live_title(title),
+            'formats': self._extract_png_formats(vidplayer_id),
             'is_live': True,
         }

youtube_dl/extractor/sbs.py

@@ -10,7 +10,7 @@ from ..utils import (

 class SBSIE(InfoExtractor):
     IE_DESC = 'sbs.com.au'
-    _VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=|/watch/)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'

     _TESTS = [{
         # Original URL is handled by the generic IE which finds the iframe:
@@ -43,6 +43,9 @@ class SBSIE(InfoExtractor):
     }, {
         'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866',
         'only_matching': True,
+    }, {
+        'url': 'https://www.sbs.com.au/ondemand/watch/1698704451971',
+        'only_matching': True,
     }]

     def _real_extract(self, url):

youtube_dl/extractor/screencastomatic.py

@@ -2,12 +2,18 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import js_to_json
+from ..utils import (
+    get_element_by_class,
+    int_or_none,
+    remove_start,
+    strip_or_none,
+    unified_strdate,
+)


 class ScreencastOMaticIE(InfoExtractor):
-    _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
-    _TEST = {
+    _VALID_URL = r'https?://screencast-o-matic\.com/(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
+    _TESTS = [{
         'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
         'md5': '483583cb80d92588f15ccbedd90f0c18',
         'info_dict': {
@@ -16,22 +22,30 @@ class ScreencastOMaticIE(InfoExtractor):
             'title': 'Welcome to 3-4 Philosophy @ DECV!',
             'thumbnail': r're:^https?://.*\.jpg$',
             'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
-            'duration': 369.163,
-        }
-    }
+            'duration': 369,
+            'upload_date': '20141216',
+        }
+    }, {
+        'url': 'http://screencast-o-matic.com/player/c2lD3BeOPl',
+        'only_matching': True,
+    }, {
+        'url': 'http://screencast-o-matic.com/embed?ff=true&sc=cbV2r4Q5TL&fromPH=true&a=1',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        jwplayer_data = self._parse_json(
-            self._search_regex(
-                r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
-            video_id, transform_source=js_to_json)
-
-        info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
-        info_dict.update({
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
+        webpage = self._download_webpage(
+            'https://screencast-o-matic.com/player/' + video_id, video_id)
+        info = self._parse_html5_media_entries(url, webpage, video_id)[0]
+        info.update({
+            'id': video_id,
+            'title': get_element_by_class('overlayTitle', webpage),
+            'description': strip_or_none(get_element_by_class('overlayDescription', webpage)) or None,
+            'duration': int_or_none(self._search_regex(
+                r'player\.duration\s*=\s*function\(\)\s*{\s*return\s+(\d+);\s*};',
+                webpage, 'duration', default=None)),
+            'upload_date': unified_strdate(remove_start(
+                get_element_by_class('overlayPublished', webpage), 'Published: ')),
         })
-        return info_dict
+        return info

youtube_dl/extractor/shahid.py

@@ -51,13 +51,16 @@ class ShahidIE(ShahidBaseIE):
     _NETRC_MACHINE = 'shahid'
     _VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
     _TESTS = [{
-        'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AC%D9%84%D8%B3-%D8%A7%D9%84%D8%B4%D8%A8%D8%A7%D8%A8-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-275286',
+        'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924',
         'info_dict': {
-            'id': '275286',
+            'id': '816924',
             'ext': 'mp4',
-            'title': 'مجلس الشباب الموسم 1 كليب 1',
-            'timestamp': 1506988800,
-            'upload_date': '20171003',
+            'title': 'متحف الدحيح الموسم 1 كليب 1',
+            'timestamp': 1602806400,
+            'upload_date': '20201016',
+            'description': 'برومو',
+            'duration': 22,
+            'categories': ['كوميديا'],
         },
         'params': {
             # m3u8 download
@@ -109,12 +112,15 @@ class ShahidIE(ShahidBaseIE):
             page_type = 'episode'

         playout = self._call_api(
-            'playout/url/' + video_id, video_id)['playout']
+            'playout/new/url/' + video_id, video_id)['playout']

         if playout.get('drm'):
             raise ExtractorError('This video is DRM protected.', expected=True)

-        formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4')
+        formats = self._extract_m3u8_formats(re.sub(
+            # https://docs.aws.amazon.com/mediapackage/latest/ug/manifest-filtering.html
+            r'aws\.manifestfilter=[\w:;,-]+&?',
+            '', playout['url']), video_id, 'mp4')
         self._sort_formats(formats)

         # video = self._call_api(
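
The re.sub above strips AWS MediaPackage's manifest-filtering parameter so the full set of renditions is returned. A quick illustration with a fabricated playout URL:

import re

url = 'https://shahid.example/out/v1/master.m3u8?aws.manifestfilter=video_height:240-720&foo=1'
print(re.sub(r'aws\.manifestfilter=[\w:;,-]+&?', '', url))
# -> https://shahid.example/out/v1/master.m3u8?foo=1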

youtube_dl/extractor/southpark.py

@@ -6,9 +6,9 @@ from .mtv import MTVServicesInfoExtractor

 class SouthParkIE(MTVServicesInfoExtractor):
     IE_NAME = 'southpark.cc.com'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark(?:\.cc|studios)\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'

-    _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
+    _FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'

     _TESTS = [{
         'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured',
@@ -23,8 +23,20 @@ class SouthParkIE(MTVServicesInfoExtractor):
     }, {
         'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
         'only_matching': True,
+    }, {
+        'url': 'https://www.southparkstudios.com/episodes/h4o269/south-park-stunning-and-brave-season-19-ep-1',
+        'only_matching': True,
     }]

+    def _get_feed_query(self, uri):
+        return {
+            'accountOverride': 'intl.mtvi.com',
+            'arcEp': 'shared.southpark.global',
+            'ep': '90877963',
+            'imageEp': 'shared.southpark.global',
+            'mgid': uri,
+        }
+

 class SouthParkEsIE(SouthParkIE):
     IE_NAME = 'southpark.cc.com:español'

youtube_dl/extractor/sportdeutschland.py

@@ -1,82 +1,105 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
+from ..compat import (
+    compat_parse_qs,
+    compat_urllib_parse_urlparse,
+)
 from ..utils import (
+    clean_html,
+    float_or_none,
+    int_or_none,
     parse_iso8601,
-    sanitized_Request,
+    strip_or_none,
+    try_get,
 )


 class SportDeutschlandIE(InfoExtractor):
-    _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
+    _VALID_URL = r'https?://sportdeutschland\.tv/(?P<id>(?:[^/]+/)?[^?#/&]+)'
     _TESTS = [{
         'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
         'info_dict': {
-            'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
+            'id': '5318cac0275701382770543d7edaf0a0',
             'ext': 'mp4',
-            'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
-            'categories': ['Badminton-Deutschland'],
-            'view_count': int,
-            'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
-            'timestamp': int,
-            'upload_date': '20200201',
-            'description': 're:.*',  # meaningless description for THIS video
+            'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals - Teil 1',
+            'duration': 16106.36,
         },
+        'params': {
+            'noplaylist': True,
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
+        'info_dict': {
+            'id': 'c6e2fdd01f63013854c47054d2ab776f',
+            'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals',
+            'description': 'md5:5263ff4c31c04bb780c9f91130b48530',
+            'duration': 31397,
+        },
+        'playlist_count': 2,
+    }, {
+        'url': 'https://sportdeutschland.tv/freeride-world-tour-2021-fieberbrunn-oesterreich',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        sport_id = mobj.group('sport')
-
-        api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
-            sport_id, video_id)
-        req = sanitized_Request(api_url, headers={
-            'Accept': 'application/vnd.vidibus.v2.html+json',
-            'Referer': url,
-        })
-        data = self._download_json(req, video_id)
+        display_id = self._match_id(url)
+        data = self._download_json(
+            'https://backend.sportdeutschland.tv/api/permalinks/' + display_id,
+            display_id, query={'access_token': 'true'})

         asset = data['asset']
-        categories = [data['section']['title']]
-
-        formats = []
-        smil_url = asset['video']
-        if '.smil' in smil_url:
-            m3u8_url = smil_url.replace('.smil', '.m3u8')
-            formats.extend(
-                self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4'))
-
-            smil_doc = self._download_xml(
-                smil_url, video_id, note='Downloading SMIL metadata')
-            base_url_el = smil_doc.find('./head/meta')
-            if base_url_el:
-                base_url = base_url_el.attrib['base']
-            formats.extend([{
-                'format_id': 'rmtp',
-                'url': base_url if base_url_el else n.attrib['src'],
-                'play_path': n.attrib['src'],
-                'ext': 'flv',
-                'preference': -100,
-                'format_note': 'Seems to fail at example stream',
-            } for n in smil_doc.findall('./body/video')])
-        else:
-            formats.append({'url': smil_url})
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'formats': formats,
-            'title': asset['title'],
-            'thumbnail': asset.get('image'),
-            'description': asset.get('teaser'),
-            'duration': asset.get('duration'),
-            'categories': categories,
-            'view_count': asset.get('views'),
-            'rtmp_live': asset.get('live'),
-            'timestamp': parse_iso8601(asset.get('date')),
+        title = (asset.get('title') or asset['label']).strip()
+        asset_id = asset.get('id') or asset.get('uuid')
+        info = {
+            'id': asset_id,
+            'title': title,
+            'description': clean_html(asset.get('body') or asset.get('description')) or asset.get('teaser'),
+            'duration': int_or_none(asset.get('seconds')),
         }
+        videos = asset.get('videos') or []
+        if len(videos) > 1:
+            playlist_id = compat_parse_qs(compat_urllib_parse_urlparse(url).query).get('playlistId', [None])[0]
+            if playlist_id:
+                if self._downloader.params.get('noplaylist'):
+                    videos = [videos[int(playlist_id)]]
+                    self.to_screen('Downloading just a single video because of --no-playlist')
+                else:
+                    self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % asset_id)
+
+            def entries():
+                for i, video in enumerate(videos, 1):
+                    video_id = video.get('uuid')
+                    video_url = video.get('url')
+                    if not (video_id and video_url):
+                        continue
+                    formats = self._extract_m3u8_formats(
+                        video_url.replace('.smil', '.m3u8'), video_id, 'mp4', fatal=False)
+                    if not formats:
+                        continue
+                    yield {
+                        'id': video_id,
+                        'formats': formats,
+                        'title': title + ' - ' + (video.get('label') or 'Teil %d' % i),
+                        'duration': float_or_none(video.get('duration')),
+                    }
+            info.update({
+                '_type': 'multi_video',
+                'entries': entries(),
+            })
+        else:
+            formats = self._extract_m3u8_formats(
+                videos[0]['url'].replace('.smil', '.m3u8'), asset_id, 'mp4')
+            section_title = strip_or_none(try_get(data, lambda x: x['section']['title']))
+            info.update({
+                'formats': formats,
+                'display_id': asset.get('permalink'),
+                'thumbnail': try_get(asset, lambda x: x['images'][0]),
+                'categories': [section_title] if section_title else None,
+                'view_count': int_or_none(asset.get('views')),
+                'is_live': asset.get('is_live') is True,
+                'timestamp': parse_iso8601(asset.get('date') or asset.get('published_at')),
+            })
+        return info
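
The playlistId lookup above goes through youtube-dl's Python 2 compat shims; on Python 3 alone the equivalent is simply (sample URL fabricated):

from urllib.parse import parse_qs, urlparse

url = 'https://sportdeutschland.tv/badminton/some-event?playlistId=0'
print(parse_qs(urlparse(url).query).get('playlistId', [None])[0])
# -> '0'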

youtube_dl/extractor/srgssr.py

@@ -4,16 +4,32 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlparse
 from ..utils import (
     ExtractorError,
+    float_or_none,
+    int_or_none,
     parse_iso8601,
     qualities,
+    try_get,
 )


 class SRGSSRIE(InfoExtractor):
-    _VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)'
+    _VALID_URL = r'''(?x)
+                    (?:
+                        https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|
+                        srgssr
+                    ):
+                    (?P<bu>
+                        srf|rts|rsi|rtr|swi
+                    ):(?:[^:]+:)?
+                    (?P<type>
+                        video|audio
+                    ):
+                    (?P<id>
+                        [0-9a-f\-]{36}|\d+
+                    )
+                    '''
     _GEO_BYPASS = False
     _GEO_COUNTRIES = ['CH']

@@ -25,25 +41,39 @@ class SRGSSRIE(InfoExtractor):
         'LEGAL': 'The video cannot be transmitted for legal reasons.',
         'STARTDATE': 'This video is not yet available. Please try again later.',
     }
+    _DEFAULT_LANGUAGE_CODES = {
+        'srf': 'de',
+        'rts': 'fr',
+        'rsi': 'it',
+        'rtr': 'rm',
+        'swi': 'en',
+    }

     def _get_tokenized_src(self, url, video_id, format_id):
-        sp = compat_urllib_parse_urlparse(url).path.split('/')
         token = self._download_json(
-            'http://tp.srgssr.ch/akahd/token?acl=/%s/%s/*' % (sp[1], sp[2]),
+            'http://tp.srgssr.ch/akahd/token?acl=*',
             video_id, 'Downloading %s token' % format_id, fatal=False) or {}
-        auth_params = token.get('token', {}).get('authparams')
+        auth_params = try_get(token, lambda x: x['token']['authparams'])
         if auth_params:
-            url += '?' + auth_params
+            url += ('?' if '?' not in url else '&') + auth_params
         return url
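
The token append above chooses '?' or '&' depending on whether the URL already carries a query string. As a tiny standalone sketch of the same logic:

def add_auth_params(url, auth_params):
    # Mirrors the extractor's url += ('?' if '?' not in url else '&') + auth_params
    return url + ('?' if '?' not in url else '&') + auth_params

assert add_auth_params('https://example.ch/a.m3u8', 'hdnts=x') == 'https://example.ch/a.m3u8?hdnts=x'
assert add_auth_params('https://example.ch/a.m3u8?b=1', 'hdnts=x') == 'https://example.ch/a.m3u8?b=1&hdnts=x'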
-    def get_media_data(self, bu, media_type, media_id):
-        media_data = self._download_json(
-            'http://il.srgssr.ch/integrationlayer/1.0/ue/%s/%s/play/%s.json' % (bu, media_type, media_id),
-            media_id)[media_type.capitalize()]
+    def _get_media_data(self, bu, media_type, media_id):
+        query = {'onlyChapters': True} if media_type == 'video' else {}
+        full_media_data = self._download_json(
+            'https://il.srgssr.ch/integrationlayer/2.0/%s/mediaComposition/%s/%s.json'
+            % (bu, media_type, media_id),
+            media_id, query=query)['chapterList']
+        try:
+            media_data = next(
+                x for x in full_media_data if x.get('id') == media_id)
+        except StopIteration:
+            raise ExtractorError('No media information found')

-        if media_data.get('block') and media_data['block'] in self._ERRORS:
-            message = self._ERRORS[media_data['block']]
-            if media_data['block'] == 'GEOBLOCK':
+        block_reason = media_data.get('blockReason')
+        if block_reason and block_reason in self._ERRORS:
+            message = self._ERRORS[block_reason]
+            if block_reason == 'GEOBLOCK':
                 self.raise_geo_restricted(
                     msg=message, countries=self._GEO_COUNTRIES)
             raise ExtractorError(
@@ -53,53 +83,75 @@ class SRGSSRIE(InfoExtractor):
     def _real_extract(self, url):
         bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
+        media_data = self._get_media_data(bu, media_type, media_id)
+        title = media_data['title']

-        media_data = self.get_media_data(bu, media_type, media_id)
-
-        metadata = media_data['AssetMetadatas']['AssetMetadata'][0]
-        title = metadata['title']
-        description = metadata.get('description')
-        created_date = media_data.get('createdDate') or metadata.get('createdDate')
-        timestamp = parse_iso8601(created_date)
-
-        thumbnails = [{
-            'id': image.get('id'),
-            'url': image['url'],
-        } for image in media_data.get('Image', {}).get('ImageRepresentations', {}).get('ImageRepresentation', [])]
-
-        preference = qualities(['LQ', 'MQ', 'SD', 'HQ', 'HD'])
         formats = []
-        for source in media_data.get('Playlists', {}).get('Playlist', []) + media_data.get('Downloads', {}).get('Download', []):
-            protocol = source.get('@protocol')
-            for asset in source['url']:
-                asset_url = asset['text']
-                quality = asset['@quality']
-                format_id = '%s-%s' % (protocol, quality)
-                if protocol.startswith('HTTP-HDS') or protocol.startswith('HTTP-HLS'):
-                    asset_url = self._get_tokenized_src(asset_url, media_id, format_id)
-                    if protocol.startswith('HTTP-HDS'):
-                        formats.extend(self._extract_f4m_formats(
-                            asset_url + ('?' if '?' not in asset_url else '&') + 'hdcore=3.4.0',
-                            media_id, f4m_id=format_id, fatal=False))
-                    elif protocol.startswith('HTTP-HLS'):
-                        formats.extend(self._extract_m3u8_formats(
-                            asset_url, media_id, 'mp4', 'm3u8_native',
-                            m3u8_id=format_id, fatal=False))
-                else:
-                    formats.append({
-                        'format_id': format_id,
-                        'url': asset_url,
-                        'preference': preference(quality),
-                        'ext': 'flv' if protocol == 'RTMP' else None,
-                    })
+        q = qualities(['SD', 'HD'])
+        for source in (media_data.get('resourceList') or []):
+            format_url = source.get('url')
+            if not format_url:
+                continue
+            protocol = source.get('protocol')
+            quality = source.get('quality')
+            format_id = []
+            for e in (protocol, source.get('encoding'), quality):
+                if e:
+                    format_id.append(e)
+            format_id = '-'.join(format_id)
+
+            if protocol in ('HDS', 'HLS'):
+                if source.get('tokenType') == 'AKAMAI':
+                    format_url = self._get_tokenized_src(
+                        format_url, media_id, format_id)
+                    formats.extend(self._extract_akamai_formats(
+                        format_url, media_id))
+                elif protocol == 'HLS':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, media_id, 'mp4', 'm3u8_native',
+                        m3u8_id=format_id, fatal=False))
+            elif protocol in ('HTTP', 'HTTPS'):
+                formats.append({
+                    'format_id': format_id,
+                    'url': format_url,
+                    'quality': q(quality),
+                })
+
+        # This is needed because for audio medias the podcast url is usually
+        # always included, even if is only an audio segment and not the
+        # whole episode.
+        if int_or_none(media_data.get('position')) == 0:
+            for p in ('S', 'H'):
+                podcast_url = media_data.get('podcast%sdUrl' % p)
+                if not podcast_url:
+                    continue
+                quality = p + 'D'
+                formats.append({
+                    'format_id': 'PODCAST-' + quality,
+                    'url': podcast_url,
+                    'quality': q(quality),
+                })
         self._sort_formats(formats)

+        subtitles = {}
+        if media_type == 'video':
+            for sub in (media_data.get('subtitleList') or []):
+                sub_url = sub.get('url')
+                if not sub_url:
+                    continue
+                lang = sub.get('locale') or self._DEFAULT_LANGUAGE_CODES[bu]
+                subtitles.setdefault(lang, []).append({
+                    'url': sub_url,
+                })
+
         return {
             'id': media_id,
             'title': title,
-            'description': description,
-            'timestamp': timestamp,
-            'thumbnails': thumbnails,
+            'description': media_data.get('description'),
+            'timestamp': parse_iso8601(media_data.get('date')),
+            'thumbnail': media_data.get('imageUrl'),
+            'duration': float_or_none(media_data.get('duration'), 1000),
+            'subtitles': subtitles,
             'formats': formats,
         }
@@ -119,26 +171,17 @@ class SRGSSRPlayIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-        'md5': 'da6b5b3ac9fa4761a942331cef20fcb3',
+        'md5': '6db2226ba97f62ad42ce09783680046c',
         'info_dict': {
             'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
             'ext': 'mp4',
             'upload_date': '20130701',
             'title': 'Snowden beantragt Asyl in Russland',
-            'timestamp': 1372713995,
-        }
-    }, {
-        # No Speichern (Save) button
-        'url': 'http://www.srf.ch/play/tv/top-gear/video/jaguar-xk120-shadow-und-tornado-dampflokomotive?id=677f5829-e473-4823-ac83-a1087fe97faa',
-        'md5': '0a274ce38fda48c53c01890651985bc6',
-        'info_dict': {
-            'id': '677f5829-e473-4823-ac83-a1087fe97faa',
-            'ext': 'flv',
-            'upload_date': '20130710',
-            'title': 'Jaguar XK120, Shadow und Tornado-Dampflokomotive',
-            'description': 'md5:88604432b60d5a38787f152dec89cd56',
-            'timestamp': 1373493600,
+            'timestamp': 1372708215,
+            'duration': 113.827,
+            'thumbnail': r're:^https?://.*1383719781\.png$',
         },
-        'expected_warnings': ['Unable to download f4m manifest'],
     }, {
         'url': 'http://www.rtr.ch/play/radio/actualitad/audio/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?id=63cb0778-27f8-49af-9284-8c7a8c6d15fc',
         'info_dict': {
@@ -146,7 +189,8 @@ class SRGSSRPlayIE(InfoExtractor):
             'ext': 'mp3',
             'upload_date': '20151013',
             'title': 'Saira: Tujetsch - tuttina cuntinuar cun Sedrun Mustér Turissem',
-            'timestamp': 1444750398,
+            'timestamp': 1444709160,
+            'duration': 336.816,
         },
         'params': {
             # rtmp download
@@ -159,19 +203,32 @@ class SRGSSRPlayIE(InfoExtractor):
             'id': '6348260',
             'display_id': '6348260',
             'ext': 'mp4',
-            'duration': 1796,
+            'duration': 1796.76,
             'title': 'Le 19h30',
-            'description': '',
-            'uploader': '19h30',
             'upload_date': '20141201',
             'timestamp': 1417458600,
             'thumbnail': r're:^https?://.*\.image',
-            'view_count': int,
         },
         'params': {
             # m3u8 download
             'skip_download': True,
         }
+    }, {
+        'url': 'http://play.swissinfo.ch/play/tv/business/video/why-people-were-against-tax-reforms?id=42960270',
+        'info_dict': {
+            'id': '42960270',
+            'ext': 'mp4',
+            'title': 'Why people were against tax reforms',
+            'description': 'md5:7ac442c558e9630e947427469c4b824d',
+            'duration': 94.0,
+            'upload_date': '20170215',
+            'timestamp': 1487173560,
+            'thumbnail': r're:https?://www\.swissinfo\.ch/srgscalableimage/42961964',
+            'subtitles': 'count:9',
+        },
+        'params': {
+            'skip_download': True,
+        }
     }, {
         'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01',
         'only_matching': True,
@@ -181,6 +238,10 @@ class SRGSSRPlayIE(InfoExtractor):
     }, {
         'url': 'https://www.rts.ch/play/tv/19h30/video/le-19h30?urn=urn:rts:video:6348260',
         'only_matching': True,
+    }, {
+        # audio segment, has podcastSdUrl of the full episode
+        'url': 'https://www.srf.ch/play/radio/popupaudioplayer?id=50b20dc8-f05b-4972-bf03-e438ff2833eb',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -188,5 +249,4 @@ class SRGSSRPlayIE(InfoExtractor):
         bu = mobj.group('bu')
         media_type = mobj.group('type') or mobj.group('type_2')
         media_id = mobj.group('id')
-        # other info can be extracted from url + '&layout=json'
         return self.url_result('srgssr:%s:%s:%s' % (bu[:3], media_type, media_id), 'SRGSSR')

youtube_dl/extractor/stretchinternet.py

@@ -1,7 +1,6 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import int_or_none


 class StretchInternetIE(InfoExtractor):
@@ -11,22 +10,28 @@ class StretchInternetIE(InfoExtractor):
         'info_dict': {
             'id': '573272',
             'ext': 'mp4',
-            'title': 'University of Mary Wrestling vs. Upper Iowa',
-            'timestamp': 1575668361,
-            'upload_date': '20191206',
+            'title': 'UNIVERSITY OF MARY WRESTLING VS UPPER IOWA',
+            # 'timestamp': 1575668361,
+            # 'upload_date': '20191206',
+            'uploader_id': '99997',
         }
     }

     def _real_extract(self, url):
         video_id = self._match_id(url)

+        media_url = self._download_json(
+            'https://core.stretchlive.com/trinity/event/tcg/' + video_id,
+            video_id)[0]['media'][0]['url']
         event = self._download_json(
-            'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
-            video_id)[0]
+            'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
+            video_id, query={'eventID': video_id, 'token': 'asdf'})['event']

         return {
             'id': video_id,
             'title': event['title'],
-            'timestamp': int_or_none(event.get('dateCreated'), 1000),
-            'url': 'https://' + event['media'][0]['url'],
+            # TODO: parse US timezone abbreviations
+            # 'timestamp': event.get('dateTimeString'),
+            'url': 'https://' + media_url,
+            'uploader_id': event.get('ownerID'),
         }

youtube_dl/extractor/svt.py

@@ -146,18 +146,19 @@ class SVTPlayIE(SVTPlayBaseIE):
                         )
                         (?P<svt_id>[^/?#&]+)|
                         https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)
+                        (?:.*?(?:modalId|id)=(?P<modal_id>[\da-zA-Z-]+))?
                     )
                     '''
     _TESTS = [{
-        'url': 'https://www.svtplay.se/video/26194546/det-har-ar-himlen',
+        'url': 'https://www.svtplay.se/video/30479064',
         'md5': '2382036fd6f8c994856c323fe51c426e',
         'info_dict': {
-            'id': 'jNwpV9P',
+            'id': '8zVbDPA',
             'ext': 'mp4',
-            'title': 'Det här är himlen',
-            'timestamp': 1586044800,
-            'upload_date': '20200405',
-            'duration': 3515,
+            'title': 'Designdrömmar i Stenungsund',
+            'timestamp': 1615770000,
+            'upload_date': '20210315',
+            'duration': 3519,
             'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
             'age_limit': 0,
             'subtitles': {
@@ -173,6 +174,12 @@ class SVTPlayIE(SVTPlayBaseIE):
             # AssertionError: Expected test_SVTPlay_jNwpV9P.mp4 to be at least 9.77KiB, but it's only 864.00B
             'skip_download': True,
         },
+    }, {
+        'url': 'https://www.svtplay.se/video/30479064/husdrommar/husdrommar-sasong-8-designdrommar-i-stenungsund?modalId=8zVbDPA',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.svtplay.se/video/30684086/rapport/rapport-24-apr-18-00-7?id=e72gVpa',
+        'only_matching': True,
     }, {
         # geo restricted to Sweden
         'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
@@ -219,7 +226,8 @@ class SVTPlayIE(SVTPlayBaseIE):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id, svt_id = mobj.group('id', 'svt_id')
+        video_id = mobj.group('id')
+        svt_id = mobj.group('svt_id') or mobj.group('modal_id')

         if svt_id:
             return self._extract_by_video_id(svt_id)
@@ -254,6 +262,7 @@ class SVTPlayIE(SVTPlayBaseIE):
         if not svt_id:
             svt_id = self._search_regex(
                 (r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+                 r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/%s/[^"\']*\b(?:modalId|id)=([\da-zA-Z-]+)' % re.escape(video_id),
                  r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
                  r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)',
                  r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',

youtube_dl/extractor/tf1.py

@@ -1,92 +1,105 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import json
+import re
+
 from .common import InfoExtractor
-from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    try_get,
+)


 class TF1IE(InfoExtractor):
-    """TF1 uses the wat.tv player."""
-    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|(?:www\.)?(?:tfou|ushuaiatv|histoire|tvbreizh))\.fr/(?:[^/]+/)*(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?tf1\.fr/[^/]+/(?P<program_slug>[^/]+)/videos/(?P<id>[^/?&#]+)\.html'
     _TESTS = [{
-        'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
-        'info_dict': {
-            'id': '10635995',
-            'ext': 'mp4',
-            'title': 'Citroën Grand C4 Picasso 2013 : présentation officielle',
-            'description': 'Vidéo officielle du nouveau Citroën Grand C4 Picasso, lancé à l\'automne 2013.',
-        },
-        'params': {
-            # Sometimes wat serves the whole file with the --test option
-            'skip_download': True,
-        },
-        'expected_warnings': ['HTTP Error 404'],
-    }, {
-        'url': 'http://www.tfou.fr/chuggington/videos/le-grand-mysterioso-chuggington-7085291-739.html',
-        'info_dict': {
-            'id': 'le-grand-mysterioso-chuggington-7085291-739',
-            'ext': 'mp4',
-            'title': 'Le grand Mystérioso - Chuggington',
-            'description': 'Le grand Mystérioso - Emery rêve qu\'un article lui soit consacré dans le journal.',
-            'upload_date': '20150103',
-        },
-        'params': {
-            # Sometimes wat serves the whole file with the --test option
-            'skip_download': True,
-        },
-        'skip': 'HTTP Error 410: Gone',
-    }, {
-        'url': 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://lci.tf1.fr/sept-a-huit/videos/sept-a-huit-du-24-mai-2015-8611550.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.tf1.fr/hd1/documentaire/videos/mylene-farmer-d-une-icone.html',
-        'only_matching': True,
-    }, {
         'url': 'https://www.tf1.fr/tmc/quotidien-avec-yann-barthes/videos/quotidien-premiere-partie-11-juin-2019.html',
         'info_dict': {
             'id': '13641379',
             'ext': 'mp4',
             'title': 'md5:f392bc52245dc5ad43771650c96fb620',
-            'description': 'md5:44bc54f0a21322f5b91d68e76a544eae',
+            'description': 'md5:a02cdb217141fb2d469d6216339b052f',
             'upload_date': '20190611',
+            'timestamp': 1560273989,
+            'duration': 1738,
+            'series': 'Quotidien avec Yann Barthès',
+            'tags': ['intégrale', 'quotidien', 'Replay'],
         },
         'params': {
             # Sometimes wat serves the whole file with the --test option
             'skip_download': True,
+            'format': 'bestvideo',
         },
+    }, {
+        'url': 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.tf1.fr/hd1/documentaire/videos/mylene-farmer-d-une-icone.html',
+        'only_matching': True,
     }]
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        wat_id = None
-
-        data = self._parse_json(
-            self._search_regex(
-                r'__APOLLO_STATE__\s*=\s*({.+?})\s*(?:;|</script>)', webpage,
-                'data', default='{}'), video_id, fatal=False)
-
-        if data:
-            try:
-                wat_id = next(
-                    video.get('streamId')
-                    for key, video in data.items()
-                    if isinstance(video, dict)
-                    and video.get('slug') == video_id)
-                if not isinstance(wat_id, compat_str) or not wat_id.isdigit():
-                    wat_id = None
-            except StopIteration:
-                pass
-
-        if not wat_id:
-            wat_id = self._html_search_regex(
-                (r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
-                 r'(["\']?)streamId\1\s*:\s*(["\']?)(?P<id>\d+)\2'),
-                webpage, 'wat id', group='id')
-
-        return self.url_result('wat:%s' % wat_id, 'Wat')
+        program_slug, slug = re.match(self._VALID_URL, url).groups()
+        video = self._download_json(
+            'https://www.tf1.fr/graphql/web', slug, query={
+                'id': '9b80783950b85247541dd1d851f9cc7fa36574af015621f853ab111a679ce26f',
+                'variables': json.dumps({
+                    'programSlug': program_slug,
+                    'slug': slug,
+                })
+            })['data']['videoBySlug']
+        wat_id = video['streamId']
+
+        tags = []
+        for tag in (video.get('tags') or []):
+            label = tag.get('label')
+            if not label:
+                continue
+            tags.append(label)
+
+        decoration = video.get('decoration') or {}
+
+        thumbnails = []
+        for source in (try_get(decoration, lambda x: x['image']['sources'], list) or []):
+            source_url = source.get('url')
+            if not source_url:
+                continue
+            thumbnails.append({
+                'url': source_url,
+                'width': int_or_none(source.get('width')),
+            })
+
+        return {
+            '_type': 'url_transparent',
+            'id': wat_id,
+            'url': 'wat:' + wat_id,
+            'title': video.get('title'),
+            'thumbnails': thumbnails,
+            'description': decoration.get('description'),
+            'timestamp': parse_iso8601(video.get('date')),
+            'duration': int_or_none(try_get(video, lambda x: x['publicPlayingInfos']['duration'])),
+            'tags': tags,
+            'series': decoration.get('programLabel'),
+            'season_number': int_or_none(video.get('season')),
+            'episode_number': int_or_none(video.get('episode')),
+        }
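
try_get, used heavily in the rewritten extractor, is youtube-dl's safe nested lookup helper. Roughly (simplified here to a single getter for illustration; the real helper also accepts a tuple of getters):

def try_get(src, getter, expected_type=None):
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None

try_get({'publicPlayingInfos': {'duration': 1738}}, lambda x: x['publicPlayingInfos']['duration'])  # -> 1738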

youtube_dl/extractor/tiktok.py

@@ -107,9 +107,12 @@ class TikTokIE(TikTokBaseIE):
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        data = self._parse_json(self._search_regex(
+        page_props = self._parse_json(self._search_regex(
             r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
-            webpage, 'data'), video_id)['props']['pageProps']['itemInfo']['itemStruct']
+            webpage, 'data'), video_id)['props']['pageProps']
+        data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
+        if not data and page_props.get('statusCode') == 10216:
+            raise ExtractorError('This video is private', expected=True)
         return self._extract_video(data, video_id)

youtube_dl/extractor/tmz.py

@@ -2,55 +2,110 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
+from .kaltura import KalturaIE
+from ..utils import (
+    int_or_none,
+    unified_timestamp,
+)


 class TMZIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#&]+)'
     _TESTS = [{
-        'url': 'http://www.tmz.com/videos/0_okj015ty/',
-        'md5': '4d22a51ef205b6c06395d8394f72d560',
-        'info_dict': {
-            'id': '0_okj015ty',
-            'ext': 'mp4',
-            'title': 'Kim Kardashian\'s Boobs Unlock a Mystery!',
-            'description': 'Did Kim Kardasain try to one-up Khloe by one-upping Kylie??? Or is she just showing off her amazing boobs?',
-            'timestamp': 1394747163,
-            'uploader_id': 'batchUser',
-            'upload_date': '20140313',
-        }
-    }, {
         'url': 'http://www.tmz.com/videos/0-cegprt2p/',
+        'md5': '31f9223e20eef55954973359afa61a20',
+        'info_dict': {
+            'id': 'P6YjLBLk',
+            'ext': 'mp4',
+            'title': "No Charges Against Hillary Clinton? Harvey Says It Ain't Over Yet",
+            'description': 'md5:b714359fc18607715ebccbd2da8ff488',
+            'timestamp': 1467831837,
+            'upload_date': '20160706',
+        },
+        'add_ie': [JWPlatformIE.ie_key()],
+    }, {
+        'url': 'http://www.tmz.com/videos/0_okj015ty/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tmz.com/videos/071119-chris-morgan-women-4590005-0-zcsejvcr/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tmz.com/videos/2021-02-19-021921-floyd-mayweather-1043872/',
         'only_matching': True,
     }]
     def _real_extract(self, url):
         video_id = self._match_id(url).replace('-', '_')
-        return self.url_result('kaltura:591531:%s' % video_id, 'Kaltura', video_id)
+
+        webpage = self._download_webpage(url, video_id, fatal=False)
+        if webpage:
+            tmz_video_id = self._search_regex(
+                r'nodeRef\s*:\s*["\']tmz:video:([\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12})',
+                webpage, 'video id', default=None)
+            video = self._download_json(
+                'https://www.tmz.com/_/video/%s' % tmz_video_id, video_id,
+                fatal=False)
+            if video:
+                message = video['message']
+                info = {
+                    '_type': 'url_transparent',
+                    'title': message.get('title'),
+                    'description': message.get('description'),
+                    'timestamp': unified_timestamp(message.get('published_at')),
+                    'duration': int_or_none(message.get('duration')),
+                }
+                jwplatform_id = message.get('jwplayer_media_id')
+                if jwplatform_id:
+                    info.update({
+                        'url': 'jwplatform:%s' % jwplatform_id,
+                        'ie_key': JWPlatformIE.ie_key(),
+                    })
+                else:
+                    kaltura_entry_id = message.get('kaltura_entry_id') or video_id
+                    kaltura_partner_id = message.get('kaltura_partner_id') or '591531'
+                    info.update({
+                        'url': 'kaltura:%s:%s' % (kaltura_partner_id, kaltura_entry_id),
+                        'ie_key': KalturaIE.ie_key(),
+                    })
+                return info
+
+        return self.url_result(
+            'kaltura:591531:%s' % video_id, KalturaIE.ie_key(), video_id)
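
The nodeRef pattern above pulls a UUID-style video id out of the page source; demonstrated on a fabricated snippet:

import re

html = 'nodeRef: "tmz:video:01234567-89ab-cdef-0123-456789abcdef"'
m = re.search(
    r'nodeRef\s*:\s*["\']tmz:video:([\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12})',
    html)
print(m.group(1))  # -> 01234567-89ab-cdef-0123-456789abcdef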
 class TMZArticleIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/]+)/?'
+    _VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'http://www.tmz.com/2015/04/19/bobby-brown-bobbi-kristina-awake-video-concert',
-        'md5': '3316ff838ae5bb7f642537825e1e90d2',
         'info_dict': {
-            'id': '0_6snoelag',
-            'ext': 'mov',
+            'id': 'PAKZa97W',
+            'ext': 'mp4',
             'title': 'Bobby Brown Tells Crowd ... Bobbi Kristina is Awake',
             'description': 'Bobby Brown stunned his audience during a concert Saturday night, when he told the crowd, "Bobbi is awake. She\'s watching me."',
-            'timestamp': 1429467813,
+            'timestamp': 1429466400,
             'upload_date': '20150419',
-            'uploader_id': 'batchUser',
-        }
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [JWPlatformIE.ie_key()],
     }

     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)

+        tmz_url = self._search_regex(
+            r'clickLink\s*\(\s*["\'](?P<url>%s)' % TMZIE._VALID_URL, webpage,
+            'video id', default=None, group='url')
+        if tmz_url:
+            return self.url_result(tmz_url, ie=TMZIE.ie_key())
+
         embedded_video_info = self._parse_json(self._html_search_regex(
             r'tmzVideoEmbed\(({.+?})\);', webpage, 'embedded video info'),
             video_id)
         return self.url_result(
-            'http://www.tmz.com/videos/%s/' % embedded_video_info['id'])
+            'http://www.tmz.com/videos/%s/' % embedded_video_info['id'],
+            ie=TMZIE.ie_key())

youtube_dl/extractor/trovo.py

@@ -153,6 +153,7 @@ class TrovoVodIE(TrovoBaseIE):
                 'protocol': 'm3u8_native',
                 'tbr': int_or_none(play_info.get('bitrate')),
                 'url': play_url,
+                'http_headers': {'Origin': 'https://trovo.live'},
             })
         self._sort_formats(formats)

youtube_dl/extractor/tv2dk.py

@@ -74,6 +74,12 @@ class TV2DKIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)

         entries = []
+
+        def add_entry(partner_id, kaltura_id):
+            entries.append(self.url_result(
+                'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
+                video_id=kaltura_id))
+
         for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage):
             video = extract_attributes(video_el)
             kaltura_id = video.get('data-entryid')
@@ -82,9 +88,14 @@ class TV2DKIE(InfoExtractor):
             partner_id = video.get('data-partnerid')
             if not partner_id:
                 continue
-            entries.append(self.url_result(
-                'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
-                video_id=kaltura_id))
+            add_entry(partner_id, kaltura_id)
+        if not entries:
+            kaltura_id = self._search_regex(
+                r'entry_id\s*:\s*["\']([0-9a-z_]+)', webpage, 'kaltura id')
+            partner_id = self._search_regex(
+                (r'\\u002Fp\\u002F(\d+)\\u002F', r'/p/(\d+)/'), webpage,
+                'partner id')
+            add_entry(partner_id, kaltura_id)
         return self.playlist_result(entries)
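
The two fallback partner-id patterns cover both the JSON-escaped form embedded in script data and the plain URL form; a quick check on fabricated snippets:

import re

for snippet in ('\\u002Fp\\u002F1234\\u002F', '/p/1234/'):
    for pattern in (r'\\u002Fp\\u002F(\d+)\\u002F', r'/p/(\d+)/'):
        m = re.search(pattern, snippet)
        if m:
            print(m.group(1))  # -> 1234
            break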

youtube_dl/extractor/tver.py

@@ -25,6 +25,10 @@ class TVerIE(InfoExtractor):
     }, {
         'url': 'https://tver.jp/episode/79622438',
         'only_matching': True,
+    }, {
+        # subtitle = ' '
+        'url': 'https://tver.jp/corner/f0068870',
+        'only_matching': True,
     }]
     _TOKEN = None
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
@@ -40,28 +44,18 @@ class TVerIE(InfoExtractor):
             query={'token': self._TOKEN})['main']
         p_id = main['publisher_id']
         service = remove_start(main['service'], 'ts_')
-        info = {
-            '_type': 'url_transparent',
-            'description': try_get(main, lambda x: x['note'][0]['text'], compat_str),
-            'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])),
-        }
-
-        if service == 'cx':
-            info.update({
-                'title': main.get('subtitle') or main['title'],
-                'url': 'https://i.fod.fujitv.co.jp/plus7/web/%s/%s.html' % (p_id[:4], p_id),
-                'ie_key': 'FujiTVFODPlus7',
-            })
-        else:
-            r_id = main['reference_id']
-            if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
-                r_id = 'ref:' + r_id
-            bc_url = smuggle_url(
-                self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
-                {'geo_countries': ['JP']})
-            info.update({
-                'url': bc_url,
-                'ie_key': 'BrightcoveNew',
-            })
-
-        return info
+
+        r_id = main['reference_id']
+        if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
+            r_id = 'ref:' + r_id
+        bc_url = smuggle_url(
+            self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
+            {'geo_countries': ['JP']})
+
+        return {
+            '_type': 'url_transparent',
+            'description': try_get(main, lambda x: x['note'][0]['text'], compat_str),
+            'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])),
+            'url': bc_url,
+            'ie_key': 'BrightcoveNew',
+        }

youtube_dl/extractor/twitter.py

@@ -19,6 +19,7 @@ from ..utils import (
     strip_or_none,
     unified_timestamp,
     update_url_query,
+    url_or_none,
     xpath_text,
 )

@@ -52,6 +53,9 @@ class TwitterBaseIE(InfoExtractor):
         return [f]

     def _extract_formats_from_vmap_url(self, vmap_url, video_id):
+        vmap_url = url_or_none(vmap_url)
+        if not vmap_url:
+            return []
         vmap_data = self._download_xml(vmap_url, video_id)
         formats = []
         urls = []
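
url_or_none, newly imported above, guards against junk vmap values before a download is attempted. Its behaviour is roughly the following (a simplified rendering of the youtube-dl helper; the real one also accepts compat_str on Python 2):

import re

def url_or_none(url):
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    return url if re.match(r'^(?:(?:https?|rt(?:m(?:pt?[es]?|fp)|sp[su]?)|mms|ftps?):)?//', url) else None

url_or_none('https://video.twimg.com/x.vmap')  # -> the URL unchanged
url_or_none('dummy')  # -> None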

youtube_dl/extractor/urplay.py

@@ -21,6 +21,11 @@ class URPlayIE(InfoExtractor):
             'description': 'md5:5344508a52aa78c1ced6c1b8b9e44e9a',
             'timestamp': 1513292400,
             'upload_date': '20171214',
+            'series': 'UR Samtiden - Livet, universum och rymdens märkliga musik',
+            'duration': 2269,
+            'categories': ['Kultur & historia'],
+            'tags': ['Kritiskt tänkande', 'Vetenskap', 'Vetenskaplig verksamhet'],
+            'episode': 'Om vetenskap, kritiskt tänkande och motstånd',
         },
     }, {
         'url': 'https://urskola.se/Produkter/190031-Tripp-Trapp-Trad-Sovkudde',
@@ -31,6 +36,10 @@ class URPlayIE(InfoExtractor):
             'description': 'md5:b86bffdae04a7e9379d1d7e5947df1d1',
             'timestamp': 1440086400,
             'upload_date': '20150820',
+            'series': 'Tripp, Trapp, Träd',
+            'duration': 865,
+            'tags': ['Sova'],
+            'episode': 'Sovkudde',
         },
     }, {
         'url': 'http://urskola.se/Produkter/155794-Smasagor-meankieli-Grodan-i-vida-varlden',
@@ -41,9 +50,11 @@ class URPlayIE(InfoExtractor):
         video_id = self._match_id(url)
         url = url.replace('skola.se/Produkter', 'play.se/program')
         webpage = self._download_webpage(url, video_id)
-        urplayer_data = self._parse_json(self._html_search_regex(
+        vid = int(video_id)
+        accessible_episodes = self._parse_json(self._html_search_regex(
             r'data-react-class="routes/Product/components/ProgramContainer/ProgramContainer"[^>]+data-react-props="({.+?})"',
-            webpage, 'urplayer data'), video_id)['accessibleEpisodes'][0]
+            webpage, 'urplayer data'), video_id)['accessibleEpisodes']
+        urplayer_data = next(e for e in accessible_episodes if e.get('id') == vid)
         episode = urplayer_data['title']
         raw_streaming_info = urplayer_data['streamingInfo']['raw']
         host = self._download_json(
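
The next() lookup above picks the episode whose numeric id matches the URL, rather than blindly taking the first entry as before. Note that next() over a generator raises StopIteration when nothing matches, so a miss surfaces as an extraction error:

episodes = [{'id': 190031, 'title': 'Sovkudde'}, {'id': 190032, 'title': 'Annat'}]
episode = next(e for e in episodes if e.get('id') == 190031)
print(episode['title'])  # -> Sovkudde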

youtube_dl/extractor/vgtv.py

@@ -23,6 +23,8 @@ class VGTVIE(XstreamIE):
         'fvn.no/fvntv': 'fvntv',
         'aftenposten.no/webtv': 'aptv',
         'ap.vgtv.no/webtv': 'aptv',
+        'tv.aftonbladet.se': 'abtv',
+        # obsolete URL schemas, kept in order to save one HTTP redirect
         'tv.aftonbladet.se/abtv': 'abtv',
         'www.aftonbladet.se/tv': 'abtv',
     }
@@ -140,6 +142,10 @@ class VGTVIE(XstreamIE):
             'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
             'only_matching': True,
         },
+        {
+            'url': 'https://tv.aftonbladet.se/video/36015/vulkanutbrott-i-rymden-nu-slapper-nasa-bilderna',
+            'only_matching': True,
+        },
         {
             'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
             'only_matching': True,

youtube_dl/extractor/vimeo.py

@@ -3,7 +3,6 @@ from __future__ import unicode_literals

 import base64
 import functools
-import json
 import re
 import itertools

@@ -17,14 +16,14 @@ from ..compat import (
 from ..utils import (
     clean_html,
     determine_ext,
-    dict_get,
     ExtractorError,
-    get_element_by_class,
     js_to_json,
     int_or_none,
     merge_dicts,
     OnDemandPagedList,
     parse_filesize,
-    RegexNotFoundError,
+    parse_iso8601,
     sanitized_Request,
     smuggle_url,
     std_headers,
@@ -74,25 +73,28 @@ class VimeoBaseInfoExtractor(InfoExtractor):
                     expected=True)
             raise ExtractorError('Unable to log in')

-    def _verify_video_password(self, url, video_id, webpage):
+    def _get_video_password(self):
         password = self._downloader.params.get('videopassword')
         if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
-        token, vuid = self._extract_xsrft_and_vuid(webpage)
-        data = urlencode_postdata({
-            'password': password,
-            'token': token,
-        })
+            raise ExtractorError(
+                'This video is protected by a password, use the --video-password option',
+                expected=True)
+        return password
+
+    def _verify_video_password(self, url, video_id, password, token, vuid):
         if url.startswith('http://'):
             # vimeo only supports https now, but the user can give an http url
             url = url.replace('http://', 'https://')
-        password_request = sanitized_Request(url + '/password', data)
-        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        password_request.add_header('Referer', url)
         self._set_vimeo_cookie('vuid', vuid)
         return self._download_webpage(
-            password_request, video_id,
-            'Verifying the password', 'Wrong password')
+            url + '/password', video_id, 'Verifying the password',
+            'Wrong password', data=urlencode_postdata({
+                'password': password,
+                'token': token,
+            }), headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+                'Referer': url,
+            })
def _extract_xsrft_and_vuid(self, webpage): def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex( xsrft = self._search_regex(
@ -123,10 +125,11 @@ class VimeoBaseInfoExtractor(InfoExtractor):
video_title = video_data['title'] video_title = video_data['title']
live_event = video_data.get('live_event') or {} live_event = video_data.get('live_event') or {}
is_live = live_event.get('status') == 'started' is_live = live_event.get('status') == 'started'
request = config.get('request') or {}
formats = [] formats = []
config_files = video_data.get('files') or config['request'].get('files', {}) config_files = video_data.get('files') or request.get('files') or {}
for f in config_files.get('progressive', []): for f in (config_files.get('progressive') or []):
video_url = f.get('url') video_url = f.get('url')
if not video_url: if not video_url:
continue continue
@ -142,7 +145,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
# TODO: fix handling of 308 status code returned for live archive manifest requests # TODO: fix handling of 308 status code returned for live archive manifest requests
sep_pattern = r'/sep/video/' sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'): for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items(): for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items():
manifest_url = cdn_data.get('url') manifest_url = cdn_data.get('url')
if not manifest_url: if not manifest_url:
continue continue
@ -188,9 +191,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
f['preference'] = -40 f['preference'] = -40
subtitles = {} subtitles = {}
text_tracks = config['request'].get('text_tracks') for tt in (request.get('text_tracks') or []):
if text_tracks:
for tt in text_tracks:
subtitles[tt['lang']] = [{ subtitles[tt['lang']] = [{
'ext': 'vtt', 'ext': 'vtt',
'url': urljoin('https://vimeo.com', tt['url']), 'url': urljoin('https://vimeo.com', tt['url']),
@ -198,7 +199,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
thumbnails = [] thumbnails = []
if not is_live: if not is_live:
for key, thumb in video_data.get('thumbs', {}).items(): for key, thumb in (video_data.get('thumbs') or {}).items():
thumbnails.append({ thumbnails.append({
'id': key, 'id': key,
'width': int_or_none(key), 'width': int_or_none(key),
@ -278,7 +279,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
)? )?
(?:videos?/)? (?:videos?/)?
(?P<id>[0-9]+) (?P<id>[0-9]+)
(?:/[\da-f]+)? (?:/(?P<unlisted_hash>[\da-f]{10}))?
/?(?:[?&].*)?(?:[#].*)?$ /?(?:[?&].*)?(?:[#].*)?$
''' '''
IE_NAME = 'vimeo' IE_NAME = 'vimeo'
@@ -318,6 +319,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'duration': 1595,
             'upload_date': '20130610',
             'timestamp': 1370893156,
+            'license': 'by',
         },
         'params': {
             'format': 'best[protocol=https]',
@@ -331,9 +333,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'id': '54469442',
             'ext': 'mp4',
             'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
-            'uploader': 'The BLN & Business of Software',
-            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
-            'uploader_id': 'theblnbusinessofsoftware',
+            'uploader': 'Business of Software',
+            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
+            'uploader_id': 'businessofsoftware',
             'duration': 3610,
             'description': None,
         },
@@ -396,6 +398,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'uploader_id': 'staff',
             'uploader': 'Vimeo Staff',
             'duration': 62,
+            'subtitles': {
+                'de': [{'ext': 'vtt'}],
+                'en': [{'ext': 'vtt'}],
+                'es': [{'ext': 'vtt'}],
+                'fr': [{'ext': 'vtt'}],
+            },
         }
     },
     {
@@ -468,6 +476,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'skip_download': True,
         },
         'expected_warnings': ['Unable to download JSON metadata'],
+        'skip': 'this page is no longer available.',
     },
     {
         'url': 'http://player.vimeo.com/video/68375962',
@@ -550,9 +559,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
         return urls[0] if urls else None

     def _verify_player_video_password(self, url, video_id, headers):
-        password = self._downloader.params.get('videopassword')
-        if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
+        password = self._get_video_password()
         data = urlencode_postdata({
             'password': base64.b64encode(password.encode()),
         })
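
Note that the player endpoint wants the password base64-encoded, unlike the page form, which posts it in the clear. For illustration, with the stdlib `urlencode` standing in for youtube-dl's `urlencode_postdata` and a hypothetical `--video-password` value:

    import base64
    from urllib.parse import urlencode

    password = 'hunter2'  # hypothetical --video-password value
    data = urlencode({'password': base64.b64encode(password.encode())}).encode()
    assert data == b'password=aHVudGVyMg%3D%3D'
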
@@ -569,6 +576,37 @@ class VimeoIE(VimeoBaseInfoExtractor):
     def _real_initialize(self):
         self._login()

+    def _extract_from_api(self, video_id, unlisted_hash=None):
+        token = self._download_json(
+            'https://vimeo.com/_rv/jwt', video_id, headers={
+                'X-Requested-With': 'XMLHttpRequest'
+            })['token']
+        api_url = 'https://api.vimeo.com/videos/' + video_id
+        if unlisted_hash:
+            api_url += ':' + unlisted_hash
+        video = self._download_json(
+            api_url, video_id, headers={
+                'Authorization': 'jwt ' + token,
+            }, query={
+                'fields': 'config_url,created_time,description,license,metadata.connections.comments.total,metadata.connections.likes.total,release_time,stats.plays',
+            })
+        info = self._parse_config(self._download_json(
+            video['config_url'], video_id), video_id)
+        self._vimeo_sort_formats(info['formats'])
+        get_timestamp = lambda x: parse_iso8601(video.get(x + '_time'))
+        info.update({
+            'description': video.get('description'),
+            'license': video.get('license'),
+            'release_timestamp': get_timestamp('release'),
+            'timestamp': get_timestamp('created'),
+            'view_count': int_or_none(try_get(video, lambda x: x['stats']['plays'])),
+        })
+        connections = try_get(
+            video, lambda x: x['metadata']['connections'], dict) or {}
+        for k in ('comment', 'like'):
+            info[k + '_count'] = int_or_none(try_get(connections, lambda x: x[k + 's']['total']))
+        return info
+
     def _real_extract(self, url):
         url, data = unsmuggle_url(url, {})
         headers = std_headers.copy()
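
The `_extract_from_api` path first grabs a short-lived JWT from vimeo.com, then queries the public API with it. A standalone sketch of the same two requests, assuming `requests` in place of youtube-dl's downloader and the endpoints unchanged from this patch:

    import requests

    def fetch_video_metadata(video_id, unlisted_hash=None):
        # Step 1: anonymous JWT, only handed out to XHR-style requests
        token = requests.get(
            'https://vimeo.com/_rv/jwt',
            headers={'X-Requested-With': 'XMLHttpRequest'}).json()['token']
        # Step 2: call the API; unlisted videos are addressed as 'id:hash'
        api_url = 'https://api.vimeo.com/videos/' + video_id
        if unlisted_hash:
            api_url += ':' + unlisted_hash
        return requests.get(
            api_url,
            headers={'Authorization': 'jwt ' + token},
            params={'fields': 'config_url,created_time,description,license'}).json()
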
@@ -577,22 +615,19 @@ class VimeoIE(VimeoBaseInfoExtractor):
         if 'Referer' not in headers:
             headers['Referer'] = url

-        channel_id = self._search_regex(
-            r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
-
-        # Extract ID from URL
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url).groupdict()
+        video_id, unlisted_hash = mobj['id'], mobj.get('unlisted_hash')
+        if unlisted_hash:
+            return self._extract_from_api(video_id, unlisted_hash)
+
         orig_url = url
         is_pro = 'vimeopro.com/' in url
-        is_player = '://player.vimeo.com/video/' in url
         if is_pro:
             # some videos require portfolio_id to be present in player url
             # https://github.com/ytdl-org/youtube-dl/issues/20070
             url = self._extract_url(url, self._download_webpage(url, video_id))
             if not url:
                 url = 'https://vimeo.com/' + video_id
-        elif is_player:
-            url = 'https://player.vimeo.com/video/' + video_id
         elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
             url = 'https://vimeo.com/' + video_id
@@ -612,14 +647,25 @@ class VimeoIE(VimeoBaseInfoExtractor):
                     expected=True)
                 raise

-        # Now we begin extracting as much information as we can from what we
-        # retrieved. First we extract the information common to all extractors,
-        # and latter we extract those that are Vimeo specific.
-        self.report_extraction(video_id)
+        if '://player.vimeo.com/video/' in url:
+            config = self._parse_json(self._search_regex(
+                r'\bconfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
+            if config.get('view') == 4:
+                config = self._verify_player_video_password(
+                    redirect_url, video_id, headers)
+            info = self._parse_config(config, video_id)
+            self._vimeo_sort_formats(info['formats'])
+            return info
+
+        if re.search(r'<form[^>]+?id="pw_form"', webpage):
+            video_password = self._get_video_password()
+            token, vuid = self._extract_xsrft_and_vuid(webpage)
+            webpage = self._verify_video_password(
+                redirect_url, video_id, video_password, token, vuid)

         vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
         if vimeo_config:
-            seed_status = vimeo_config.get('seed_status', {})
+            seed_status = vimeo_config.get('seed_status') or {}
             if seed_status.get('state') == 'failed':
                 raise ExtractorError(
                     '%s said: %s' % (self.IE_NAME, seed_status['title']),
@@ -628,67 +674,40 @@ class VimeoIE(VimeoBaseInfoExtractor):
         cc_license = None
         timestamp = None
         video_description = None
+        info_dict = {}

-        # Extract the config JSON
-        try:
-            try:
-                config_url = self._html_search_regex(
-                    r' data-config-url="(.+?)"', webpage,
-                    'config URL', default=None)
-                if not config_url:
-                    # Sometimes new react-based page is served instead of old one that require
-                    # different config URL extraction approach (see
-                    # https://github.com/ytdl-org/youtube-dl/pull/7209)
-                    page_config = self._parse_json(self._search_regex(
-                        r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
-                        webpage, 'page config'), video_id)
-                    config_url = page_config['player']['config_url']
-                    cc_license = page_config.get('cc_license')
-                    timestamp = try_get(
-                        page_config, lambda x: x['clip']['uploaded_on'],
-                        compat_str)
-                    video_description = clean_html(dict_get(
-                        page_config, ('description', 'description_html_escaped')))
-                config = self._download_json(config_url, video_id)
-            except RegexNotFoundError:
-                # For pro videos or player.vimeo.com urls
-                # We try to find out to which variable is assigned the config dic
-                m_variable_name = re.search(r'(\w)\.video\.id', webpage)
-                if m_variable_name is not None:
-                    config_re = [r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1))]
-                else:
-                    config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
-                config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
-                config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
-                config = self._search_regex(config_re, webpage, 'info section',
-                                            flags=re.DOTALL)
-                config = json.loads(config)
-        except Exception as e:
-            if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
-                raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')
-
-            if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
-                if '_video_password_verified' in data:
-                    raise ExtractorError('video password verification failed!')
-                self._verify_video_password(redirect_url, video_id, webpage)
-                return self._real_extract(
-                    smuggle_url(redirect_url, {'_video_password_verified': 'verified'}))
-            else:
-                raise ExtractorError('Unable to extract info section',
-                                     cause=e)
-        else:
-            if config.get('view') == 4:
-                config = self._verify_player_video_password(redirect_url, video_id, headers)
+        channel_id = self._search_regex(
+            r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
+        if channel_id:
+            config_url = self._html_search_regex(
+                r'\bdata-config-url="([^"]+)"', webpage, 'config URL')
+            video_description = clean_html(get_element_by_class('description', webpage))
+            info_dict.update({
+                'channel_id': channel_id,
+                'channel_url': 'https://vimeo.com/channels/' + channel_id,
+            })
+        else:
+            page_config = self._parse_json(self._search_regex(
+                r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
+                webpage, 'page config', default='{}'), video_id, fatal=False)
+            if not page_config:
+                return self._extract_from_api(video_id)
+            config_url = page_config['player']['config_url']
+            cc_license = page_config.get('cc_license')
+            clip = page_config.get('clip') or {}
+            timestamp = clip.get('uploaded_on')
+            video_description = clean_html(
+                clip.get('description') or page_config.get('description_html_escaped'))
+        config = self._download_json(config_url, video_id)
         video = config.get('video') or {}
         vod = video.get('vod') or {}

         def is_rented():
             if '>You rented this title.<' in webpage:
                 return True
-            if config.get('user', {}).get('purchased'):
+            if try_get(config, lambda x: x['user']['purchased']):
                 return True
-            for purchase_option in vod.get('purchase_options', []):
+            for purchase_option in (vod.get('purchase_options') or []):
                 if purchase_option.get('purchased'):
                     return True
                 label = purchase_option.get('label_string')
@@ -703,14 +722,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'https://player.vimeo.com/player/%s' % feature_id,
                 {'force_feature_id': True}), 'Vimeo')

-        # Extract video description
-        if not video_description:
-            video_description = self._html_search_regex(
-                r'(?s)<div\s+class="[^"]*description[^"]*"[^>]*>(.*?)</div>',
-                webpage, 'description', default=None)
         if not video_description:
             video_description = self._html_search_meta(
-                'description', webpage, default=None)
+                ['description', 'og:description', 'twitter:description'],
+                webpage, default=None)
         if not video_description and is_pro:
             orig_webpage = self._download_webpage(
                 orig_url, video_id,
@@ -719,25 +734,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
             if orig_webpage:
                 video_description = self._html_search_meta(
                     'description', orig_webpage, default=None)
-        if not video_description and not is_player:
+        if not video_description:
             self._downloader.report_warning('Cannot find video description')

-        # Extract upload date
         if not timestamp:
             timestamp = self._search_regex(
                 r'<time[^>]+datetime="([^"]+)"', webpage,
                 'timestamp', default=None)

-        try:
-            view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, 'view count'))
-            like_count = int(self._search_regex(r'UserLikes:(\d+)', webpage, 'like count'))
-            comment_count = int(self._search_regex(r'UserComments:(\d+)', webpage, 'comment count'))
-        except RegexNotFoundError:
-            # This info is only available in vimeo.com/{id} urls
-            view_count = None
-            like_count = None
-            comment_count = None
-
         formats = []

         source_format = self._extract_original_format(
@@ -756,29 +760,20 @@ class VimeoIE(VimeoBaseInfoExtractor):
             r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
             webpage, 'license', default=None, group='license')

-        channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
-
-        info_dict = {
+        info_dict.update({
             'formats': formats,
             'timestamp': unified_timestamp(timestamp),
             'description': video_description,
             'webpage_url': url,
-            'view_count': view_count,
-            'like_count': like_count,
-            'comment_count': comment_count,
             'license': cc_license,
-            'channel_id': channel_id,
-            'channel_url': channel_url,
-        }
-
-        info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
+        })

-        return info_dict
+        return merge_dicts(info_dict, info_dict_config, json_ld)


 class VimeoOndemandIE(VimeoIE):
     IE_NAME = 'vimeo:ondemand'
-    _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/([^/]+/)?(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?:[^/]+/)?(?P<id>[^/?#&]+)'
     _TESTS = [{
         # ondemand video not available via https://vimeo.com/id
         'url': 'https://vimeo.com/ondemand/20704',
@@ -939,11 +934,15 @@ class VimeoAlbumIE(VimeoBaseInfoExtractor):
         }
         if hashed_pass:
             query['_hashed_pass'] = hashed_pass
-        videos = self._download_json(
-            'https://api.vimeo.com/albums/%s/videos' % album_id,
-            album_id, 'Downloading page %d' % api_page, query=query, headers={
-                'Authorization': 'jwt ' + authorization,
-            })['data']
+        try:
+            videos = self._download_json(
+                'https://api.vimeo.com/albums/%s/videos' % album_id,
+                album_id, 'Downloading page %d' % api_page, query=query, headers={
+                    'Authorization': 'jwt ' + authorization,
+                })['data']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+                return
         for video in videos:
             link = video.get('link')
             if not link:
@@ -1058,9 +1057,22 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
     def _real_extract(self, url):
         page_url, video_id = re.match(self._VALID_URL, url).groups()
-        clip_data = self._download_json(
-            page_url.replace('/review/', '/review/data/'),
-            video_id)['clipData']
-        config_url = clip_data['configUrl']
+        data = self._download_json(
+            page_url.replace('/review/', '/review/data/'), video_id)
+        if data.get('isLocked') is True:
+            video_password = self._get_video_password()
+            viewer = self._download_json(
+                'https://vimeo.com/_rv/viewer', video_id)
+            webpage = self._verify_video_password(
+                'https://vimeo.com/' + video_id, video_id,
+                video_password, viewer['xsrft'], viewer['vuid'])
+            clip_page_config = self._parse_json(self._search_regex(
+                r'window\.vimeo\.clip_page_config\s*=\s*({.+?});',
+                webpage, 'clip page config'), video_id)
+            config_url = clip_page_config['player']['config_url']
+            clip_data = clip_page_config.get('clip') or {}
+        else:
+            clip_data = data['clipData']
+            config_url = clip_data['configUrl']
         config = self._download_json(config_url, video_id)
         info_dict = self._parse_config(config, video_id)

diff --git a/youtube_dl/extractor/vlive.py b/youtube_dl/extractor/vlive.py

@@ -106,7 +106,7 @@ class VLiveIE(VLiveBaseIE):
             raise ExtractorError('Unable to log in', expected=True)

     def _call_api(self, path_template, video_id, fields=None):
-        query = {'appId': self._APP_ID, 'gcc': 'KR'}
+        query = {'appId': self._APP_ID, 'gcc': 'KR', 'platformType': 'PC'}
         if fields:
             query['fields'] = fields
         try:

diff --git a/youtube_dl/extractor/voxmedia.py b/youtube_dl/extractor/voxmedia.py

@@ -7,6 +7,8 @@ from ..compat import compat_urllib_parse_unquote
 from ..utils import (
     ExtractorError,
     int_or_none,
+    try_get,
+    unified_timestamp,
 )
@@ -19,14 +21,17 @@ class VoxMediaVolumeIE(OnceIE):
         setup = self._parse_json(self._search_regex(
             r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
-        video_data = setup.get('video') or {}
+        player_setup = setup.get('player_setup') or setup
+        video_data = player_setup.get('video') or {}
+        formatted_metadata = video_data.get('formatted_metadata') or {}
         info = {
             'id': video_id,
-            'title': video_data.get('title_short'),
+            'title': player_setup.get('title') or video_data.get('title_short'),
             'description': video_data.get('description_long') or video_data.get('description_short'),
-            'thumbnail': video_data.get('brightcove_thumbnail')
+            'thumbnail': formatted_metadata.get('thumbnail') or video_data.get('brightcove_thumbnail'),
+            'timestamp': unified_timestamp(formatted_metadata.get('video_publish_date')),
         }
-        asset = setup.get('asset') or setup.get('params') or {}
+        asset = try_get(setup, lambda x: x['embed_assets']['chorus'], dict) or {}

         formats = []
         hls_url = asset.get('hls_url')
@@ -47,6 +52,7 @@ class VoxMediaVolumeIE(OnceIE):
         if formats:
             self._sort_formats(formats)
             info['formats'] = formats
+            info['duration'] = int_or_none(asset.get('duration'))
             return info

         for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
@@ -84,7 +90,7 @@ class VoxMediaIE(InfoExtractor):
     }, {
         # Volume embed, Youtube
         'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
-        'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
+        'md5': 'fd19aa0cf3a0eea515d4fd5c8c0e9d68',
         'info_dict': {
             'id': 'Gy8Md3Eky38',
             'ext': 'mp4',
@@ -93,6 +99,7 @@ class VoxMediaIE(InfoExtractor):
             'uploader_id': 'TheVerge',
             'upload_date': '20141021',
             'uploader': 'The Verge',
+            'timestamp': 1413907200,
         },
         'add_ie': ['Youtube'],
         'skip': 'similar to the previous test',
@@ -100,13 +107,13 @@ class VoxMediaIE(InfoExtractor):
         # Volume embed, Youtube
         'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
         'info_dict': {
-            'id': 'YCjDnX-Xzhg',
+            'id': '22986359b',
             'ext': 'mp4',
             'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
             'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
-            'uploader_id': 'voxdotcom',
             'upload_date': '20150915',
-            'uploader': 'Vox',
+            'timestamp': 1442332800,
+            'duration': 285,
         },
         'add_ie': ['Youtube'],
         'skip': 'similar to the previous test',
@@ -160,6 +167,9 @@ class VoxMediaIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Post-Post-PC CEO: The Full Code Conference Video of Microsoft\'s Satya Nadella',
             'description': 'The longtime veteran was chosen earlier this year as the software giant\'s third leader in its history.',
+            'timestamp': 1402938000,
+            'upload_date': '20140616',
+            'duration': 4114,
         },
         'add_ie': ['VoxMediaVolume'],
     }]

diff --git a/youtube_dl/extractor/vvvvid.py b/youtube_dl/extractor/vvvvid.py

@@ -75,12 +75,15 @@ class VVVVIDIE(InfoExtractor):
             'https://www.vvvvid.it/user/login',
             None, headers=self.geo_verification_headers())['data']['conn_id']

-    def _download_info(self, show_id, path, video_id, fatal=True):
+    def _download_info(self, show_id, path, video_id, fatal=True, query=None):
+        q = {
+            'conn_id': self._conn_id,
+        }
+        if query:
+            q.update(query)
         response = self._download_json(
             'https://www.vvvvid.it/vvvvid/ondemand/%s/%s' % (show_id, path),
-            video_id, headers=self.geo_verification_headers(), query={
-                'conn_id': self._conn_id,
-            }, fatal=fatal)
+            video_id, headers=self.geo_verification_headers(), query=q, fatal=fatal)
         if not (response or fatal):
             return
         if response.get('result') == 'error':
@@ -98,7 +101,8 @@ class VVVVIDIE(InfoExtractor):
         show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()

         response = self._download_info(
-            show_id, 'season/%s' % season_id, video_id)
+            show_id, 'season/%s' % season_id,
+            video_id, query={'video_id': video_id})

         vid = int(video_id)
         video_data = list(filter(
@@ -178,8 +182,8 @@ class VVVVIDIE(InfoExtractor):
                 if not embed_code:
                     continue
                 embed_code = ds(embed_code)
-                if video_type in ('video/rcs', 'video/kenc'):
-                    if video_type == 'video/kenc':
+                if video_type == 'video/kenc':
+                    embed_code = re.sub(r'https?(://[^/]+)/z/', r'https\1/i/', embed_code).replace('/manifest.f4m', '/master.m3u8')
                     kenc = self._download_json(
                         'https://www.vvvvid.it/kenc', video_id, query={
                             'action': 'kt',
@@ -189,6 +193,9 @@ class VVVVIDIE(InfoExtractor):
                     kenc_message = kenc.get('message')
                     if kenc_message:
                         embed_code += '?' + ds(kenc_message)
+                    formats.extend(self._extract_m3u8_formats(
+                        embed_code, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                elif video_type == 'video/rcs':
                     formats.extend(self._extract_akamai_formats(embed_code, video_id))
                 elif video_type == 'video/youtube':
                     info.update({
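
The `re.sub` added for 'video/kenc' rewrites an Akamai HDS manifest URL into its HLS counterpart: `/z/` becomes `/i/` and `manifest.f4m` becomes `master.m3u8`, a conventional mapping on Akamai edges. For illustration, with a made-up URL:

    import re

    hds = 'https://vvvvid.akamaized.net/z/path/to/video/manifest.f4m'  # hypothetical
    hls = re.sub(r'https?(://[^/]+)/z/', r'https\1/i/', hds).replace(
        '/manifest.f4m', '/master.m3u8')
    assert hls == 'https://vvvvid.akamaized.net/i/path/to/video/master.m3u8'
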
@@ -247,9 +254,13 @@ class VVVVIDShowIE(VVVVIDIE):
         show_info = self._download_info(
             show_id, 'info/', show_title, fatal=False)

+        if not show_title:
+            base_url += "/title"
+
         entries = []
         for season in (seasons or []):
             episodes = season.get('episodes') or []
+            playlist_title = season.get('name') or show_info.get('title')
             for episode in episodes:
                 if episode.get('playable') is False:
                     continue
@@ -259,12 +270,13 @@ class VVVVIDShowIE(VVVVIDIE):
                     continue
                 info = self._extract_common_video_info(episode)
                 info.update({
-                    '_type': 'url',
+                    '_type': 'url_transparent',
                     'ie_key': VVVVIDIE.ie_key(),
                     'url': '/'.join([base_url, season_id, video_id]),
                     'title': episode.get('title'),
                     'description': episode.get('description'),
                     'season_id': season_id,
+                    'playlist_title': playlist_title,
                 })
                 entries.append(info)

diff --git a/youtube_dl/extractor/wat.py b/youtube_dl/extractor/wat.py

@@ -4,9 +4,10 @@ from __future__ import unicode_literals

 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
-    unified_strdate,
-    HEADRequest,
+    ExtractorError,
     int_or_none,
+    try_get,
+    unified_strdate,
 )
@@ -29,6 +30,7 @@ class WatIE(InfoExtractor):
             'skip_download': True,
         },
         'expected_warnings': ['HTTP Error 404'],
+        'skip': 'This content is no longer available',
     },
     {
         'url': 'http://www.wat.tv/video/gregory-lemarchal-voix-ange-6z1v7_6ygkj_.html',
@@ -40,8 +42,10 @@ class WatIE(InfoExtractor):
             'upload_date': '20140816',
         },
         'expected_warnings': ["Ce contenu n'est pas disponible pour l'instant."],
+        'skip': 'This content is no longer available',
     },
     ]
+    _GEO_BYPASS = False

     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -49,71 +53,54 @@ class WatIE(InfoExtractor):
         # 'contentv4' is used in the website, but it also returns the related
         # videos, we don't need them
+        # video_data = self._download_json(
+        #     'http://www.wat.tv/interface/contentv4s/' + video_id, video_id)
         video_data = self._download_json(
-            'http://www.wat.tv/interface/contentv4s/' + video_id, video_id)
+            'https://mediainfo.tf1.fr/mediainfocombo/' + video_id,
+            video_id, query={'context': 'MYTF1'})
         video_info = video_data['media']

         error_desc = video_info.get('error_desc')
         if error_desc:
-            self.report_warning(
-                '%s returned error: %s' % (self.IE_NAME, error_desc))
+            if video_info.get('error_code') == 'GEOBLOCKED':
+                self.raise_geo_restricted(error_desc, video_info.get('geoList'))
+            raise ExtractorError(error_desc, expected=True)

-        chapters = video_info['chapters']
-        if chapters:
-            first_chapter = chapters[0]
-
-            def video_id_for_chapter(chapter):
-                return chapter['tc_start'].split('-')[0]
-
-            if video_id_for_chapter(first_chapter) != video_id:
-                self.to_screen('Multipart video detected')
-                entries = [self.url_result('wat:%s' % video_id_for_chapter(chapter)) for chapter in chapters]
-                return self.playlist_result(entries, video_id, video_info['title'])
-            # Otherwise we can continue and extract just one part, we have to use
-            # the video id for getting the video url
-        else:
-            first_chapter = video_info
-
-        title = first_chapter['title']
-
-        def extract_url(path_template, url_type):
-            req_url = 'http://www.wat.tv/get/%s' % (path_template % video_id)
-            head = self._request_webpage(HEADRequest(req_url), video_id, 'Extracting %s url' % url_type, fatal=False)
-            if head:
-                red_url = head.geturl()
-                if req_url != red_url:
-                    return red_url
-            return None
+        title = video_info['title']

         formats = []
-        manifest_urls = self._download_json(
-            'http://www.wat.tv/get/webhtml/' + video_id, video_id)
-        m3u8_url = manifest_urls.get('hls')
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4',
-                'm3u8_native', m3u8_id='hls', fatal=False))
-        mpd_url = manifest_urls.get('mpd')
-        if mpd_url:
-            formats.extend(self._extract_mpd_formats(
-                mpd_url.replace('://das-q1.tf1.fr/', '://das-q1-ssl.tf1.fr/'),
-                video_id, mpd_id='dash', fatal=False))
-
-        self._sort_formats(formats)

-        date_diffusion = first_chapter.get('date_diffusion') or video_data.get('configv4', {}).get('estatS4')
-        upload_date = unified_strdate(date_diffusion) if date_diffusion else None
-        duration = None
-        files = video_info['files']
-        if files:
-            duration = int_or_none(files[0].get('duration'))
+        def extract_formats(manifest_urls):
+            for f, f_url in manifest_urls.items():
+                if not f_url:
+                    continue
+                if f in ('dash', 'mpd'):
+                    formats.extend(self._extract_mpd_formats(
+                        f_url.replace('://das-q1.tf1.fr/', '://das-q1-ssl.tf1.fr/'),
+                        video_id, mpd_id='dash', fatal=False))
+                elif f == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        f_url, video_id, 'mp4',
+                        'm3u8_native', m3u8_id='hls', fatal=False))
+
+        delivery = video_data.get('delivery') or {}
+        extract_formats({delivery.get('format'): delivery.get('url')})
+        if not formats:
+            if delivery.get('drm'):
+                raise ExtractorError('This video is DRM protected.', expected=True)
+            manifest_urls = self._download_json(
+                'http://www.wat.tv/get/webhtml/' + video_id, video_id, fatal=False)
+            if manifest_urls:
+                extract_formats(manifest_urls)
+
+        self._sort_formats(formats)

         return {
             'id': video_id,
             'title': title,
-            'thumbnail': first_chapter.get('preview'),
-            'description': first_chapter.get('description'),
-            'view_count': int_or_none(video_info.get('views')),
-            'upload_date': upload_date,
-            'duration': duration,
+            'thumbnail': video_info.get('preview'),
+            'upload_date': unified_strdate(try_get(
+                video_data, lambda x: x['mediametrie']['chapters'][0]['estatS4'])),
+            'duration': int_or_none(video_info.get('duration')),
             'formats': formats,
         }
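
The extractor now reads everything from TF1's `mediainfocombo` endpoint instead of the retired wat.tv contentv4s interface. A minimal sketch of the request it issues, assuming `requests` and a hypothetical media id:

    import requests

    video_id = '13184538'  # hypothetical wat.tv media id
    info = requests.get(
        'https://mediainfo.tf1.fr/mediainfocombo/' + video_id,
        params={'context': 'MYTF1'}).json()
    media = info['media']
    if media.get('error_desc'):
        raise RuntimeError('%s (%s)' % (media['error_desc'], media.get('error_code')))
    print(media['title'], info.get('delivery', {}).get('format'))
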

diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py

@@ -58,6 +58,7 @@ class XFileShareIE(InfoExtractor):
     (r'vidlocker\.xyz', 'VidLocker'),
     (r'vidshare\.tv', 'VidShare'),
     (r'vup\.to', 'VUp'),
+    (r'wolfstream\.tv', 'WolfStream'),
     (r'xvideosharing\.com', 'XVideoSharing'),
 )
@@ -82,6 +83,9 @@ class XFileShareIE(InfoExtractor):
     }, {
         'url': 'https://aparat.cam/n4d6dh0wvlpr',
         'only_matching': True,
+    }, {
+        'url': 'https://wolfstream.tv/nthme29v9u2x',
+        'only_matching': True,
     }]

     @staticmethod

diff --git a/youtube_dl/extractor/xtube.py b/youtube_dl/extractor/xtube.py

@@ -11,6 +11,7 @@ from ..utils import (
     parse_duration,
     sanitized_Request,
     str_to_int,
+    url_or_none,
 )
@@ -87,10 +88,10 @@ class XTubeIE(InfoExtractor):
             'Cookie': 'age_verified=1; cookiesAccepted=1',
         })

-        title, thumbnail, duration = [None] * 3
+        title, thumbnail, duration, sources, media_definition = [None] * 5

         config = self._parse_json(self._search_regex(
-            r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf)', webpage, 'config',
+            r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf|playerWrapper)', webpage, 'config',
             default='{}'), video_id, transform_source=js_to_json, fatal=False)
         if config:
             config = config.get('mainRoll')
@@ -99,20 +100,52 @@ class XTubeIE(InfoExtractor):
             thumbnail = config.get('poster')
             duration = int_or_none(config.get('duration'))
             sources = config.get('sources') or config.get('format')
+            media_definition = config.get('mediaDefinition')

-        if not isinstance(sources, dict):
+        if not isinstance(sources, dict) and not media_definition:
             sources = self._parse_json(self._search_regex(
                 r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
                 webpage, 'sources', group='sources'), video_id,
                 transform_source=js_to_json)

         formats = []
-        for format_id, format_url in sources.items():
-            formats.append({
-                'url': format_url,
-                'format_id': format_id,
-                'height': int_or_none(format_id),
-            })
+        format_urls = set()
+
+        if isinstance(sources, dict):
+            for format_id, format_url in sources.items():
+                format_url = url_or_none(format_url)
+                if not format_url:
+                    continue
+                if format_url in format_urls:
+                    continue
+                format_urls.add(format_url)
+                formats.append({
+                    'url': format_url,
+                    'format_id': format_id,
+                    'height': int_or_none(format_id),
+                })
+
+        if isinstance(media_definition, list):
+            for media in media_definition:
+                video_url = url_or_none(media.get('videoUrl'))
+                if not video_url:
+                    continue
+                if video_url in format_urls:
+                    continue
+                format_urls.add(video_url)
+                format_id = media.get('format')
+                if format_id == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id='hls', fatal=False))
+                elif format_id == 'mp4':
+                    height = int_or_none(media.get('quality'))
+                    formats.append({
+                        'url': video_url,
+                        'format_id': '%s-%d' % (format_id, height) if height else format_id,
+                        'height': height,
+                    })
+
         self._remove_duplicate_formats(formats)
         self._sort_formats(formats)

diff --git a/youtube_dl/extractor/youku.py b/youtube_dl/extractor/youku.py

@@ -154,7 +154,7 @@ class YoukuIE(InfoExtractor):
         # request basic data
         basic_data_params = {
             'vid': video_id,
-            'ccode': '0590',
+            'ccode': '0532',
             'client_ip': '192.168.1.1',
             'utid': cna,
             'client_ts': time.time() / 1000,

diff --git a/youtube_dl/extractor/youporn.py b/youtube_dl/extractor/youporn.py

@@ -25,6 +25,7 @@ class YouPornIE(InfoExtractor):
             'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
             'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
             'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 210,
             'uploader': 'Ask Dan And Jennifer',
             'upload_date': '20101217',
             'average_rating': int,
@@ -54,6 +55,7 @@ class YouPornIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+        'skip': '404',
     }, {
         'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
         'only_matching': True,
@@ -153,6 +155,8 @@ class YouPornIE(InfoExtractor):
         thumbnail = self._search_regex(
             r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
             webpage, 'thumbnail', fatal=False, group='thumbnail')
+        duration = int_or_none(self._html_search_meta(
+            'video:duration', webpage, 'duration', fatal=False))

         uploader = self._html_search_regex(
             r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
@@ -194,6 +198,7 @@ class YouPornIE(InfoExtractor):
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
+            'duration': duration,
             'uploader': uploader,
             'upload_date': upload_date,
             'average_rating': average_rating,

diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py

@@ -24,6 +24,7 @@ from ..jsinterp import JSInterpreter
 from ..utils import (
     ExtractorError,
     clean_html,
+    dict_get,
     float_or_none,
     int_or_none,
     mimetype2ext,
@@ -45,6 +46,10 @@ from ..utils import (
 )


+def parse_qs(url):
+    return compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+
+
 class YoutubeBaseInfoExtractor(InfoExtractor):
     """Provide base functions for Youtube extractors"""
     _LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
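
`parse_qs` is a small module-level convenience over `urlparse` plus `parse_qs`; values always come back as lists. For example (the playlist id here is made up):

    qs = parse_qs('https://www.youtube.com/watch?v=BaW_jenozKc&list=PLxyz')
    # {'v': ['BaW_jenozKc'], 'list': ['PLxyz']}
    video_id = qs.get('v', [None])[0]   # 'BaW_jenozKc'
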
@@ -60,11 +65,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
     _PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)'

-    def _ids_to_results(self, ids):
-        return [
-            self.url_result(vid_id, 'Youtube', video_id=vid_id)
-            for vid_id in ids]
-
     def _login(self):
         """
         Attempt to log in to YouTube.
@@ -248,7 +248,23 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
         return True

+    def _initialize_consent(self):
+        cookies = self._get_cookies('https://www.youtube.com/')
+        if cookies.get('__Secure-3PSID'):
+            return
+        consent_id = None
+        consent = cookies.get('CONSENT')
+        if consent:
+            if 'YES' in consent.value:
+                return
+            consent_id = self._search_regex(
+                r'PENDING\+(\d+)', consent.value, 'consent', default=None)
+        if not consent_id:
+            consent_id = random.randint(100, 999)
+        self._set_cookie('.youtube.com', 'CONSENT', 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id)
+
     def _real_initialize(self):
+        self._initialize_consent()
         if self._downloader is None:
             return
         if not self._login():
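
`_initialize_consent` pre-seeds the EU cookie-consent wall so youtube.com serves real pages instead of the consent interstitial: logged-in sessions (a `__Secure-3PSID` cookie) and already-accepted `YES` cookies are left alone, a `PENDING+<id>` cookie is upgraded in place, and otherwise a random id is used. A standalone sketch of the same decision logic (the date-stamped blob is taken verbatim from the patch):

    import random
    import re

    def consent_cookie_value(existing=None):
        # existing: the current CONSENT cookie value, if any
        if existing and 'YES' in existing:
            return existing  # consent already given, leave it alone
        consent_id = None
        if existing:
            m = re.search(r'PENDING\+(\d+)', existing)
            consent_id = m and m.group(1)
        if not consent_id:
            consent_id = random.randint(100, 999)
        return 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id
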
@@ -289,7 +305,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
         return self._parse_json(
             self._search_regex(
                 r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
-                default='{}'), video_id, fatal=False)
+                default='{}'), video_id, fatal=False) or {}

     def _extract_video(self, renderer):
         video_id = renderer['videoId']
@@ -312,7 +328,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
             (lambda x: x['ownerText']['runs'][0]['text'],
              lambda x: x['shortBylineText']['runs'][0]['text']), compat_str)
         return {
-            '_type': 'url_transparent',
+            '_type': 'url',
             'ie_key': YoutubeIE.ie_key(),
             'id': video_id,
             'url': video_id,
@@ -338,21 +354,28 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         r'(?:www\.)?invidious\.mastodon\.host',
         r'(?:www\.)?invidious\.zapashcanon\.fr',
         r'(?:www\.)?invidious\.kavin\.rocks',
+        r'(?:www\.)?invidious\.tinfoil-hat\.net',
+        r'(?:www\.)?invidious\.himiko\.cloud',
+        r'(?:www\.)?invidious\.reallyancient\.tech',
         r'(?:www\.)?invidious\.tube',
         r'(?:www\.)?invidiou\.site',
         r'(?:www\.)?invidious\.site',
         r'(?:www\.)?invidious\.xyz',
         r'(?:www\.)?invidious\.nixnet\.xyz',
+        r'(?:www\.)?invidious\.048596\.xyz',
         r'(?:www\.)?invidious\.drycat\.fr',
+        r'(?:www\.)?inv\.skyn3t\.in',
         r'(?:www\.)?tube\.poal\.co',
         r'(?:www\.)?tube\.connect\.cafe',
         r'(?:www\.)?vid\.wxzm\.sx',
         r'(?:www\.)?vid\.mint\.lgbt',
+        r'(?:www\.)?vid\.puffyan\.us',
         r'(?:www\.)?yewtu\.be',
         r'(?:www\.)?yt\.elukerio\.org',
         r'(?:www\.)?yt\.lelux\.fi',
         r'(?:www\.)?invidious\.ggc-project\.de',
         r'(?:www\.)?yt\.maisputain\.ovh',
+        r'(?:www\.)?ytprivate\.com',
         r'(?:www\.)?invidious\.13ad\.de',
         r'(?:www\.)?invidious\.toot\.koeln',
         r'(?:www\.)?invidious\.fdn\.fr',
@@ -397,15 +420,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                          )
                      )?                                       # all until now is optional -> you can pass the naked ID
                      (?P<id>[0-9A-Za-z_-]{11})                # here is it! the YouTube video ID
-                     (?!.*?\blist=
-                        (?:
-                            %(playlist_id)s|                  # combined list/video URLs are handled by the playlist IE
-                            WL                                # WL are handled by the watch later IE
-                        )
-                     )
                      (?(1).+)?                                # if we found the ID, everything can follow
                      $""" % {
-        'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE,
         'invidious': '|'.join(_INVIDIOUS_SITES),
     }
     _PLAYER_INFO_RE = (
@@ -791,6 +807,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             },
             'skip': 'This video does not exist.',
         },
+        {
+            # Video with incomplete 'yt:stretch=16:'
+            'url': 'https://www.youtube.com/watch?v=FRhJzUSJbGI',
+            'only_matching': True,
+        },
         {
             # Video licensed under Creative Commons
             'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
@@ -1067,6 +1088,23 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
            'url': 'https://www.youtube.com/watch?v=nGC3D_FkCmg',
            'only_matching': True,
        },
+        {
+            # restricted location, https://github.com/ytdl-org/youtube-dl/issues/28685
+            'url': 'cBvYw8_A0vQ',
+            'info_dict': {
+                'id': 'cBvYw8_A0vQ',
+                'ext': 'mp4',
+                'title': '4K Ueno Okachimachi Street Scenes 上野御徒町歩き',
+                'description': 'md5:ea770e474b7cd6722b4c95b833c03630',
+                'upload_date': '20201120',
+                'uploader': 'Walk around Japan',
+                'uploader_id': 'UC3o_t8PzBmXf5S9b7GLx1Mw',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UC3o_t8PzBmXf5S9b7GLx1Mw',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
     ]

     _formats = {
         '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@@ -1174,6 +1212,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         '397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
     }

+    @classmethod
+    def suitable(cls, url):
+        # Hack for lazy extractors until more generic solution is implemented
+        # (see #28780)
+        from .youtube import parse_qs
+        qs = parse_qs(url)
+        if qs.get('list', [None])[0]:
+            return False
+        return super(YoutubeIE, cls).suitable(url)
+
     def __init__(self, *args, **kwargs):
         super(YoutubeIE, self).__init__(*args, **kwargs)
         self._code_cache = {}
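
With the negative lookahead dropped from `_VALID_URL`, the list check now lives in `suitable()`: watch URLs carrying a `list=` parameter are deferred to the tab/playlist extractors. For example:

    from youtube_dl.extractor.youtube import YoutubeIE

    YoutubeIE.suitable('https://www.youtube.com/watch?v=BaW_jenozKc')          # True
    YoutubeIE.suitable('https://www.youtube.com/watch?v=BaW_jenozKc&list=WL')  # False, handled by YoutubeTabIE
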
@@ -1431,7 +1479,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         base_url = self.http_scheme() + '//www.youtube.com/'
         webpage_url = base_url + 'watch?v=' + video_id
         webpage = self._download_webpage(
-            webpage_url + '&bpctr=9999999999', video_id, fatal=False)
+            webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False)

         player_response = None
         if webpage:
@@ -1450,7 +1498,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     'Refetching age-gated info webpage',
                     'unable to download video info webpage', query={
                         'video_id': video_id,
-                        'eurl': 'https://www.youtube.com/embed/' + video_id,
+                        'eurl': 'https://youtube.googleapis.com/v/' + video_id,
                     }, fatal=False)),
                 lambda x: x['player_response'][0],
                 compat_str) or '{}', video_id)
@@ -1468,7 +1516,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         def get_text(x):
             if not x:
                 return
-            return x.get('simpleText') or ''.join([r['text'] for r in x['runs']])
+            text = x.get('simpleText')
+            if text and isinstance(text, compat_str):
+                return text
+            runs = x.get('runs')
+            if not isinstance(runs, list):
+                return
+            return ''.join([r['text'] for r in runs if isinstance(r.get('text'), compat_str)])

         search_meta = (
             lambda x: self._html_search_meta(x, webpage, default=None)) \
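
The hardened `get_text` no longer trusts the renderer shape: `simpleText` must be a string, and `runs` must be a list whose members carry string `text` fields. With the new definition in scope, the cases it now survives look like this:

    get_text({'simpleText': 'Title'})                                 # 'Title'
    get_text({'runs': [{'text': 'A'}, {'text': 'B'}]})                # 'AB'
    get_text({'runs': [{'text': 'A'}, {'navigationEndpoint': {}}]})   # 'A' (non-string members skipped)
    get_text({'runs': 'garbage'})                                     # None (the old version raised here)
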
@@ -1617,7 +1671,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 f['format_id'] = itag
                 formats.append(f)

-        if self._downloader.params.get('youtube_include_dash_manifest'):
+        if self._downloader.params.get('youtube_include_dash_manifest', True):
             dash_manifest_url = streaming_data.get('dashManifestUrl')
             if dash_manifest_url:
                 for f in self._extract_mpd_formats(
@@ -1666,13 +1720,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     for m in re.finditer(self._meta_regex('og:video:tag'), webpage)]
         for keyword in keywords:
             if keyword.startswith('yt:stretch='):
-                w, h = keyword.split('=')[1].split(':')
-                w, h = int(w), int(h)
+                mobj = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
+                if mobj:
+                    # NB: float is intentional for forcing float division
+                    w, h = (float(v) for v in mobj.groups())
                     if w > 0 and h > 0:
                         ratio = w / h
                         for f in formats:
                             if f.get('vcodec') != 'none':
                                 f['stretched_ratio'] = ratio
+                        break

         thumbnails = []
         for container in (video_details, microformat):
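
The keyword may now arrive as an incomplete 'yt:stretch=16:' (see the test added above), which the old split-and-int code turned into a ValueError. The regex only accepts a complete pair, and float conversion keeps non-integer ratios exact. A self-contained sketch of the same parsing:

    import re

    def parse_stretch(keyword):
        mobj = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
        if mobj:
            w, h = (float(v) for v in mobj.groups())
            if w > 0 and h > 0:
                return w / h

    assert parse_stretch('yt:stretch=16:9') == 16 / 9
    assert parse_stretch('yt:stretch=16:') is None  # incomplete value, previously a crash
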
@@ -1895,7 +1952,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             info['channel'] = get_text(try_get(
                 vsir,
                 lambda x: x['owner']['videoOwnerRenderer']['title'],
-                compat_str))
+                dict))
             rows = try_get(
                 vsir,
                 lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'],
@@ -1942,7 +1999,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                         invidio\.us
                     )/
                     (?:
-                        (?:channel|c|user|feed)/|
+                        (?:channel|c|user|feed|hashtag)/|
                         (?:playlist|watch)\?.*?\blist=|
                         (?!(?:watch|embed|v|e)\b)
                     )
@@ -1968,6 +2025,15 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
             'title': 'Игорь Клейнер - Playlists',
             'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
         },
+    }, {
+        # playlists, series
+        'url': 'https://www.youtube.com/c/3blue1brown/playlists?view=50&sort=dd&shelf_id=3',
+        'playlist_mincount': 5,
+        'info_dict': {
+            'id': 'UCYO_jab_esuFRV4b17AJtAw',
+            'title': '3Blue1Brown - Playlists',
+            'description': 'md5:e1384e8a133307dd10edee76e875d62f',
+        },
     }, {
         # playlists, singlepage
         'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
@@ -2228,6 +2294,16 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
     }, {
         'url': 'https://www.youtube.com/TheYoungTurks/live',
         'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/hashtag/cctv9',
+        'info_dict': {
+            'id': 'cctv9',
+            'title': '#cctv9',
+        },
+        'playlist_mincount': 350,
+    }, {
+        'url': 'https://www.youtube.com/watch?list=PLW4dVinRY435CBE_JD3t-0SRXKfnZHS1P&feature=youtu.be&v=M9cJMXmQ_ZU',
+        'only_matching': True,
     }]

     @classmethod
@@ -2250,9 +2326,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
     @staticmethod
     def _extract_grid_item_renderer(item):
-        for item_kind in ('Playlist', 'Video', 'Channel'):
-            renderer = item.get('grid%sRenderer' % item_kind)
-            if renderer:
-                return renderer
+        assert isinstance(item, dict)
+        for key, renderer in item.items():
+            if not key.startswith('grid') or not key.endswith('Renderer'):
+                continue
+            if not isinstance(renderer, dict):
+                continue
+            return renderer

     def _grid_entries(self, grid_renderer):
@@ -2263,7 +2342,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
             if not isinstance(renderer, dict):
                 continue
             title = try_get(
-                renderer, lambda x: x['title']['runs'][0]['text'], compat_str)
+                renderer, (lambda x: x['title']['runs'][0]['text'],
+                           lambda x: x['title']['simpleText']), compat_str)
             # playlist
             playlist_id = renderer.get('playlistId')
             if playlist_id:
@@ -2271,10 +2351,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                     'https://www.youtube.com/playlist?list=%s' % playlist_id,
                     ie=YoutubeTabIE.ie_key(), video_id=playlist_id,
                     video_title=title)
+                continue
             # video
             video_id = renderer.get('videoId')
             if video_id:
                 yield self._extract_video(renderer)
+                continue
             # channel
             channel_id = renderer.get('channelId')
             if channel_id:
@@ -2283,6 +2365,17 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                 yield self.url_result(
                     'https://www.youtube.com/channel/%s' % channel_id,
                     ie=YoutubeTabIE.ie_key(), video_title=title)
+                continue
+            # generic endpoint URL support
+            ep_url = urljoin('https://www.youtube.com/', try_get(
+                renderer, lambda x: x['navigationEndpoint']['commandMetadata']['webCommandMetadata']['url'],
+                compat_str))
+            if ep_url:
+                for ie in (YoutubeTabIE, YoutubePlaylistIE, YoutubeIE):
+                    if ie.suitable(ep_url):
+                        yield self.url_result(
+                            ep_url, ie=ie.ie_key(), video_id=ie._match_id(ep_url), video_title=title)
+                        break

     def _shelf_entries_from_content(self, shelf_renderer):
         content = shelf_renderer.get('content')
@@ -2375,6 +2468,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
             for entry in self._post_thread_entries(renderer):
                 yield entry

+    def _rich_grid_entries(self, contents):
+        for content in contents:
+            video_renderer = try_get(content, lambda x: x['richItemRenderer']['content']['videoRenderer'], dict)
+            if video_renderer:
+                entry = self._video_entry(video_renderer)
+                if entry:
+                    yield entry
+
     @staticmethod
     def _build_continuation_query(continuation, ctp=None):
         query = {
@@ -2420,13 +2521,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
         ctp = continuation_ep.get('clickTrackingParams')
         return YoutubeTabIE._build_continuation_query(continuation, ctp)

-    def _entries(self, tab, identity_token):
+    def _entries(self, tab, item_id, webpage):
         tab_content = try_get(tab, lambda x: x['content'], dict)
         if not tab_content:
             return
         slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict)
-        if not slr_renderer:
-            return
-        is_channels_tab = tab.get('title') == 'Channels'
-        continuation = None
-        slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or []
+        if slr_renderer:
+            is_channels_tab = tab.get('title') == 'Channels'
+            continuation = None
+            slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or []
@@ -2471,31 +2571,61 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                 if not continuation:
                     continuation = self._extract_continuation(is_renderer)
             if not continuation:
                 continuation = self._extract_continuation(slr_renderer)
+        else:
+            rich_grid_renderer = tab_content.get('richGridRenderer')
+            if not rich_grid_renderer:
+                return
+            for entry in self._rich_grid_entries(rich_grid_renderer.get('contents') or []):
+                yield entry
+            continuation = self._extract_continuation(rich_grid_renderer)

+        ytcfg = self._extract_ytcfg(item_id, webpage)
+        client_version = try_get(
+            ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str) or '2.20210407.08.00'
         headers = {
             'x-youtube-client-name': '1',
-            'x-youtube-client-version': '2.20201112.04.01',
+            'x-youtube-client-version': client_version,
+            'content-type': 'application/json',
         }
+        context = try_get(ytcfg, lambda x: x['INNERTUBE_CONTEXT'], dict) or {
+            'client': {
+                'clientName': 'WEB',
+                'clientVersion': client_version,
+            }
+        }
+        visitor_data = try_get(context, lambda x: x['client']['visitorData'], compat_str)
+        identity_token = self._extract_identity_token(ytcfg, webpage)
         if identity_token:
             headers['x-youtube-identity-token'] = identity_token

+        data = {
+            'context': context,
+        }
+
         for page_num in itertools.count(1):
             if not continuation:
                 break
+            if visitor_data:
+                headers['x-goog-visitor-id'] = visitor_data
+            data['continuation'] = continuation['continuation']
+            data['clickTracking'] = {
+                'clickTrackingParams': continuation['itct']
+            }
             count = 0
             retries = 3
             while count <= retries:
                 try:
                     # Downloading page may result in intermittent 5xx HTTP error
                     # that is usually worked around with a retry
-                    browse = self._download_json(
-                        'https://www.youtube.com/browse_ajax', None,
-                        'Downloading page %d%s'
-                        % (page_num, ' (retry #%d)' % count if count else ''),
-                        headers=headers, query=continuation)
+                    response = self._download_json(
+                        'https://www.youtube.com/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
+                        None, 'Downloading page %d%s' % (page_num, ' (retry #%d)' % count if count else ''),
+                        headers=headers, data=json.dumps(data).encode('utf8'))
                     break
                 except ExtractorError as e:
                     if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
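
Continuations now go through the innertube `youtubei/v1/browse` JSON API (the key in the URL is the public web-client key embedded in every YouTube page) instead of the retired `browse_ajax` endpoint. A minimal sketch of one continuation request, assuming `requests` in place of the downloader:

    import json
    import requests

    def browse_continuation(continuation_token, itct, client_version='2.20210407.08.00'):
        data = {
            'context': {'client': {'clientName': 'WEB', 'clientVersion': client_version}},
            'continuation': continuation_token,
            'clickTracking': {'clickTrackingParams': itct},
        }
        return requests.post(
            'https://www.youtube.com/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
            headers={'x-youtube-client-name': '1',
                     'x-youtube-client-version': client_version,
                     'content-type': 'application/json'},
            data=json.dumps(data).encode('utf8')).json()
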
@@ -2503,12 +2633,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                         if count <= retries:
                             continue
                     raise
-            if not browse:
-                break
-
-            response = try_get(browse, lambda x: x[1]['response'], dict)
             if not response:
                 break

+            visitor_data = try_get(
+                response, lambda x: x['responseContext']['visitorData'], compat_str) or visitor_data
+
             continuation_contents = try_get(
                 response, lambda x: x['continuationContents'], dict)
             if continuation_contents:
@@ -2531,13 +2661,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                     continuation = self._extract_continuation(continuation_renderer)
                     continue

+            on_response_received = dict_get(response, ('onResponseReceivedActions', 'onResponseReceivedEndpoints'))
             continuation_items = try_get(
-                response, lambda x: x['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'], list)
+                on_response_received, lambda x: x[0]['appendContinuationItemsAction']['continuationItems'], list)
             if continuation_items:
                 continuation_item = continuation_items[0]
                 if not isinstance(continuation_item, dict):
                     continue
-                renderer = continuation_item.get('gridVideoRenderer')
+                renderer = self._extract_grid_item_renderer(continuation_item)
                 if renderer:
                     grid_renderer = {'items': continuation_items}
                     for entry in self._grid_entries(grid_renderer):
@@ -2551,6 +2682,19 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                         yield entry
                     continuation = self._extract_continuation(video_list_renderer)
                     continue
+                renderer = continuation_item.get('backstagePostThreadRenderer')
+                if renderer:
+                    continuation_renderer = {'contents': continuation_items}
+                    for entry in self._post_thread_continuation_entries(continuation_renderer):
+                        yield entry
+                    continuation = self._extract_continuation(continuation_renderer)
+                    continue
+                renderer = continuation_item.get('richItemRenderer')
+                if renderer:
+                    for entry in self._rich_grid_entries(continuation_items):
+                        yield entry
+                    continuation = self._extract_continuation({'contents': continuation_items})
+                    continue

             break
@@ -2603,11 +2747,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                 alerts.append(text)
         return '\n'.join(alerts)

-    def _extract_from_tabs(self, item_id, webpage, data, tabs, identity_token):
+    def _extract_from_tabs(self, item_id, webpage, data, tabs):
         selected_tab = self._extract_selected_tab(tabs)
         renderer = try_get(
             data, lambda x: x['metadata']['channelMetadataRenderer'], dict)
-        playlist_id = title = description = None
+        playlist_id = item_id
+        title = description = None
         if renderer:
             channel_title = renderer.get('title') or item_id
             tab_title = selected_tab.get('title')
@@ -2616,14 +2761,18 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
                 title += ' - %s' % tab_title
             description = renderer.get('description')
             playlist_id = renderer.get('externalId')
+        else:
             renderer = try_get(
                 data, lambda x: x['metadata']['playlistMetadataRenderer'], dict)
             if renderer:
                 title = renderer.get('title')
-            description = None
-            playlist_id = item_id
+            else:
+                renderer = try_get(
+                    data, lambda x: x['header']['hashtagHeaderRenderer'], dict)
+                if renderer:
+                    title = try_get(renderer, lambda x: x['hashtag']['simpleText'])

         playlist = self.playlist_result(
-            self._entries(selected_tab, identity_token),
+            self._entries(selected_tab, item_id, webpage),
             playlist_id=playlist_id, playlist_title=title,
             playlist_description=description)
         playlist.update(self._extract_uploader(data))
@@ -2647,8 +2796,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
             self._playlist_entries(playlist), playlist_id=playlist_id,
             playlist_title=title)

-    def _extract_identity_token(self, webpage, item_id):
+    def _extract_identity_token(self, ytcfg, webpage):
ytcfg = self._extract_ytcfg(item_id, webpage)
if ytcfg: if ytcfg:
token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str) token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
if token: if token:
@ -2662,7 +2810,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
url = compat_urlparse.urlunparse( url = compat_urlparse.urlunparse(
compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com')) compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
# Handle both video/playlist URLs # Handle both video/playlist URLs
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = parse_qs(url)
video_id = qs.get('v', [None])[0] video_id = qs.get('v', [None])[0]
playlist_id = qs.get('list', [None])[0] playlist_id = qs.get('list', [None])[0]
if video_id and playlist_id: if video_id and playlist_id:
@ -2671,12 +2819,11 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id) return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id)) self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
webpage = self._download_webpage(url, item_id) webpage = self._download_webpage(url, item_id)
identity_token = self._extract_identity_token(webpage, item_id)
data = self._extract_yt_initial_data(item_id, webpage) data = self._extract_yt_initial_data(item_id, webpage)
tabs = try_get( tabs = try_get(
data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list) data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list)
if tabs: if tabs:
return self._extract_from_tabs(item_id, webpage, data, tabs, identity_token) return self._extract_from_tabs(item_id, webpage, data, tabs)
playlist = try_get( playlist = try_get(
data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict) data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
if playlist: if playlist:
@ -2759,12 +2906,19 @@ class YoutubePlaylistIE(InfoExtractor):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if YoutubeTabIE.suitable(url) else super( if YoutubeTabIE.suitable(url):
YoutubePlaylistIE, cls).suitable(url) return False
# Hack for lazy extractors until more generic solution is implemented
# (see #28780)
from .youtube import parse_qs
qs = parse_qs(url)
if qs.get('v', [None])[0]:
return False
return super(YoutubePlaylistIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = parse_qs(url)
if not qs: if not qs:
qs = {'list': playlist_id} qs = {'list': playlist_id}
return self.url_result( return self.url_result(
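Note on the hunks above: the tab extractor switches from GET requests against /browse_ajax to POST requests against the innertube youtubei/v1/browse endpoint. Below is a minimal standalone sketch of that request shape; it assumes only what is visible in the diff (endpoint, public web-client API key, headers, JSON body) and leaves out youtube-dl's retry and compat plumbing.

import json
import urllib.request

INNERTUBE_URL = 'https://www.youtube.com/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8'

def browse_continuation(continuation, client_version='2.20210407.08.00'):
    # The old code sent continuation parameters as a GET query to /browse_ajax;
    # the new code POSTs a JSON body carrying an innertube 'context' instead.
    data = {
        'context': {
            'client': {
                'clientName': 'WEB',
                'clientVersion': client_version,
            }
        },
        'continuation': continuation['continuation'],
        'clickTracking': {'clickTrackingParams': continuation['itct']},
    }
    req = urllib.request.Request(
        INNERTUBE_URL, data=json.dumps(data).encode('utf8'),
        headers={
            'x-youtube-client-name': '1',
            'x-youtube-client-version': client_version,
            'content-type': 'application/json',
        })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf8'))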

View File: youtube_dl/extractor/zdf.py

@@ -7,7 +7,9 @@ from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
     determine_ext,
+    float_or_none,
     int_or_none,
+    merge_dicts,
     NO_DEFAULT,
     orderedSet,
     parse_codecs,
@@ -21,49 +23,17 @@ from ..utils import (

 class ZDFBaseIE(InfoExtractor):
-    def _call_api(self, url, player, referrer, video_id, item):
-        return self._download_json(
-            url, video_id, 'Downloading JSON %s' % item,
-            headers={
-                'Referer': referrer,
-                'Api-Auth': 'Bearer %s' % player['apiToken'],
-            })
-
-    def _extract_player(self, webpage, video_id, fatal=True):
-        return self._parse_json(
-            self._search_regex(
-                r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage,
-                'player JSON', default='{}' if not fatal else NO_DEFAULT,
-                group='json'),
-            video_id)
-
-
-class ZDFIE(ZDFBaseIE):
-    _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
-    _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd')
     _GEO_COUNTRIES = ['DE']
+    _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd')

-    _TESTS = [{
-        'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',
-        'info_dict': {
-            'id': 'die-magie-der-farben-von-koenigspurpur-und-jeansblau-100',
-            'ext': 'mp4',
-            'title': 'Die Magie der Farben (2/2)',
-            'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
-            'duration': 2615,
-            'timestamp': 1465021200,
-            'upload_date': '20160604',
-        },
-    }, {
-        'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html',
-        'only_matching': True,
-    }]
+    def _call_api(self, url, video_id, item, api_token=None, referrer=None):
+        headers = {}
+        if api_token:
+            headers['Api-Auth'] = 'Bearer %s' % api_token
+        if referrer:
+            headers['Referer'] = referrer
+        return self._download_json(
+            url, video_id, 'Downloading JSON %s' % item, headers=headers)

     @staticmethod
     def _extract_subtitles(src):
@@ -109,20 +79,11 @@ class ZDFIE(ZDFBaseIE):
             })
             formats.append(f)

-    def _extract_entry(self, url, player, content, video_id):
-        title = content.get('title') or content['teaserHeadline']
-
-        t = content['mainVideoContent']['http://zdf.de/rels/target']
-
-        ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
-        if not ptmd_path:
-            ptmd_path = t[
-                'http://zdf.de/rels/streams/ptmd-template'].replace(
-                '{playerId}', 'ngplayer_2_4')
-
+    def _extract_ptmd(self, ptmd_url, video_id, api_token, referrer):
         ptmd = self._call_api(
-            urljoin(url, ptmd_path), player, url, video_id, 'metadata')
+            ptmd_url, video_id, 'metadata', api_token, referrer)
+
+        content_id = ptmd.get('basename') or ptmd_url.split('/')[-1]

         formats = []
         track_uris = set()
@@ -140,7 +101,7 @@ class ZDFIE(ZDFBaseIE):
                     continue
                 for track in tracks:
                     self._extract_format(
-                        video_id, formats, track_uris, {
+                        content_id, formats, track_uris, {
                             'url': track.get('uri'),
                             'type': f.get('type'),
                             'mimeType': f.get('mimeType'),
@@ -149,6 +110,103 @@ class ZDFIE(ZDFBaseIE):
                         })
         self._sort_formats(formats)

+        duration = float_or_none(try_get(
+            ptmd, lambda x: x['attributes']['duration']['value']), scale=1000)
+
+        return {
+            'extractor_key': ZDFIE.ie_key(),
+            'id': content_id,
+            'duration': duration,
+            'formats': formats,
+            'subtitles': self._extract_subtitles(ptmd),
+        }
+
+    def _extract_player(self, webpage, video_id, fatal=True):
+        return self._parse_json(
+            self._search_regex(
+                r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage,
+                'player JSON', default='{}' if not fatal else NO_DEFAULT,
+                group='json'),
+            video_id)
+
+
+class ZDFIE(ZDFBaseIE):
+    _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
+    _TESTS = [{
+        # Same as https://www.phoenix.de/sendungen/ereignisse/corona-nachgehakt/wohin-fuehrt-der-protest-in-der-pandemie-a-2050630.html
+        'url': 'https://www.zdf.de/politik/phoenix-sendungen/wohin-fuehrt-der-protest-in-der-pandemie-100.html',
+        'md5': '34ec321e7eb34231fd88616c65c92db0',
+        'info_dict': {
+            'id': '210222_phx_nachgehakt_corona_protest',
+            'ext': 'mp4',
+            'title': 'Wohin führt der Protest in der Pandemie?',
+            'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
+            'duration': 1691,
+            'timestamp': 1613948400,
+            'upload_date': '20210221',
+        },
+    }, {
+        # Same as https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html
+        'url': 'https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html',
+        'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
+        'info_dict': {
+            'id': '141007_ab18_10wochensommer_film',
+            'ext': 'mp4',
+            'title': 'Ab 18! - 10 Wochen Sommer',
+            'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
+            'duration': 2660,
+            'timestamp': 1608604200,
+            'upload_date': '20201222',
+        },
+    }, {
+        'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',
+        'info_dict': {
+            'id': '151025_magie_farben2_tex',
+            'ext': 'mp4',
+            'title': 'Die Magie der Farben (2/2)',
+            'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
+            'duration': 2615,
+            'timestamp': 1465021200,
+            'upload_date': '20160604',
+        },
+    }, {
+        # Same as https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche
+        'url': 'https://www.zdf.de/politik/phoenix-sendungen/die-gesten-der-maechtigen-100.html',
+        'only_matching': True,
+    }, {
+        # Same as https://www.3sat.de/film/spielfilm/der-hauptmann-100.html
+        'url': 'https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html',
+        'only_matching': True,
+    }, {
+        # Same as https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
+        'url': 'https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html',
+        'only_matching': True,
+    }]
+
+    def _extract_entry(self, url, player, content, video_id):
+        title = content.get('title') or content['teaserHeadline']
+
+        t = content['mainVideoContent']['http://zdf.de/rels/target']
+
+        ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
+        if not ptmd_path:
+            ptmd_path = t[
+                'http://zdf.de/rels/streams/ptmd-template'].replace(
+                '{playerId}', 'ngplayer_2_4')
+
+        info = self._extract_ptmd(
+            urljoin(url, ptmd_path), video_id, player['apiToken'], url)
+
         thumbnails = []
         layouts = try_get(
             content, lambda x: x['teaserImageRef']['layouts'], dict)
@@ -169,33 +227,33 @@ class ZDFIE(ZDFBaseIE):
                 })
             thumbnails.append(thumbnail)

-        return {
-            'id': video_id,
+        return merge_dicts(info, {
             'title': title,
             'description': content.get('leadParagraph') or content.get('teasertext'),
             'duration': int_or_none(t.get('duration')),
             'timestamp': unified_timestamp(content.get('editorialDate')),
             'thumbnails': thumbnails,
-            'subtitles': self._extract_subtitles(ptmd),
-            'formats': formats,
-        }
+        })

     def _extract_regular(self, url, player, video_id):
         content = self._call_api(
-            player['content'], player, url, video_id, 'content')
+            player['content'], video_id, 'content', player['apiToken'], url)
         return self._extract_entry(player['content'], player, content, video_id)

     def _extract_mobile(self, video_id):
-        document = self._download_json(
+        video = self._download_json(
             'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
-            video_id)['document']
+            video_id)
+
+        document = video['document']

         title = document['titel']
+        content_id = document['basename']

         formats = []
         format_urls = set()
         for f in document['formitaeten']:
-            self._extract_format(video_id, formats, format_urls, f)
+            self._extract_format(content_id, formats, format_urls, f)
         self._sort_formats(formats)

         thumbnails = []
@@ -213,12 +271,12 @@ class ZDFIE(ZDFBaseIE):
             })

         return {
-            'id': video_id,
+            'id': content_id,
             'title': title,
             'description': document.get('beschreibung'),
             'duration': int_or_none(document.get('length')),
-            'timestamp': unified_timestamp(try_get(
-                document, lambda x: x['meta']['editorialDate'], compat_str)),
+            'timestamp': unified_timestamp(document.get('date')) or unified_timestamp(
+                try_get(video, lambda x: x['meta']['editorialDate'], compat_str)),
             'thumbnails': thumbnails,
             'subtitles': self._extract_subtitles(document),
             'formats': formats,
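Note on the ZDF rework above: _extract_ptmd now derives the canonical id from the PTMD 'basename' and merges that info dict with page-level metadata via merge_dicts, which gives earlier dicts precedence (a later value only fills a key that is missing or empty). A small illustration, using values taken from the tests in this diff:

from youtube_dl.utils import merge_dicts

ptmd_info = {'id': '151025_magie_farben2_tex', 'duration': 2615.0}
page_meta = {'id': 'die-magie-der-farben-von-koenigspurpur-und-jeansblau-100',
             'title': 'Die Magie der Farben (2/2)'}
merged = merge_dicts(ptmd_info, page_meta)
assert merged['id'] == '151025_magie_farben2_tex'   # PTMD basename wins
assert merged['title'] == 'Die Magie der Farben (2/2)'  # page metadata fills the rest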

View File: youtube_dl/extractor/zingmp3.py

@@ -1,93 +1,94 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
     int_or_none,
-    update_url_query,
 )


-class ZingMp3BaseInfoExtractor(InfoExtractor):
+class ZingMp3BaseIE(InfoExtractor):
+    _VALID_URL_TMPL = r'https?://(?:mp3\.zing|zingmp3)\.vn/(?:%s)/[^/]+/(?P<id>\w+)\.html'
+    _GEO_COUNTRIES = ['VN']

-    def _extract_item(self, item, page_type, fatal=True):
-        error_message = item.get('msg')
-        if error_message:
-            if not fatal:
-                return
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error_message),
-                expected=True)
+    def _extract_item(self, item, fatal):
+        item_id = item['id']
+        title = item.get('name') or item['title']

         formats = []
-        for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
-            if not source_url or source_url == 'require vip':
+        for k, v in (item.get('source') or {}).items():
+            if not v:
                 continue
-            if not re.match(r'https?://', source_url):
-                source_url = '//' + source_url
-            source_url = self._proto_relative_url(source_url, 'http:')
-            quality_num = int_or_none(quality)
-            f = {
-                'format_id': quality,
-                'url': source_url,
-            }
-            if page_type == 'video':
-                f.update({
-                    'height': quality_num,
-                    'ext': 'mp4',
-                })
+            if k in ('mp4', 'hls'):
+                for res, video_url in v.items():
+                    if not video_url:
+                        continue
+                    if k == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            video_url, item_id, 'mp4',
+                            'm3u8_native', m3u8_id=k, fatal=False))
+                    elif k == 'mp4':
+                        formats.append({
+                            'format_id': 'mp4-' + res,
+                            'url': video_url,
+                            'height': int_or_none(self._search_regex(
+                                r'^(\d+)p', res, 'resolution', default=None)),
+                        })
             else:
-                f.update({
-                    'abr': quality_num,
+                formats.append({
                     'ext': 'mp3',
+                    'format_id': k,
+                    'tbr': int_or_none(k),
+                    'url': self._proto_relative_url(v),
+                    'vcodec': 'none',
                 })
-            formats.append(f)
+        if not formats:
+            if not fatal:
+                return
+            msg = item['msg']
+            if msg == 'Sorry, this content is not available in your country.':
+                self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
+            raise ExtractorError(msg, expected=True)
+        self._sort_formats(formats)

-        cover = item.get('cover')
+        subtitles = None
+        lyric = item.get('lyric')
+        if lyric:
+            subtitles = {
+                'origin': [{
+                    'url': lyric,
+                }],
+            }
+
+        album = item.get('album') or {}

         return {
-            'title': (item.get('name') or item.get('title')).strip(),
+            'id': item_id,
+            'title': title,
             'formats': formats,
-            'thumbnail': 'http:/' + cover if cover else None,
-            'artist': item.get('artist'),
+            'thumbnail': item.get('thumbnail'),
+            'subtitles': subtitles,
+            'duration': int_or_none(item.get('duration')),
+            'track': title,
+            'artist': item.get('artists_names'),
+            'album': album.get('name') or album.get('title'),
+            'album_artist': album.get('artists_names'),
         }

-    def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
-        player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
-        items = player_json['data']
-        if 'item' in items:
-            items = items['item']
-
-        if len(items) == 1:
-            # one single song
-            data = self._extract_item(items[0], page_type)
-            data['id'] = id
-
-            return data
-        else:
-            # playlist of songs
-            entries = []
-
-            for i, item in enumerate(items, 1):
-                entry = self._extract_item(item, page_type, fatal=False)
-                if not entry:
-                    continue
-                entry['id'] = '%s-%d' % (id, i)
-                entries.append(entry)
-
-            return {
-                '_type': 'playlist',
-                'id': id,
-                'title': playlist_title,
-                'entries': entries,
-            }
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+        webpage = self._download_webpage(
+            url.replace('://zingmp3.vn/', '://mp3.zing.vn/'),
+            page_id, query={'play_song': 1})
+        data_path = self._search_regex(
+            r'data-xml="([^"]+)', webpage, 'data path')
+        return self._process_data(self._download_json(
+            'https://mp3.zing.vn/xhr' + data_path, page_id)['data'])


-class ZingMp3IE(ZingMp3BaseInfoExtractor):
-    _VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
+class ZingMp3IE(ZingMp3BaseIE):
+    _VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'bai-hat|video-clip'
     _TESTS = [{
         'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
         'md5': 'ead7ae13693b3205cbc89536a077daed',
@@ -95,49 +96,66 @@ class ZingMp3IE(ZingMp3BaseInfoExtractor):
             'id': 'ZWZB9WAB',
             'title': 'Xa Mãi Xa',
             'ext': 'mp3',
-            'thumbnail': r're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.+\.jpg',
+            'subtitles': {
+                'origin': [{
+                    'ext': 'lrc',
+                }]
+            },
+            'duration': 255,
+            'track': 'Xa Mãi Xa',
+            'artist': 'Bảo Thy',
+            'album': 'Special Album',
+            'album_artist': 'Bảo Thy',
         },
     }, {
-        'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
-        'md5': '870295a9cd8045c0e15663565902618d',
+        'url': 'https://mp3.zing.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
+        'md5': 'e9c972b693aa88301ef981c8151c4343',
         'info_dict': {
-            'id': 'ZW6BAEA0',
-            'title': 'Let It Go (Frozen OST)',
+            'id': 'ZO8ZF7C7',
+            'title': 'Sương Hoa Đưa Lối',
             'ext': 'mp4',
+            'thumbnail': r're:^https?://.+\.jpg',
+            'duration': 207,
+            'track': 'Sương Hoa Đưa Lối',
+            'artist': 'K-ICM, RYO',
         },
     }, {
-        'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
-        'info_dict': {
-            '_type': 'playlist',
-            'id': 'ZWZBWDAF',
-            'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
-        },
-        'playlist_count': 10,
-        'skip': 'removed at the request of the owner',
-    }, {
-        'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
+        'url': 'https://zingmp3.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
         'only_matching': True,
     }]
     IE_NAME = 'zingmp3'
     IE_DESC = 'mp3.zing.vn'

-    def _real_extract(self, url):
-        page_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, page_id)
-
-        player_json_url = self._search_regex([
-            r'data-xml="([^"]+)',
-            r'&amp;xmlURL=([^&]+)&'
-        ], webpage, 'player xml url')
-
-        playlist_title = None
-        page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
-        if page_type == 'video':
-            player_json_url = update_url_query(player_json_url, {'format': 'json'})
-        else:
-            player_json_url = player_json_url.replace('/xml/', '/html5xml/')
-            if page_type == 'album':
-                playlist_title = self._og_search_title(webpage)
-
-        return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)
+    def _process_data(self, data):
+        return self._extract_item(data, True)
+
+
+class ZingMp3AlbumIE(ZingMp3BaseIE):
+    _VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'album|playlist'
+    _TESTS = [{
+        'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
+        'info_dict': {
+            '_type': 'playlist',
+            'id': 'ZWZBWDAF',
+            'title': 'Lâu Đài Tình Ái',
+        },
+        'playlist_count': 10,
+    }, {
+        'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://zingmp3.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
+        'only_matching': True,
+    }]
+    IE_NAME = 'zingmp3:album'
+
+    def _process_data(self, data):
+        def entries():
+            for item in (data.get('items') or []):
+                entry = self._extract_item(item, False)
+                if entry:
+                    yield entry
+        info = data.get('info') or {}
+        return self.playlist_result(
+            entries(), info.get('id'), info.get('name') or info.get('title'))

View File: youtube_dl/extractor/zoom.py

@@ -0,0 +1,68 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    js_to_json,
+    parse_filesize,
+    urlencode_postdata,
+)
+
+
+class ZoomIE(InfoExtractor):
+    IE_NAME = 'zoom'
+    _VALID_URL = r'(?P<base_url>https?://(?:[^.]+\.)?zoom.us/)rec(?:ording)?/(?:play|share)/(?P<id>[A-Za-z0-9_.-]+)'
+    _TEST = {
+        'url': 'https://economist.zoom.us/rec/play/dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
+        'md5': 'ab445e8c911fddc4f9adc842c2c5d434',
+        'info_dict': {
+            'id': 'dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
+            'ext': 'mp4',
+            'title': 'China\'s "two sessions" and the new five-year plan',
+        }
+    }
+
+    def _real_extract(self, url):
+        base_url, play_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, play_id)
+
+        try:
+            form = self._form_hidden_inputs('password_form', webpage)
+        except ExtractorError:
+            form = None
+        if form:
+            password = self._downloader.params.get('videopassword')
+            if not password:
+                raise ExtractorError(
+                    'This video is protected by a passcode, use the --video-password option', expected=True)
+            is_meeting = form.get('useWhichPasswd') == 'meeting'
+            validation = self._download_json(
+                base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else ''),
+                play_id, 'Validating passcode', 'Wrong passcode', data=urlencode_postdata({
+                    'id': form[('meet' if is_meeting else 'file') + 'Id'],
+                    'passwd': password,
+                    'action': form.get('action'),
+                }))
+            if not validation.get('status'):
+                raise ExtractorError(validation['errorMessage'], expected=True)
+            webpage = self._download_webpage(url, play_id)
+
+        data = self._parse_json(self._search_regex(
+            r'(?s)window\.__data__\s*=\s*({.+?});',
+            webpage, 'data'), play_id, js_to_json)
+
+        return {
+            'id': play_id,
+            'title': data['topic'],
+            'url': data['viewMp4Url'],
+            'width': int_or_none(data.get('viewResolvtionsWidth')),
+            'height': int_or_none(data.get('viewResolvtionsHeight')),
+            'http_headers': {
+                'Referer': base_url,
+            },
+            'filesize_approx': parse_filesize(data.get('fileSize')),
+        }
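Note on the new Zoom extractor above: passcode-protected recordings are unlocked by POSTing the hidden form fields back to the validation endpoint before re-fetching the page. A standalone sketch of just that exchange, assuming only the endpoint and field names visible above (validate_zoom_passcode is an illustrative name, not part of the extractor):

import json
import urllib.parse
import urllib.request

def validate_zoom_passcode(base_url, form, password):
    # form: the hidden inputs scraped from the 'password_form' element
    is_meeting = form.get('useWhichPasswd') == 'meeting'
    endpoint = base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else '')
    payload = urllib.parse.urlencode({
        'id': form[('meet' if is_meeting else 'file') + 'Id'],
        'passwd': password,
        'action': form.get('action'),
    }).encode()
    with urllib.request.urlopen(urllib.request.Request(endpoint, data=payload)) as resp:
        # Expected to carry 'status' and, on failure, 'errorMessage'
        return json.loads(resp.read().decode())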

View File: youtube_dl/utils.py

@@ -39,6 +39,7 @@ import zlib
 from .compat import (
     compat_HTMLParseError,
     compat_HTMLParser,
+    compat_HTTPError,
     compat_basestring,
     compat_chr,
     compat_cookiejar,
@@ -2879,12 +2880,60 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):

 class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
-    if sys.version_info[0] < 3:
-        def redirect_request(self, req, fp, code, msg, headers, newurl):
-            # On python 2 urlh.geturl() may sometimes return redirect URL
-            # as byte string instead of unicode. This workaround allows
-            # to force it always return unicode.
-            return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl))
+    """YoutubeDL redirect handler
+
+    The code is based on HTTPRedirectHandler implementation from CPython [1].
+
+    This redirect handler solves two issues:
+     - ensures redirect URL is always unicode under python 2
+     - introduces support for experimental HTTP response status code
+       308 Permanent Redirect [2] used by some sites [3]
+
+    1. https://github.com/python/cpython/blob/master/Lib/urllib/request.py
+    2. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/308
+    3. https://github.com/ytdl-org/youtube-dl/issues/28768
+    """
+
+    http_error_301 = http_error_303 = http_error_307 = http_error_308 = compat_urllib_request.HTTPRedirectHandler.http_error_302
+
+    def redirect_request(self, req, fp, code, msg, headers, newurl):
+        """Return a Request or None in response to a redirect.
+
+        This is called by the http_error_30x methods when a
+        redirection response is received. If a redirection should
+        take place, return a new Request to allow http_error_30x to
+        perform the redirect. Otherwise, raise HTTPError if no-one
+        else should try to handle this url. Return None if you can't
+        but another Handler might.
+        """
+        m = req.get_method()
+        if (not (code in (301, 302, 303, 307, 308) and m in ("GET", "HEAD")
+                 or code in (301, 302, 303) and m == "POST")):
+            raise compat_HTTPError(req.full_url, code, msg, headers, fp)
+        # Strictly (according to RFC 2616), 301 or 302 in response to
+        # a POST MUST NOT cause a redirection without confirmation
+        # from the user (of urllib.request, in this case). In practice,
+        # essentially all clients do redirect in this case, so we do
+        # the same.
+        # On python 2 urlh.geturl() may sometimes return redirect URL
+        # as byte string instead of unicode. This workaround allows
+        # to force it always return unicode.
+        if sys.version_info[0] < 3:
+            newurl = compat_str(newurl)
+        # Be conciliant with URIs containing a space. This is mainly
+        # redundant with the more complete encoding done in http_error_302(),
+        # but it is kept for compatibility with other callers.
+        newurl = newurl.replace(' ', '%20')
+        CONTENT_HEADERS = ("content-length", "content-type")
+        # NB: don't use dict comprehension for python 2.6 compatibility
+        newheaders = dict((k, v) for k, v in req.headers.items()
+                          if k.lower() not in CONTENT_HEADERS)
+        return compat_urllib_request.Request(
+            newurl, headers=newheaders, origin_req_host=req.origin_req_host,
+            unverifiable=True)


 def extract_timezone(date_str):
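Note on the redirect handler above: a sketch of how such a handler is mounted, assuming nothing beyond the class shown (the URL is a placeholder). Passing a subclass of HTTPRedirectHandler to build_opener replaces the default handler, so a 308 Permanent Redirect is then followed like a 301 instead of raising HTTPError:

import urllib.request
from youtube_dl.utils import YoutubeDLRedirectHandler

# Install the handler into an opener chain; all 30x responses, including 308,
# now go through YoutubeDLRedirectHandler.redirect_request.
opener = urllib.request.build_opener(YoutubeDLRedirectHandler())
resp = opener.open('https://example.com/resource-that-308-redirects')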

View File: youtube_dl/version.py

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2021.02.10'
+__version__ = '2021.04.26'