Merge branch 'master' into afreecatv-fix-adult-vods

Luc Ritchie 2021-05-21 02:36:17 -04:00
commit 7a450f053f
78 changed files with 2707 additions and 1067 deletions


@ -18,7 +18,7 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.05.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a broken site support - [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2021.03.03** - [ ] I've verified that I'm running youtube-dl version **2021.05.16**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones - [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.03.03 [debug] youtube-dl version 2021.05.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}


@ -19,7 +19,7 @@ labels: 'site-support-request'
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.05.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights. - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a new site support request - [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2021.03.03** - [ ] I've verified that I'm running youtube-dl version **2021.05.16**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights - [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones - [ ] I've searched the bugtracker for similar site support requests including closed ones


@ -18,13 +18,13 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.05.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes (like this [x])
--> -->
- [ ] I'm reporting a site feature request - [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2021.03.03** - [ ] I've verified that I'm running youtube-dl version **2021.05.16**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones - [ ] I've searched the bugtracker for similar site feature requests including closed ones


@ -18,7 +18,7 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.05.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a broken site support issue - [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2021.03.03** - [ ] I've verified that I'm running youtube-dl version **2021.05.16**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones - [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.03.03 [debug] youtube-dl version 2021.05.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}


@ -19,13 +19,13 @@ labels: 'request'
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.05.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes (like this [x])
--> -->
- [ ] I'm reporting a feature request - [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2021.03.03** - [ ] I've verified that I'm running youtube-dl version **2021.05.16**
- [ ] I've searched the bugtracker for similar feature requests including closed ones - [ ] I've searched the bugtracker for similar feature requests including closed ones


@ -49,11 +49,18 @@ jobs:
     - name: Install Jython
       if: ${{ matrix.python-impl == 'jython' }}
       run: |
-        wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
+        wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
         java -jar jython-installer.jar -s -d "$HOME/jython"
         echo "$HOME/jython/bin" >> $GITHUB_PATH
     - name: Install nose
+      if: ${{ matrix.python-impl != 'jython' }}
       run: pip install nose
+    - name: Install nose (Jython)
+      if: ${{ matrix.python-impl == 'jython' }}
+      # Working around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
+      run: |
+        wget https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl
+        pip install nose-1.3.7-py2-none-any.whl
     - name: Run tests
       continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
       env:

160
ChangeLog

@ -1,3 +1,163 @@
version 2021.05.16
Core
* [options] Fix thumbnail option group name (#29042)
* [YoutubeDL] Improve extract_info doc (#28946)
Extractors
+ [playstuff] Add support for play.stuff.co.nz (#28901, #28931)
* [eroprofile] Fix extraction (#23200, #23626, #29008)
+ [vivo] Add support for vivo.st (#29009)
+ [generic] Add support for og:audio (#28311, #29015)
* [phoenix] Fix extraction (#29057)
+ [generic] Add support for sibnet embeds
+ [vk] Add support for sibnet embeds (#9500)
+ [generic] Add Referer header for direct videojs download URLs (#2879,
#20217, #29053)
* [orf:radio] Switch download URLs to HTTPS (#29012, #29046)
- [blinkx] Remove extractor (#28941)
* [medaltv] Relax URL regular expression (#28884)
+ [funimation] Add support for optional lang code in URLs (#28950)
+ [gdcvault] Add support for HTML5 videos
* [dispeak] Improve FLV extraction (#13513, #28970)
* [kaltura] Improve iframe extraction (#28969)
* [kaltura] Make embed code alternatives actually work
* [cda] Improve extraction (#28709, #28937)
* [twitter] Improve formats extraction from vmap URL (#28909)
* [xtube] Fix formats extraction (#28870)
* [svtplay] Improve extraction (#28507, #28876)
* [tv2dk] Fix extraction (#28888)
version 2021.04.26
Extractors
+ [xfileshare] Add support for wolfstream.tv (#28858)
* [francetvinfo] Improve video id extraction (#28792)
* [medaltv] Fix extraction (#28807)
* [tver] Redirect all downloads to Brightcove (#28849)
* [go] Improve video id extraction (#25207, #25216, #26058)
* [youtube] Fix lazy extractors (#28780)
+ [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774)
* [cbsnews] Fix extraction for python <3.6 (#23359)
version 2021.04.17
Core
+ [utils] Add support for experimental HTTP response status code
308 Permanent Redirect (#27877, #28768)
Extractors
+ [lbry] Add support for HLS videos (#27877, #28768)
* [youtube] Fix stretched ratio calculation
* [youtube] Improve stretch extraction (#28769)
* [youtube:tab] Improve grid extraction (#28725)
+ [youtube:tab] Detect series playlist on playlists page (#28723)
+ [youtube] Add more invidious instances (#28706)
* [pluralsight] Extend anti-throttling timeout (#28712)
* [youtube] Improve URL to extractor routing (#27572, #28335, #28742)
+ [maoritv] Add support for maoritelevision.com (#24552)
+ [youtube:tab] Pass innertube context and x-goog-visitor-id header along with
continuation requests (#28702)
* [mtv] Fix Viacom A/B Testing Video Player extraction (#28703)
+ [pornhub] Extract DASH and HLS formats from get_media end point (#28698)
* [cbssports] Fix extraction (#28682)
* [jamendo] Fix track extraction (#28686)
* [curiositystream] Fix format extraction (#26845, #28668)
version 2021.04.07
Core
* [extractor/common] Use compat_cookies_SimpleCookie for _get_cookies
+ [compat] Introduce compat_cookies_SimpleCookie
* [extractor/common] Improve JSON-LD author extraction
* [extractor/common] Fix _get_cookies on python 2 (#20673, #23256, #20326,
#28640)
Extractors
* [youtube] Fix extraction of videos with restricted location (#28685)
+ [line] Add support for live.line.me (#17205, #28658)
* [vimeo] Improve extraction (#28591)
* [youku] Update ccode (#17852, #28447, #28460, #28648)
* [youtube] Prefer direct entry metadata over entry metadata from playlist
(#28619, #28636)
* [screencastomatic] Fix extraction (#11976, #24489)
+ [palcomp3] Add support for palcomp3.com (#13120)
+ [arnes] Add support for video.arnes.si (#28483)
+ [youtube:tab] Add support for hashtags (#28308)
version 2021.04.01
Extractors
* [youtube] Setup CONSENT cookie when needed (#28604)
* [vimeo] Fix password protected review extraction (#27591)
* [youtube] Improve age-restricted video extraction (#28578)
version 2021.03.31
Extractors
* [vlive] Fix inkey request (#28589)
* [francetvinfo] Improve video id extraction (#28584)
+ [instagram] Extract duration (#28469)
* [instagram] Improve title extraction (#28469)
+ [sbs] Add support for ondemand watch URLs (#28566)
* [youtube] Fix video's channel extraction (#28562)
* [picarto] Fix live stream extraction (#28532)
* [vimeo] Fix unlisted video extraction (#28414)
* [youtube:tab] Fix playlist/community continuation items extraction (#28266)
* [ard] Improve clip id extraction (#22724, #28528)
version 2021.03.25
Extractors
+ [zoom] Add support for zoom.us (#16597, #27002, #28531)
* [bbc] Fix BBC IPlayer Episodes/Group extraction (#28360)
* [youtube] Fix default value for youtube_include_dash_manifest (#28523)
* [zingmp3] Fix extraction (#11589, #16409, #16968, #27205)
+ [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
+ [tiktok] Detect private videos (#28453)
* [vimeo:album] Fix extraction for albums with number of videos multiple
to page size (#28486)
* [vvvvid] Fix kenc format extraction (#28473)
* [mlb] Fix video extraction (#21241)
* [svtplay] Improve extraction (#28448)
* [applepodcasts] Fix extraction (#28445)
* [rtve] Improve extraction
+ Extract all formats
* Fix RTVE Infantil extraction (#24851)
+ Extract is_live and series
version 2021.03.14
Core
+ Introduce release_timestamp meta field (#28386)
Extractors
+ [southpark] Add support for southparkstudios.com (#28413)
* [southpark] Fix extraction (#26763, #28413)
* [sportdeutschland] Fix extraction (#21856, #28425)
* [pinterest] Reduce the number of HLS format requests
* [peertube] Improve thumbnail extraction (#28419)
* [tver] Improve title extraction (#28418)
* [fujitv] Fix HLS formats extension (#28416)
* [shahid] Fix format extraction (#28383)
+ [lbry] Add support for channel filters (#28385)
+ [bandcamp] Extract release timestamp
+ [lbry] Extract release timestamp (#28386)
* [pornhub] Detect flagged videos
+ [pornhub] Extract formats from get_media end point (#28395)
* [bilibili] Fix video info extraction (#28341)
+ [cbs] Add support for Paramount+ (#28342)
+ [trovo] Add Origin header to VOD formats (#28346)
* [voxmedia] Fix volume embed extraction (#28338)
version 2021.03.03 version 2021.03.03
Extractors Extractors


@ -287,7 +287,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
 --no-cache-dir Disable filesystem caching
 --rm-cache-dir Delete all filesystem cache files
-## Thumbnail images:
+## Thumbnail Options:
 --write-thumbnail Write thumbnail image to disk
 --write-all-thumbnails Write all thumbnail image formats to
 disk


@ -3,6 +3,7 @@
- **20min** - **20min**
- **220.ro** - **220.ro**
- **23video** - **23video**
- **247sports**
- **24video** - **24video**
- **3qsdn**: 3Q SDN - **3qsdn**: 3Q SDN
- **3sat** - **3sat**
@ -90,7 +91,8 @@
- **bbc**: BBC - **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer - **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles - **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist** - **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:playlist** - **bbc.co.uk:playlist**
- **BBVTV** - **BBVTV**
- **Beatport** - **Beatport**
@ -117,7 +119,6 @@
- **BitChuteChannel** - **BitChuteChannel**
- **BleacherReport** - **BleacherReport**
- **BleacherReportCMS** - **BleacherReportCMS**
- **blinkx**
- **Bloomberg** - **Bloomberg**
- **BokeCC** - **BokeCC**
- **BongaCams** - **BongaCams**
@ -159,7 +160,8 @@
- **cbsnews**: CBS News - **cbsnews**: CBS News
- **cbsnews:embed** - **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos - **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports** - **cbssports**
- **cbssports:embed**
- **CCMA** - **CCMA**
- **CCTV**: 央视网 - **CCTV**: 央视网
- **CDA** - **CDA**
@ -462,6 +464,8 @@
- **limelight** - **limelight**
- **limelight:channel** - **limelight:channel**
- **limelight:channel_list** - **limelight:channel_list**
- **LineLive**
- **LineLiveChannel**
- **LineTV** - **LineTV**
- **linkedin:learning** - **linkedin:learning**
- **linkedin:learning:course** - **linkedin:learning:course**
@ -487,6 +491,7 @@
- **mangomolo:live** - **mangomolo:live**
- **mangomolo:video** - **mangomolo:video**
- **ManyVids** - **ManyVids**
- **MaoriTV**
- **Markiza** - **Markiza**
- **MarkizaPage** - **MarkizaPage**
- **massengeschmack.tv** - **massengeschmack.tv**
@ -522,6 +527,7 @@
- **mixcloud:playlist** - **mixcloud:playlist**
- **mixcloud:user** - **mixcloud:user**
- **MLB** - **MLB**
- **MLBVideo**
- **Mnet** - **Mnet**
- **MNetTV** - **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
@ -677,6 +683,9 @@
- **OutsideTV** - **OutsideTV**
- **PacktPub** - **PacktPub**
- **PacktPubCourse** - **PacktPubCourse**
- **PalcoMP3:artist**
- **PalcoMP3:song**
- **PalcoMP3:video**
- **pandora.tv**: 판도라TV - **pandora.tv**: 판도라TV
- **ParamountNetwork** - **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
@ -703,6 +712,7 @@
- **play.fm** - **play.fm**
- **player.sky.it** - **player.sky.it**
- **PlayPlusTV** - **PlayPlusTV**
- **PlayStuff**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -1057,6 +1067,7 @@
- **Vidbit** - **Vidbit**
- **Viddler** - **Viddler**
- **Videa** - **Videa**
- **video.arnes.si**: Arnes Video
- **video.google:search**: Google Video search - **video.google:search**: Google Video search
- **video.sky.it** - **video.sky.it**
- **video.sky.it:live** - **video.sky.it:live**
@ -1151,7 +1162,7 @@
- **WWE** - **WWE**
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing - **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, WolfStream, XVideoSharing
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **XHamsterUser** - **XHamsterUser**
@ -1212,4 +1223,6 @@
- **ZDFChannel** - **ZDFChannel**
- **Zhihu** - **Zhihu**
- **zingmp3**: mp3.zing.vn - **zingmp3**: mp3.zing.vn
- **zingmp3:album**
- **zoom**
- **Zype** - **Zype**


@ -70,15 +70,6 @@ class TestAllURLsMatching(unittest.TestCase):
         # self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
         # self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])

-    def test_youtube_extract(self):
-        assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
-        assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
-        assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
-        assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
-
     def test_facebook_matching(self):
         self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
         self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793'))


@ -39,6 +39,16 @@ class TestExecution(unittest.TestCase):
         _, stderr = p.communicate()
         self.assertFalse(stderr)

+    def test_lazy_extractors(self):
+        try:
+            subprocess.check_call([sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'], cwd=rootDir, stdout=_DEV_NULL)
+            subprocess.check_call([sys.executable, 'test/test_all_urls.py'], cwd=rootDir, stdout=_DEV_NULL)
+        finally:
+            try:
+                os.remove('youtube_dl/extractor/lazy_extractors.py')
+            except (IOError, OSError):
+                pass
+

 if __name__ == '__main__':
     unittest.main()

26
test/test_youtube_misc.py Normal file

@ -0,0 +1,26 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeIE
class TestYoutubeMisc(unittest.TestCase):
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
if __name__ == '__main__':
unittest.main()


@ -773,11 +773,20 @@ class YoutubeDL(object):
     def extract_info(self, url, download=True, ie_key=None, extra_info={},
                      process=True, force_generic_extractor=False):
-        '''
-        Returns a list with a dictionary for each video we find.
-        If 'download', also downloads the videos.
-        extra_info is a dict containing the extra values to add to each result
-        '''
+        """
+        Return a list with a dictionary for each video extracted.
+
+        Arguments:
+        url -- URL to extract
+
+        Keyword arguments:
+        download -- whether to download videos during extraction
+        ie_key -- extractor key hint
+        extra_info -- dictionary containing the extra values to add to each result
+        process -- whether to resolve all unresolved references (URLs, playlist items),
+            must be True for download to work.
+        force_generic_extractor -- force using the generic extractor
+        """

         if not ie_key and force_generic_extractor:
             ie_key = 'Generic'


@ -73,6 +73,15 @@ try:
 except ImportError:  # Python 2
     import Cookie as compat_cookies

+if sys.version_info[0] == 2:
+    class compat_cookies_SimpleCookie(compat_cookies.SimpleCookie):
+        def load(self, rawdata):
+            if isinstance(rawdata, compat_str):
+                rawdata = str(rawdata)
+            return super(compat_cookies_SimpleCookie, self).load(rawdata)
+else:
+    compat_cookies_SimpleCookie = compat_cookies.SimpleCookie
+
 try:
     import html.entities as compat_html_entities
 except ImportError:  # Python 2
@ -3000,6 +3009,7 @@ __all__ = [
     'compat_cookiejar',
     'compat_cookiejar_Cookie',
     'compat_cookies',
+    'compat_cookies_SimpleCookie',
     'compat_ctypes_WINFUNCTYPE',
     'compat_etree_Element',
     'compat_etree_fromstring',


@ -42,6 +42,7 @@ class ApplePodcastsIE(InfoExtractor):
         ember_data = self._parse_json(self._search_regex(
             r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
             webpage, 'ember data'), episode_id)
+        ember_data = ember_data.get(episode_id) or ember_data
         episode = ember_data['data']['attributes']
         description = episode.get('description') or {}
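
The added line accepts two shapes of the parsed JSON: one keyed by the episode ID and one that holds `data` at the top level. A minimal sketch of that fallback follows; the function name and both sample payloads are made up for illustration and are not taken from the extractor or from real pages.

```python
def select_episode_attributes(ember_data, episode_id):
    """Hypothetical helper mirroring the fallback added in the hunk."""
    # Use the per-episode entry when the store is keyed by ID;
    # otherwise keep the store itself.
    ember_data = ember_data.get(episode_id) or ember_data
    return ember_data['data']['attributes']


# Made-up sample payloads covering both shapes.
keyed_store = {'12345': {'data': {'attributes': {'name': 'Episode'}}}}
flat_store = {'data': {'attributes': {'name': 'Episode'}}}
```

Both `select_episode_attributes(keyed_store, '12345')` and `select_episode_attributes(flat_store, '12345')` reach the same `attributes` dictionary.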

View File

@ -335,7 +335,7 @@ class ARDIE(InfoExtractor):
 class ARDBetaMediathekIE(ARDMediathekBaseIE):
-    _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
     _TESTS = [{
         'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
         'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
@ -365,22 +365,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
     }, {
         'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
         'only_matching': True,
+    }, {
+        'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('video_id')
-        display_id = mobj.group('display_id')
-        if display_id:
-            display_id = display_id.rstrip('/')
-        if not display_id:
-            display_id = video_id
+        video_id = self._match_id(url)

         player_page = self._download_json(
             'https://api.ardmediathek.de/public-gateway',
-            display_id, data=json.dumps({
+            video_id, data=json.dumps({
                 'query': '''{
-  playerPage(client:"%s", clipId: "%s") {
+  playerPage(client: "ard", clipId: "%s") {
     blockedByFsk
     broadcastedOn
     maturityContentRating
@@ -410,7 +410,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
       }
     }
   }
-}''' % (mobj.group('client'), video_id),
+}''' % video_id,
             }).encode(), headers={
                 'Content-Type': 'application/json'
             })['data']['playerPage']
@@ -435,7 +435,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
                 r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
         info.update({
             'age_limit': age_limit,
-            'display_id': display_id,
             'title': title,
             'description': description,
             'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
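
Review note: the new `_VALID_URL` drops the `client` and `display_id` groups and keys everything on the clip ID, which must now start with `Y3JpZDovL` (the Base64 encoding of `crid://`). Below is a minimal standalone sketch, using only `re` and `base64` and two of the test URLs from this hunk, of how the pattern picks the ID out of both URL layouts. The helper names `clip_id` and `decode_clip_id` are illustrative and not part of youtube-dl, and the padding handling assumes the IDs appear in URLs without trailing `=`.

```python
import base64
import re

# Pattern copied from the new _VALID_URL in this hunk.
VALID_URL = (
    r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?'
    r'(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)')


def clip_id(url):
    """Return the Base64 clip ID embedded in an ARD Mediathek URL, or None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


def decode_clip_id(value):
    """Decode a clip ID, restoring Base64 padding first (assumed to be stripped)."""
    padded = value + '=' * (-len(value) % 4)
    return base64.b64decode(padded).decode('utf-8')


# Old layout: /<client>/live/<id>
live_id = clip_id(
    'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg')
# Layout with a trailing slug after the ID: /<client>/player/<id>/<slug>
player_id = clip_id(
    'https://www.ardmediathek.de/ard/player/'
    'Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ'
    '/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet')
```

Anchoring on the `Y3JpZDovL` prefix is what lets the pattern tell the ID apart from slug segments that may come before or after it.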


@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail if thumbnail else None,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}
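
Review note: `start_time` is read from the optional `t` query parameter of the page URL. The sketch below reproduces that step outside the extractor, assuming Python 3 and using `urllib.parse` in place of the `compat_*` wrappers. The `int_or_none` here is a simplified stand-in for the youtube-dl helper, and `start_time_from_url` is an illustrative name.

```python
from urllib.parse import parse_qs, urlparse


def int_or_none(value):
    """Simplified stand-in for youtube-dl's int_or_none helper."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def start_time_from_url(url):
    """Read the optional ?t=<seconds> parameter, as ArnesIE._real_extract does."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; default to [None] when absent.
    return int_or_none(query.get('t', [None])[0])
```

Because the default is `[None]` rather than `None`, the `[0]` index is safe even when the URL carries no `t` parameter.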


@@ -1,17 +1,23 @@
 # coding: utf-8
 from __future__ import unicode_literals
+import functools
 import itertools
+import json
 import re
 from .common import InfoExtractor
 from ..compat import (
     compat_etree_Element,
     compat_HTTPError,
+    compat_parse_qs,
+    compat_str,
+    compat_urllib_parse_urlparse,
     compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
+    OnDemandPagedList,
     clean_html,
     dict_get,
     float_or_none,
@@ -20,8 +26,10 @@ from ..utils import (
     js_to_json,
     parse_duration,
     parse_iso8601,
+    strip_or_none,
     try_get,
     unescapeHTML,
+    unified_timestamp,
     url_or_none,
     urlencode_postdata,
     urljoin,
@@ -756,8 +764,17 @@ class BBCIE(BBCCoUkIE):
         'only_matching': True,
     }, {
         # custom redirection to www.bbc.com
+        # also, video with window.__INITIAL_DATA__
         'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
-        'only_matching': True,
+        'info_dict': {
+            'id': 'p02xzws1',
+            'ext': 'mp4',
+            'title': "Pluto may have 'nitrogen glaciers'",
+            'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
+            'thumbnail': r're:https?://.+/.+\.jpg',
+            'timestamp': 1437785037,
+            'upload_date': '20150725',
+        },
     }, {
         # single video article embedded with data-media-vpid
         'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
@@ -811,7 +828,7 @@ class BBCIE(BBCCoUkIE):
     @classmethod
     def suitable(cls, url):
-        EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
+        EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
         return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
                 else super(BBCIE, cls).suitable(url))
@@ -1159,12 +1176,29 @@ class BBCIE(BBCCoUkIE):
                         continue
                     formats, subtitles = self._download_media_selector(item_id)
                     self._sort_formats(formats)
+                    item_desc = None
+                    blocks = try_get(media, lambda x: x['summary']['blocks'], list)
+                    if blocks:
+                        summary = []
+                        for block in blocks:
+                            text = try_get(block, lambda x: x['model']['text'], compat_str)
+                            if text:
+                                summary.append(text)
+                        if summary:
+                            item_desc = '\n\n'.join(summary)
+                    item_time = None
+                    for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
+                        if try_get(meta, lambda x: x['label']) == 'Published':
+                            item_time = unified_timestamp(meta.get('timestamp'))
+                            break
                     entries.append({
                         'id': item_id,
                         'title': item_title,
                         'thumbnail': item.get('holdingImageUrl'),
                         'formats': formats,
                         'subtitles': subtitles,
+                        'timestamp': item_time,
+                        'description': strip_or_none(item_desc),
                     })
             for resp in (initial_data.get('data') or {}).values():
                 name = resp.get('name')
@@ -1338,21 +1372,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
             playlist_id, title, description)
-class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
-    IE_NAME = 'bbc.co.uk:iplayer:playlist'
-    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
-    _URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
-    _VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
+class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
+    _VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
+    @staticmethod
+    def _get_default(episode, key, default_key='default'):
+        return try_get(episode, lambda x: x[key][default_key])
+    def _get_description(self, data):
+        synopsis = data.get(self._DESCRIPTION_KEY) or {}
+        return dict_get(synopsis, ('large', 'medium', 'small'))
+    def _fetch_page(self, programme_id, per_page, series_id, page):
+        elements = self._get_elements(self._call_api(
+            programme_id, per_page, page + 1, series_id))
+        for element in elements:
+            episode = self._get_episode(element)
+            episode_id = episode.get('id')
+            if not episode_id:
+                continue
+            thumbnail = None
+            image = self._get_episode_image(episode)
+            if image:
+                thumbnail = image.replace('{recipe}', 'raw')
+            category = self._get_default(episode, 'labels', 'category')
+            yield {
+                '_type': 'url',
+                'id': episode_id,
+                'title': self._get_episode_field(episode, 'subtitle'),
+                'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
+                'thumbnail': thumbnail,
+                'description': self._get_description(episode),
+                'categories': [category] if category else None,
+                'series': self._get_episode_field(episode, 'title'),
+                'ie_key': BBCCoUkIE.ie_key(),
+            }
+    def _real_extract(self, url):
+        pid = self._match_id(url)
+        qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+        series_id = qs.get('seriesId', [None])[0]
+        page = qs.get('page', [None])[0]
+        per_page = 36 if page else self._PAGE_SIZE
+        fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
+        entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
+        playlist_data = self._get_playlist_data(self._call_api(pid, 1))
+        return self.playlist_result(
+            entries, pid, self._get_playlist_title(playlist_data),
+            self._get_description(playlist_data))
+class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
+    IE_NAME = 'bbc.co.uk:iplayer:episodes'
+    _VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
     _TESTS = [{
         'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
         'info_dict': {
             'id': 'b05rcz9v',
             'title': 'The Disappearance',
-            'description': 'French thriller serial about a missing teenager.',
+            'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
         },
-        'playlist_mincount': 6,
+        'playlist_mincount': 8,
+        'skip': 'This programme is not currently available on BBC iPlayer',
     }, {
+        # all seasons
+        'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
+        'info_dict': {
+            'id': 'b094m5t9',
+            'title': 'Doctor Foster',
+            'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
+        },
+        'playlist_mincount': 10,
+    }, {
+        # explicit season
+        'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
+        'info_dict': {
+            'id': 'b094m5t9',
+            'title': 'Doctor Foster',
+            'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
+        },
+        'playlist_mincount': 5,
+    }, {
+        # all pages
+        'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
+        'info_dict': {
+            'id': 'm0004c4v',
+            'title': 'Beechgrove',
+            'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
+        },
+        'playlist_mincount': 37,
+    }, {
+        # explicit page
+        'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
+        'info_dict': {
+            'id': 'm0004c4v',
+            'title': 'Beechgrove',
+            'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
+        },
+        'playlist_mincount': 1,
+    }]
+    _PAGE_SIZE = 100
+    _DESCRIPTION_KEY = 'synopsis'
+    def _get_episode_image(self, episode):
+        return self._get_default(episode, 'image')
+    def _get_episode_field(self, episode, field):
+        return self._get_default(episode, field)
+    @staticmethod
+    def _get_elements(data):
+        return data['entities']['results']
+    @staticmethod
+    def _get_episode(element):
+        return element.get('episode') or {}
+    def _call_api(self, pid, per_page, page=1, series_id=None):
+        variables = {
+            'id': pid,
+            'page': page,
+            'perPage': per_page,
+        }
+        if series_id:
+            variables['sliceId'] = series_id
+        return self._download_json(
+            'https://graph.ibl.api.bbc.co.uk/', pid, headers={
+                'Content-Type': 'application/json'
+            }, data=json.dumps({
+                'id': '5692d93d5aac8d796a0305e895e61551',
+                'variables': variables,
+            }).encode('utf-8'))['data']['programme']
+    @staticmethod
+    def _get_playlist_data(data):
+        return data
+    def _get_playlist_title(self, data):
+        return self._get_default(data, 'title')
+class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
+    IE_NAME = 'bbc.co.uk:iplayer:group'
+    _VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
+    _TESTS = [{
         # Available for over a year unlike 30 days for most other programmes
         'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
         'info_dict': {
@@ -1361,14 +1523,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
             'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
         },
         'playlist_mincount': 10,
+    }, {
+        # all pages
+        'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
+        'info_dict': {
+            'id': 'p081d7j7',
+            'title': 'Music in Scotland',
+            'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
+        },
+        'playlist_mincount': 47,
+    }, {
+        # explicit page
+        'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
+        'info_dict': {
+            'id': 'p081d7j7',
+            'title': 'Music in Scotland',
+            'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
+        },
+        'playlist_mincount': 11,
     }]
+    _PAGE_SIZE = 200
+    _DESCRIPTION_KEY = 'synopses'
-    def _extract_title_and_description(self, webpage):
-        title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
-        description = self._search_regex(
-            r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
-            webpage, 'description', fatal=False, group='value')
-        return title, description
+    def _get_episode_image(self, episode):
+        return self._get_default(episode, 'images', 'standard')
+    def _get_episode_field(self, episode, field):
+        return episode.get(field)
+    @staticmethod
+    def _get_elements(data):
+        return data['elements']
+    @staticmethod
+    def _get_episode(element):
+        return element
+    def _call_api(self, pid, per_page, page=1, series_id=None):
+        return self._download_json(
+            'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
+            pid, query={
+                'page': page,
+                'per_page': per_page,
+            })['group_episodes']
+    @staticmethod
+    def _get_playlist_data(data):
+        return data['group']
+    def _get_playlist_title(self, data):
+        return data.get('title')
 class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
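
Review note: the new iPlayer base class either fetches one explicitly requested page (36 entries, when `?page=N` is in the URL) or walks all pages lazily. The sketch below mimics that branch with an in-memory list instead of the BBC API. `EPISODES`, `fetch_page`, `all_pages` and `playlist_entries` are hypothetical names, and `all_pages` is only a simplified stand-in for `OnDemandPagedList`, not its real implementation.

```python
import functools
import itertools

# Hypothetical in-memory "API": 80 episode IDs, served in pages.
EPISODES = ['ep%02d' % n for n in range(80)]


def fetch_page(per_page, page):
    """Yield the entries of one zero-based page, in the spirit of _fetch_page."""
    start = page * per_page
    for episode_id in EPISODES[start:start + per_page]:
        yield episode_id


def all_pages(fetch, page_size):
    """Simplified stand-in for OnDemandPagedList: stop after the first short page."""
    for page in itertools.count():
        entries = list(fetch(page))
        for entry in entries:
            yield entry
        if len(entries) < page_size:
            break


def playlist_entries(page=None, default_page_size=100):
    """Mirror the branch in _real_extract: one page if requested, else all pages."""
    per_page = 36 if page else default_page_size
    fetch = functools.partial(fetch_page, per_page)
    if page:
        # The URL parameter is 1-based, the fetcher is 0-based.
        return list(fetch(int(page) - 1))
    return list(all_pages(fetch, default_page_size))
```

Binding the fixed arguments with `functools.partial` leaves a callable that takes only the page index, which is the shape a paged-list helper needs.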


@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}


@@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
     def _real_extract(self, url):
         item = self._parse_json(zlib.decompress(compat_b64decode(
             compat_urllib_parse_unquote(self._match_id(url))),
-            -zlib.MAX_WBITS), None)['video']['items'][0]
+            -zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
         return self._extract_video_info(item['mpxRefId'], 'cbsnews')
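
Review note: `zlib.decompress` returns bytes, and `json.loads` only accepts bytes from Python 3.6 onwards, which is presumably why the result is now decoded before parsing. The round trip below is a sketch of the token handling (URL-unquote, Base64-decode, raw-deflate inflate, decode, parse). `encode_item` and `decode_item` are hypothetical helpers, and the sample payload is made up.

```python
import base64
import json
import zlib
from urllib.parse import quote, unquote


def encode_item(data):
    """Build a made-up embed token: JSON -> raw deflate -> Base64 -> URL-quote."""
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    raw = compressor.compress(json.dumps(data).encode('utf-8')) + compressor.flush()
    return quote(base64.b64encode(raw).decode('ascii'), safe='')


def decode_item(token):
    """Reverse the steps; decode the inflated bytes to text before parsing."""
    raw = zlib.decompress(base64.b64decode(unquote(token)), -zlib.MAX_WBITS)
    return json.loads(raw.decode('utf-8'))


sample = {'video': {'items': [{'mpxRefId': 'abc123'}]}}
token = encode_item(sample)
item = decode_item(token)['video']['items'][0]
```

The negative `wbits` value selects a raw deflate stream with no zlib header or checksum, matching the `-zlib.MAX_WBITS` argument in the extractor.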


@@ -1,38 +1,113 @@
 from __future__ import unicode_literals
-from .cbs import CBSBaseIE
+import re
+# from .cbs import CBSBaseIE
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    try_get,
+)
-class CBSSportsIE(CBSBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
+# class CBSSportsEmbedIE(CBSBaseIE):
+class CBSSportsEmbedIE(InfoExtractor):
+    IE_NAME = 'cbssports:embed'
+    _VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
+        (?:
+            ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
+            pcid%3D(?P<pcid>\d+)
+        )'''
     _TESTS = [{
-        'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
-        'info_dict': {
-            'id': '1214315075735',
-            'ext': 'mp4',
-            'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
-            'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
-            'timestamp': 1524111457,
-            'upload_date': '20180419',
-            'uploader': 'CBSI-NEW',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
+        'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
+        'only_matching': True,
     }, {
-        'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
+        'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
         'only_matching': True,
     }]
-    def _extract_video_info(self, filter_query, video_id):
-        return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
+    # def _extract_video_info(self, filter_query, video_id):
+    #     return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
+    def _real_extract(self, url):
+        uuid, pcid = re.match(self._VALID_URL, url).groups()
+        query = {'id': uuid} if uuid else {'pcid': pcid}
+        video = self._download_json(
+            'https://www.cbssports.com/api/content/video/',
+            uuid or pcid, query=query)[0]
+        video_id = video['id']
+        title = video['title']
+        metadata = video.get('metaData') or {}
+        # return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
+        # return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
+        formats = self._extract_m3u8_formats(
+            metadata['files'][0]['url'], video_id, 'mp4',
+            'm3u8_native', m3u8_id='hls', fatal=False)
+        self._sort_formats(formats)
+        image = video.get('image')
+        thumbnails = None
+        if image:
+            image_path = image.get('path')
+            if image_path:
+                thumbnails = [{
+                    'url': image_path,
+                    'width': int_or_none(image.get('width')),
+                    'height': int_or_none(image.get('height')),
+                    'filesize': int_or_none(image.get('size')),
+                }]
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'thumbnails': thumbnails,
+            'description': video.get('description'),
+            'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
+            'duration': int_or_none(metadata.get('duration')),
+        }
+class CBSSportsBaseIE(InfoExtractor):
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
-        video_id = self._search_regex(
-            [r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
-            webpage, 'video id')
-        return self._extract_video_info('byId=%s' % video_id, video_id)
+        iframe_url = self._search_regex(
+            r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
+            webpage, 'embed url')
+        return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
+class CBSSportsIE(CBSSportsBaseIE):
+    IE_NAME = 'cbssports'
+    _VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
+        'info_dict': {
+            'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
+            'ext': 'mp4',
+            'title': 'Cover 3: Stanford Spring Gleaning',
+            'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
+            'timestamp': 1617218398,
+            'upload_date': '20210331',
+            'duration': 502,
+        },
+    }]
+class TwentyFourSevenSportsIE(CBSSportsBaseIE):
+    IE_NAME = '247sports'
+    _VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
+        'info_dict': {
+            'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
+            'ext': 'mp4',
+            'title': '2021 QB Jake Garcia senior highlights through five games',
+            'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
+            'timestamp': 1607114223,
+            'upload_date': '20201204',
+            'duration': 208,
+        },
+    }]
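
Review note: the embed extractor accepts either a UUID (`ids%3D...`) or a numeric `pcid` inside the percent-encoded `args` parameter and turns whichever one matched into the API query. A standalone sketch of that step follows. The pattern is copied from `CBSSportsEmbedIE._VALID_URL`, the two URLs are shortened versions of the test URLs above, and `api_query` is an illustrative helper name.

```python
import re

# Pattern copied from CBSSportsEmbedIE._VALID_URL (verbose, case-insensitive).
VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
    (?:
        ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
        pcid%3D(?P<pcid>\d+)
    )'''


def api_query(url):
    """Return the query dict _real_extract would send, or None if no match."""
    mobj = re.match(VALID_URL, url)
    if not mobj:
        return None
    uuid, pcid = mobj.groups()
    return {'id': uuid} if uuid else {'pcid': pcid}


# Shortened variants of the two test URLs.
uuid_query = api_query(
    'https://www.cbssports.com/player/embed/?args=player_id%3D'
    'b56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3D'
    'b56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1')
pcid_query = api_query(
    'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591'
    '%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640')
```

The `i` flag matters here: the 247Sports URL encodes `=` as lowercase `%3d`, while the pattern spells it `%3D`.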


@@ -133,6 +133,8 @@ class CDAIE(InfoExtractor):
             'age_limit': 18 if need_confirm_age else 0,
         }
+        info = self._search_json_ld(webpage, video_id, default={})
         # Source: https://www.cda.pl/js/player.js?t=1606154898
         def decrypt_file(a):
             for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
@@ -197,7 +199,7 @@ class CDAIE(InfoExtractor):
                 handler = self._download_webpage
             webpage = handler(
-                self._BASE_URL + href, video_id,
+                urljoin(self._BASE_URL, href), video_id,
                 'Downloading %s version information' % resolution, fatal=False)
             if not webpage:
                 # Manually report warning because empty page is returned when
@@ -209,6 +211,4 @@ class CDAIE(InfoExtractor):
         self._sort_formats(formats)
-        info = self._search_json_ld(webpage, video_id, default={})
         return merge_dicts(info_dict, info)


@@ -17,7 +17,7 @@ import math
 from ..compat import (
     compat_cookiejar_Cookie,
-    compat_cookies,
+    compat_cookies_SimpleCookie,
     compat_etree_Element,
     compat_etree_fromstring,
     compat_getpass,
@@ -1275,6 +1275,7 @@ class InfoExtractor(object):
         def extract_video_object(e):
             assert e['@type'] == 'VideoObject'
+            author = e.get('author')
             info.update({
                 'url': url_or_none(e.get('contentUrl')),
                 'title': unescapeHTML(e.get('name')),
@@ -1282,7 +1283,11 @@ class InfoExtractor(object):
                 'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
                 'duration': parse_duration(e.get('duration')),
                 'timestamp': unified_timestamp(e.get('uploadDate')),
-                'uploader': str_or_none(e.get('author')),
+                # author can be an instance of 'Organization' or 'Person' types.
+                # both types can have 'name' property(inherited from 'Thing' type). [1]
+                # however some websites are using 'Text' type instead.
+                # 1. https://schema.org/VideoObject
+                'uploader': author.get('name') if isinstance(author, dict) else author if isinstance(author, compat_str) else None,
                 'filesize': float_or_none(e.get('contentSize')),
                 'tbr': int_or_none(e.get('bitrate')),
                 'width': int_or_none(e.get('width')),
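
Review note: the chained conditional for `uploader` is compact but hard to scan. The sketch below spells out the same three cases as a small function, assuming Python 3 (so plain `str` stands in for `compat_str`). `uploader_from_author` is an illustrative name, not an existing youtube-dl helper.

```python
def uploader_from_author(author):
    """Pick an uploader name from a JSON-LD 'author' value.

    The value may be an object (Person/Organization) with a 'name'
    property, a plain text string, or something else entirely.
    """
    if isinstance(author, dict):
        return author.get('name')
    if isinstance(author, str):
        return author
    return None
```

Unlike the previous `str_or_none(e.get('author'))`, an object-typed author no longer ends up stringified as a whole dict.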
@@ -2896,10 +2901,10 @@ class InfoExtractor(object):
         self._downloader.cookiejar.set_cookie(cookie)
     def _get_cookies(self, url):
-        """ Return a compat_cookies.SimpleCookie with the cookies for the url """
+        """ Return a compat_cookies_SimpleCookie with the cookies for the url """
         req = sanitized_Request(url)
         self._downloader.cookiejar.add_cookie_header(req)
-        return compat_cookies.SimpleCookie(req.get_header('Cookie'))
+        return compat_cookies_SimpleCookie(req.get_header('Cookie'))
     def _apply_first_set_cookie_header(self, url_handle, cookie):
         """


@@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
             raise ExtractorError(
                 '%s said: %s' % (self.IE_NAME, error), expected=True)
-    def _call_api(self, path, video_id):
+    def _call_api(self, path, video_id, query=None):
         headers = {}
         if self._auth_token:
             headers['X-Auth-Token'] = self._auth_token
         result = self._download_json(
-            self._API_BASE_URL + path, video_id, headers=headers)
+            self._API_BASE_URL + path, video_id, headers=headers, query=query)
         self._handle_errors(result)
         return result['data']
@@ -52,62 +52,75 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
     _VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
     _TEST = {
         'url': 'https://app.curiositystream.com/video/2',
-        'md5': '262bb2f257ff301115f1973540de8983',
         'info_dict': {
             'id': '2',
             'ext': 'mp4',
             'title': 'How Did You Develop The Internet?',
             'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
-        }
+        },
+        'params': {
+            'format': 'bestvideo',
+            # m3u8 download
+            'skip_download': True,
+        },
     }
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        media = self._call_api('media/' + video_id, video_id)
-        title = media['title']
         formats = []
-        for encoding in media.get('encodings', []):
-            m3u8_url = encoding.get('master_playlist_url')
-            if m3u8_url:
-                formats.extend(self._extract_m3u8_formats(
-                    m3u8_url, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
-            encoding_url = encoding.get('url')
-            file_url = encoding.get('file_url')
-            if not encoding_url and not file_url:
-                continue
-            f = {
-                'width': int_or_none(encoding.get('width')),
-                'height': int_or_none(encoding.get('height')),
-                'vbr': int_or_none(encoding.get('video_bitrate')),
-                'abr': int_or_none(encoding.get('audio_bitrate')),
-                'filesize': int_or_none(encoding.get('size_in_bytes')),
-                'vcodec': encoding.get('video_codec'),
-                'acodec': encoding.get('audio_codec'),
-                'container': encoding.get('container_type'),
-            }
-            for f_url in (encoding_url, file_url):
-                if not f_url:
-                    continue
-                fmt = f.copy()
-                rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
-                if rtmp:
-                    fmt.update({
-                        'url': rtmp.group('url'),
-                        'play_path': rtmp.group('playpath'),
-                        'app': rtmp.group('app'),
-                        'ext': 'flv',
-                        'format_id': 'rtmp',
-                    })
-                else:
-                    fmt.update({
-                        'url': f_url,
-                        'format_id': 'http',
-                    })
-                formats.append(fmt)
+        for encoding_format in ('m3u8', 'mpd'):
+            media = self._call_api('media/' + video_id, video_id, query={
+                'encodingsNew': 'true',
+                'encodingsFormat': encoding_format,
+            })
+            for encoding in media.get('encodings', []):
+                playlist_url = encoding.get('master_playlist_url')
+                if encoding_format == 'm3u8':
+                    # use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
+                    formats.extend(self._extract_m3u8_formats(
+                        playlist_url, video_id, 'mp4',
+                        m3u8_id='hls', fatal=False))
+                elif encoding_format == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        playlist_url, video_id, mpd_id='dash', fatal=False))
+                encoding_url = encoding.get('url')
+                file_url = encoding.get('file_url')
+                if not encoding_url and not file_url:
+                    continue
+                f = {
+                    'width': int_or_none(encoding.get('width')),
+                    'height': int_or_none(encoding.get('height')),
+                    'vbr': int_or_none(encoding.get('video_bitrate')),
+                    'abr': int_or_none(encoding.get('audio_bitrate')),
+                    'filesize': int_or_none(encoding.get('size_in_bytes')),
+                    'vcodec': encoding.get('video_codec'),
+                    'acodec': encoding.get('audio_codec'),
+                    'container': encoding.get('container_type'),
+                }
+                for f_url in (encoding_url, file_url):
+                    if not f_url:
+                        continue
+                    fmt = f.copy()
+                    rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
+                    if rtmp:
+                        fmt.update({
+                            'url': rtmp.group('url'),
+                            'play_path': rtmp.group('playpath'),
+                            'app': rtmp.group('app'),
+                            'ext': 'flv',
+                            'format_id': 'rtmp',
+                        })
+                    else:
+                        fmt.update({
+                            'url': f_url,
+                            'format_id': 'http',
+                        })
+                    formats.append(fmt)
         self._sort_formats(formats)
+        title = media['title']
         subtitles = {}
         for closed_caption in media.get('closed_captions', []):
             sub_url = closed_caption.get('file')
@@ -140,7 +153,7 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
             'title': 'Curious Minds: The Internet',
             'description': 'How is the internet shaping our lives in the 21st Century?',
         },
-        'playlist_mincount': 17,
+        'playlist_mincount': 16,
     }, {
         'url': 'https://curiositystream.com/series/2',
         'only_matching': True,
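
Review note: the per-encoding loop still splits RTMP URLs into base URL, app and play path and treats everything else as plain HTTP. The sketch below isolates that classification with the regex copied from the extractor. `classify` is an illustrative helper, and the example URLs are invented rather than real CuriosityStream endpoints.

```python
import re

# Regex copied from CuriosityStreamIE._real_extract.
RTMP_RE = r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$'


def classify(f_url):
    """Return the format fields the extractor would add for one media URL."""
    rtmp = re.search(RTMP_RE, f_url)
    if rtmp:
        return {
            'url': rtmp.group('url'),
            'play_path': rtmp.group('playpath'),
            'app': rtmp.group('app'),
            'ext': 'flv',
            'format_id': 'rtmp',
        }
    return {'url': f_url, 'format_id': 'http'}


# Invented example URLs for illustration only.
rtmp_fmt = classify('rtmp://media.example.com/ondemand/mp4:videos/clip.mp4')
http_fmt = classify('https://media.example.com/videos/clip.mp4')
```

The `mp[34]:` prefix is what marks the start of the play path, so the app name is whatever sits between the host and that prefix.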


@@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
         # From http://www.gdcvault.com/play/1013700/Advanced-Material
         'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
         'only_matching': True,
+    }, {
+        # From https://gdcvault.com/play/1016624, empty speakerVideo
+        'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
+        'info_dict': {
+            'id': '201210-822101_1349794556671DDDD',
+            'ext': 'flv',
+            'title': 'Pre-launch - Preparing to Take the Plunge',
+        },
+    }, {
+        # From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
+        'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
+        'only_matching': True,
     }]
     def _parse_mp4(self, metadata):
@ -84,26 +96,20 @@ class DigitallySpeakingIE(InfoExtractor):
'vcodec': 'none', 'vcodec': 'none',
'format_id': audio.get('code'), 'format_id': audio.get('code'),
}) })
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True) for video_key, format_id, preference in (
formats.append({ ('slide', 'slides', -2), ('speaker', 'speaker', -1)):
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, video_path = xpath_text(metadata, './%sVideo' % video_key)
'play_path': remove_end(slide_video_path, '.flv'), if not video_path:
'ext': 'flv', continue
'format_note': 'slide deck video', formats.append({
'quality': -2, 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'preference': -2, 'play_path': remove_end(video_path, '.flv'),
'format_id': 'slides', 'ext': 'flv',
}) 'format_note': '%s video' % video_key,
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True) 'quality': preference,
formats.append({ 'preference': preference,
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, 'format_id': format_id,
'play_path': remove_end(speaker_video_path, '.flv'), })
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats return formats
def _real_extract(self, url): def _real_extract(self, url):
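
Note on the refactored loop above: the two near-identical blocks for the slide and speaker videos become one loop, and a missing or empty `slideVideo` / `speakerVideo` element is now skipped instead of raising (the old code passed `fatal=True`). The sketch below approximates that behaviour with the standard library only. `video_formats`, the sample XML and the host name are assumptions for illustration; the real code uses the `xpath_text` and `remove_end` utilities.

```python
import xml.etree.ElementTree as ET


def video_formats(metadata, akamai_url):
    """Hypothetical stand-in for the slide/speaker part of the format builder."""
    formats = []
    for video_key, format_id, preference in (
            ('slide', 'slides', -2), ('speaker', 'speaker', -1)):
        # findtext() returns None for a missing element and '' for an empty one;
        # both are skipped by the check below.
        video_path = metadata.findtext('./%sVideo' % video_key)
        if not video_path:
            continue
        if video_path.endswith('.flv'):
            video_path = video_path[:-len('.flv')]
        formats.append({
            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
            'play_path': video_path,
            'ext': 'flv',
            'format_note': '%s video' % video_key,
            'quality': preference,
            'preference': preference,
            'format_id': format_id,
        })
    return formats


# Made-up metadata with an empty speakerVideo element.
metadata = ET.fromstring(
    '<metadata><slideVideo>gdc/slides.flv</slideVideo><speakerVideo/></metadata>')
formats = video_formats(metadata, 'cdn.example.com')
```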

@ -6,7 +6,7 @@ from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode from ..compat import compat_urllib_parse_urlencode
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
unescapeHTML merge_dicts,
) )
@ -24,7 +24,8 @@ class EroProfileIE(InfoExtractor):
'title': 'sexy babe softcore', 'title': 'sexy babe softcore',
'thumbnail': r're:https?://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
} },
'skip': 'Video not found',
}, { }, {
'url': 'http://www.eroprofile.com/m/videos/view/Try-It-On-Pee_cut_2-wmv-4shared-com-file-sharing-download-movie-file', 'url': 'http://www.eroprofile.com/m/videos/view/Try-It-On-Pee_cut_2-wmv-4shared-com-file-sharing-download-movie-file',
'md5': '1baa9602ede46ce904c431f5418d8916', 'md5': '1baa9602ede46ce904c431f5418d8916',
@ -77,19 +78,15 @@ class EroProfileIE(InfoExtractor):
[r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'], [r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'],
webpage, 'video id', default=None) webpage, 'video id', default=None)
video_url = unescapeHTML(self._search_regex(
r'<source src="([^"]+)', webpage, 'video url'))
title = self._html_search_regex( title = self._html_search_regex(
r'Title:</th><td>([^<]+)</td>', webpage, 'title') (r'Title:</th><td>([^<]+)</td>', r'<h1[^>]*>(.+?)</h1>'),
thumbnail = self._search_regex( webpage, 'title')
r'onclick="showVideoPlayer\(\)"><img src="([^"]+)',
webpage, 'thumbnail', fatal=False)
return { info = self._parse_html5_media_entries(url, webpage, video_id)[0]
return merge_dicts(info, {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'url': video_url,
'title': title, 'title': title,
'thumbnail': thumbnail,
'age_limit': 18, 'age_limit': 18,
} })

@ -72,6 +72,7 @@ from .arte import (
ArteTVEmbedIE, ArteTVEmbedIE,
ArteTVPlaylistIE, ArteTVPlaylistIE,
) )
from .arnes import ArnesIE
from .asiancrush import ( from .asiancrush import (
AsianCrushIE, AsianCrushIE,
AsianCrushPlaylistIE, AsianCrushPlaylistIE,
@ -95,7 +96,8 @@ from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import ( from .bbc import (
BBCCoUkIE, BBCCoUkIE,
BBCCoUkArticleIE, BBCCoUkArticleIE,
BBCCoUkIPlayerPlaylistIE, BBCCoUkIPlayerEpisodesIE,
BBCCoUkIPlayerGroupIE,
BBCCoUkPlaylistIE, BBCCoUkPlaylistIE,
BBCIE, BBCIE,
) )
@ -130,7 +132,6 @@ from .bleacherreport import (
BleacherReportIE, BleacherReportIE,
BleacherReportCMSIE, BleacherReportCMSIE,
) )
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE from .bloomberg import BloombergIE
from .bokecc import BokeCCIE from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE from .bongacams import BongaCamsIE
@ -189,7 +190,11 @@ from .cbsnews import (
CBSNewsIE, CBSNewsIE,
CBSNewsLiveVideoIE, CBSNewsLiveVideoIE,
) )
from .cbssports import CBSSportsIE from .cbssports import (
CBSSportsEmbedIE,
CBSSportsIE,
TwentyFourSevenSportsIE,
)
from .ccc import ( from .ccc import (
CCCIE, CCCIE,
CCCPlaylistIE, CCCPlaylistIE,
@ -593,7 +598,11 @@ from .limelight import (
LimelightChannelIE, LimelightChannelIE,
LimelightChannelListIE, LimelightChannelListIE,
) )
from .line import LineTVIE from .line import (
LineTVIE,
LineLiveIE,
LineLiveChannelIE,
)
from .linkedin import ( from .linkedin import (
LinkedInLearningIE, LinkedInLearningIE,
LinkedInLearningCourseIE, LinkedInLearningCourseIE,
@ -630,6 +639,7 @@ from .mangomolo import (
MangomoloLiveIE, MangomoloLiveIE,
) )
from .manyvids import ManyVidsIE from .manyvids import ManyVidsIE
from .maoritv import MaoriTVIE
from .markiza import ( from .markiza import (
MarkizaIE, MarkizaIE,
MarkizaPageIE, MarkizaPageIE,
@ -673,7 +683,10 @@ from .mixcloud import (
MixcloudUserIE, MixcloudUserIE,
MixcloudPlaylistIE, MixcloudPlaylistIE,
) )
from .mlb import MLBIE from .mlb import (
MLBIE,
MLBVideoIE,
)
from .mnet import MnetIE from .mnet import MnetIE
from .moevideo import MoeVideoIE from .moevideo import MoeVideoIE
from .mofosex import ( from .mofosex import (
@ -874,6 +887,11 @@ from .packtpub import (
PacktPubIE, PacktPubIE,
PacktPubCourseIE, PacktPubCourseIE,
) )
from .palcomp3 import (
PalcoMP3IE,
PalcoMP3ArtistIE,
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
@ -907,6 +925,7 @@ from .platzi import (
from .playfm import PlayFMIE from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE from .plays import PlaysTVIE
from .playstuff import PlayStuffIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
from .playvid import PlayvidIE from .playvid import PlayvidIE
from .playwire import PlaywireIE from .playwire import PlaywireIE
@ -1621,5 +1640,9 @@ from .zattoo import (
) )
from .zdf import ZDFIE, ZDFChannelIE from .zdf import ZDFIE, ZDFChannelIE
from .zhihu import ZhihuIE from .zhihu import ZhihuIE
from .zingmp3 import ZingMp3IE from .zingmp3 import (
ZingMp3IE,
ZingMp3AlbumIE,
)
from .zoom import ZoomIE
from .zype import ZypeIE from .zype import ZypeIE

@ -383,6 +383,10 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
}, { }, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin', 'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True, 'only_matching': True,
}, {
# "<figure id=" pattern (#28792)
'url': 'https://www.francetvinfo.fr/culture/patrimoine/incendie-de-notre-dame-de-paris/notre-dame-de-paris-de-l-incendie-de-la-cathedrale-a-sa-reconstruction_4372291.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -399,7 +403,8 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)', (r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)', r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'), r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id') webpage, 'video id')
return self._make_url_result(video_id) return self._make_url_result(video_id)
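
Note on the added fallback pattern: it accepts a UUID-shaped value from either a `data-id` attribute or the `id` of a `<figure>` tag. A quick way to see what it does and does not match is sketched below. The HTML snippets are invented for illustration and are not taken from francetvinfo.fr.

```python
import re

# The pattern added in the hunk above.
VIDEO_ID_RE = (
    r'(?:data-id|<figure[^<]+\bid)=["\']'
    r'([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})')


def find_video_id(webpage):
    """Return the first UUID-style video id, or None (illustrative helper)."""
    mobj = re.search(VIDEO_ID_RE, webpage)
    return mobj.group(1) if mobj else None


# Invented sample markup.
from_figure = find_video_id(
    '<figure class="player" id="0123abcd-4567-89ef-0123-456789abcdef"></figure>')
from_data_id = find_video_id(
    '<div data-id="0123abcd-4567-89ef-0123-456789abcdef"></div>')
not_a_uuid = find_video_id('<figure id="main-figure"></figure>')
```

Because the captured value must have the exact 8-4-4-4-12 hexadecimal shape, ordinary element ids such as `main-figure` are ignored.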

@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id) self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
for f in formats: for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr')) wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh: if wh:

@ -16,7 +16,7 @@ from ..utils import (
class FunimationIE(InfoExtractor): class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation' _NETRC_MACHINE = 'funimation'
_TOKEN = None _TOKEN = None
@ -51,6 +51,10 @@ class FunimationIE(InfoExtractor):
}, { }, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/', 'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True, 'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}] }]
def _login(self): def _login(self):
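
Note on the `_VALID_URL` change: the new optional `(?:[^/]+/)?` segment lets a language code such as `/en/` precede `/shows/`. The sketch below compares the old and new patterns on the two test URLs from this diff. `match_id` is a small illustrative helper, not the extractor's own `_match_id`.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)'
NEW_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'


def match_id(pattern, url):
    """Illustrative helper: return the 'id' group or None when the URL does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


lang_url = 'https://www.funimation.com/en/shows/hacksign/role-play/'
plain_url = 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/'

old_lang = match_id(OLD_VALID_URL, lang_url)
new_lang = match_id(NEW_VALID_URL, lang_url)
new_plain = match_id(NEW_VALID_URL, plain_url)
```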

@ -6,6 +6,7 @@ from .common import InfoExtractor
from .kaltura import KalturaIE from .kaltura import KalturaIE
from ..utils import ( from ..utils import (
HEADRequest, HEADRequest,
remove_start,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
urlencode_postdata, urlencode_postdata,
@ -102,6 +103,26 @@ class GDCVaultIE(InfoExtractor):
'format': 'mp4-408', 'format': 'mp4-408',
}, },
}, },
{
# Kaltura embed, whitespace between quote and embedded URL in iframe's src
'url': 'https://www.gdcvault.com/play/1025699',
'info_dict': {
'id': '0_zagynv0a',
'ext': 'mp4',
'title': 'Tech Toolbox',
'upload_date': '20190408',
'uploader_id': 'joe@blazestreaming.com',
'timestamp': 1554764629,
},
'params': {
'skip_download': True,
},
},
{
# HTML5 video
'url': 'http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru',
'only_matching': True,
},
] ]
def _login(self, webpage_url, display_id): def _login(self, webpage_url, display_id):
@ -175,7 +196,18 @@ class GDCVaultIE(InfoExtractor):
xml_name = self._html_search_regex( xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>', r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename') start_page, 'xml filename', default=None)
if not xml_name:
info = self._parse_html5_media_entries(url, start_page, video_id)[0]
info.update({
'title': remove_start(self._search_regex(
r'>Session Name:\s*<.*?>\s*<td>(.+?)</td>', start_page,
'title', default=None) or self._og_search_title(
start_page, default=None), 'GDC Vault - '),
'id': video_id,
'display_id': display_id,
})
return info
embed_url = '%s/xml/%s' % (xml_root, xml_name) embed_url = '%s/xml/%s' % (xml_root, xml_name)
ie_key = 'DigitallySpeaking' ie_key = 'DigitallySpeaking'

@ -126,6 +126,7 @@ from .viqeo import ViqeoIE
from .expressen import ExpressenIE from .expressen import ExpressenIE
from .zype import ZypeIE from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE from .odnoklassniki import OdnoklassnikiIE
from .vk import VKIE
from .kinja import KinjaEmbedIE from .kinja import KinjaEmbedIE
from .arcpublishing import ArcPublishingIE from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE from .medialaan import MedialaanIE
@ -2248,6 +2249,11 @@ class GenericIE(InfoExtractor):
}, },
'playlist_mincount': 52, 'playlist_mincount': 52,
}, },
{
# Sibnet embed (https://help.sibnet.ru/?sibnet_video_embed)
'url': 'https://phpbb3.x-tk.ru/bbcode-video-sibnet-t24.html',
'only_matching': True,
},
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@ -2777,6 +2783,11 @@ class GenericIE(InfoExtractor):
if odnoklassniki_url: if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key()) return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
# Look for sibnet embedded player
sibnet_urls = VKIE._extract_sibnet_urls(webpage)
if sibnet_urls:
return self.playlist_from_matches(sibnet_urls, video_id, video_title)
# Look for embedded ivi player # Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage) mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None: if mobj is not None:
@ -2953,7 +2964,7 @@ class GenericIE(InfoExtractor):
webpage) webpage)
if not mobj: if not mobj:
mobj = re.search( mobj = re.search(
r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)', r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
webpage) webpage)
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB') return self.url_result(mobj.group('url'), 'MLB')
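
Note on the escaped dots in the hunk above: an unescaped `.` in a regex matches any character, so the old pattern also accepted host names that merely resemble `m.mlb.com`. A small demonstration follows; the lookalike host is invented for illustration.

```python
import re

OLD_RE = r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)'
NEW_RE = r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)'

genuine = '<div data-video-link="http://m.mlb.com/video/v123">'
# Invented lookalike: other characters where the literal dots should be.
lookalike = '<div data-video-link="http://mxmlb-com/video/v123">'

old_matches_lookalike = re.search(OLD_RE, lookalike) is not None
new_matches_lookalike = re.search(NEW_RE, lookalike) is not None
new_matches_genuine = re.search(NEW_RE, genuine) is not None
```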
@ -3400,6 +3411,9 @@ class GenericIE(InfoExtractor):
'url': src, 'url': src,
'ext': (mimetype2ext(src_type) 'ext': (mimetype2ext(src_type)
or ext if ext in KNOWN_EXTENSIONS else 'mp4'), or ext if ext in KNOWN_EXTENSIONS else 'mp4'),
'http_headers': {
'Referer': full_response.geturl(),
},
}) })
if formats: if formats:
self._sort_formats(formats) self._sort_formats(formats)
@ -3468,7 +3482,7 @@ class GenericIE(InfoExtractor):
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage) m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player: # We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None: if m_video_type is not None:
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)) found = filter_video(re.findall(r'<meta.*?property="og:(?:video|audio)".*?content="(.*?)"', webpage))
if not found: if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)' REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search( found = re.search(

@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re import re
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
determine_ext, determine_ext,
parse_age_limit, parse_age_limit,
try_get,
urlencode_postdata, urlencode_postdata,
ExtractorError, ExtractorError,
) )
@ -116,6 +118,18 @@ class GoIE(AdobePassIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot',
'info_dict': {
'id': 'VDKA22600213',
'ext': 'mp4',
'title': 'Pilot',
'description': 'md5:74306df917cfc199d76d061d66bebdb4',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding', 'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True, 'only_matching': True,
@ -149,14 +163,30 @@ class GoIE(AdobePassIE):
brand = site_info.get('brand') brand = site_info.get('brand')
if not video_id or not site_info: if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id) webpage = self._download_webpage(url, display_id or video_id)
video_id = self._search_regex( data = self._parse_json(
( self._search_regex(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'" r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage,
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood 'data', default='{}'),
r'data-video-id=["\']*(VDKA\w+)', display_id or video_id, fatal=False)
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet # https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)' layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict)
), webpage, 'video id', default=video_id) video_id = None
if layout:
video_id = try_get(
layout,
(lambda x: x['videoid'], lambda x: x['video']['id']),
compat_str)
if not video_id:
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# page.analytics.videoIdCode
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
if not site_info: if not site_info:
brand = self._search_regex( brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)', (r'data-brand=\s*["\']\s*(\d+)',
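
Note on the new lookup order: the video id is first read from the embedded `__abc_com__` JSON, and the regex list is only used when that yields nothing. The nested lookup depends on `try_get` swallowing missing keys and trying several getters in turn. The function below is a simplified stand-in written for this note and should not be read as the library's exact implementation; `str` replaces `compat_str`, and the sample data is invented.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in: return the first getter result of the expected type."""
    getters = getter if isinstance(getter, (list, tuple)) else [getter]
    for get in getters:
        try:
            value = get(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


# Invented page data that only carries the second of the two id locations.
data = {'page': {'content': {'video': {'layout': {'video': {'id': 'VDKA22600213'}}}}}}

layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict)
video_id = None
if layout:
    video_id = try_get(
        layout,
        (lambda x: x['videoid'], lambda x: x['video']['id']),
        str)

# An empty dict stands in for a page where the JSON blob was not found.
missing = try_get({}, lambda x: x['page']['content']['video']['layout'], dict)
```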

@ -12,6 +12,7 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
float_or_none,
get_element_by_attribute, get_element_by_attribute,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
@ -32,6 +33,7 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq', 'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8', 'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1371748545, 'timestamp': 1371748545,
'upload_date': '20130620', 'upload_date': '20130620',
'uploader_id': 'naomipq', 'uploader_id': 'naomipq',
@ -48,6 +50,7 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Video by britneyspears', 'title': 'Video by britneyspears',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1453760977, 'timestamp': 1453760977,
'upload_date': '20160125', 'upload_date': '20160125',
'uploader_id': 'britneyspears', 'uploader_id': 'britneyspears',
@ -86,6 +89,24 @@ class InstagramIE(InfoExtractor):
'title': 'Post by instagram', 'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957', 'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
}, },
}, {
# IGTV
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
'info_dict': {
'id': 'BkfuX9UB-eK',
'ext': 'mp4',
'title': 'Fingerboarding Tricks with @cass.fb',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 53.83,
'timestamp': 1530032919,
'upload_date': '20180626',
'uploader_id': 'instagram',
'uploader': 'Instagram',
'like_count': int,
'comment_count': int,
'comments': list,
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
}
}, { }, {
'url': 'https://instagram.com/p/-Cmh1cukG2/', 'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True, 'only_matching': True,
@ -159,7 +180,9 @@ class InstagramIE(InfoExtractor):
description = try_get( description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'], media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption') compat_str) or media.get('caption')
title = media.get('title')
thumbnail = media.get('display_src') or media.get('display_url') thumbnail = media.get('display_src') or media.get('display_url')
duration = float_or_none(media.get('video_duration'))
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date')) timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name') uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username') uploader_id = media.get('owner', {}).get('username')
@ -200,9 +223,10 @@ class InstagramIE(InfoExtractor):
continue continue
entries.append({ entries.append({
'id': node.get('shortcode') or node['id'], 'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num, 'title': node.get('title') or 'Video %d' % edge_num,
'url': node_video_url, 'url': node_video_url,
'thumbnail': node.get('display_url'), 'thumbnail': node.get('display_url'),
'duration': float_or_none(node.get('video_duration')),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])), 'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])), 'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')), 'view_count': int_or_none(node.get('video_view_count')),
@ -239,8 +263,9 @@ class InstagramIE(InfoExtractor):
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'ext': 'mp4', 'ext': 'mp4',
'title': 'Video by %s' % uploader_id, 'title': title or 'Video by %s' % uploader_id,
'description': description, 'description': description,
'duration': duration,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'timestamp': timestamp, 'timestamp': timestamp,
'uploader_id': uploader_id, 'uploader_id': uploader_id,

@ -29,34 +29,51 @@ class JamendoIE(InfoExtractor):
'id': '196219', 'id': '196219',
'display_id': 'stories-from-emona-i', 'display_id': 'stories-from-emona-i',
'ext': 'flac', 'ext': 'flac',
'title': 'Maya Filipič - Stories from Emona I', # 'title': 'Maya Filipič - Stories from Emona I',
'artist': 'Maya Filipič', 'title': 'Stories from Emona I',
# 'artist': 'Maya Filipič',
'track': 'Stories from Emona I', 'track': 'Stories from Emona I',
'duration': 210, 'duration': 210,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1217438117, 'timestamp': 1217438117,
'upload_date': '20080730', 'upload_date': '20080730',
'license': 'by-nc-nd',
'view_count': int,
'like_count': int,
'average_rating': int,
'tags': ['piano', 'peaceful', 'newage', 'strings', 'upbeat'],
} }
}, { }, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock', 'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True, 'only_matching': True,
}] }]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
def _real_extract(self, url): def _real_extract(self, url):
track_id, display_id = self._VALID_URL_RE.match(url).groups() track_id, display_id = self._VALID_URL_RE.match(url).groups()
webpage = self._download_webpage( # webpage = self._download_webpage(
'https://www.jamendo.com/track/' + track_id, track_id) # 'https://www.jamendo.com/track/' + track_id, track_id)
models = self._parse_json(self._html_search_regex( # models = self._parse_json(self._html_search_regex(
r"data-bundled-models='([^']+)", # r"data-bundled-models='([^']+)",
webpage, 'bundled models'), track_id) # webpage, 'bundled models'), track_id)
track = models['track']['models'][0] # track = models['track']['models'][0]
track = self._call_api('track', track_id)
title = track_name = track['name'] title = track_name = track['name']
get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {} # get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
artist = get_model('artist') # artist = get_model('artist')
artist_name = artist.get('name') # artist_name = artist.get('name')
if artist_name: # if artist_name:
title = '%s - %s' % (artist_name, title) # title = '%s - %s' % (artist_name, title)
album = get_model('album') # album = get_model('album')
formats = [{ formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294' 'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@ -74,7 +91,7 @@ class JamendoIE(InfoExtractor):
urls = [] urls = []
thumbnails = [] thumbnails = []
for _, covers in track.get('cover', {}).items(): for covers in (track.get('cover') or {}).values():
for cover_id, cover_url in covers.items(): for cover_id, cover_url in covers.items():
if not cover_url or cover_url in urls: if not cover_url or cover_url in urls:
continue continue
@ -88,13 +105,14 @@ class JamendoIE(InfoExtractor):
}) })
tags = [] tags = []
for tag in track.get('tags', []): for tag in (track.get('tags') or []):
tag_name = tag.get('name') tag_name = tag.get('name')
if not tag_name: if not tag_name:
continue continue
tags.append(tag_name) tags.append(tag_name)
stats = track.get('stats') or {} stats = track.get('stats') or {}
license = track.get('licenseCC') or []
return { return {
'id': track_id, 'id': track_id,
@ -103,11 +121,11 @@ class JamendoIE(InfoExtractor):
'title': title, 'title': title,
'description': track.get('description'), 'description': track.get('description'),
'duration': int_or_none(track.get('duration')), 'duration': int_or_none(track.get('duration')),
'artist': artist_name, # 'artist': artist_name,
'track': track_name, 'track': track_name,
'album': album.get('name'), # 'album': album.get('name'),
'formats': formats, 'formats': formats,
'license': '-'.join(track.get('licenseCC', [])) or None, 'license': '-'.join(license) if license else None,
'timestamp': int_or_none(track.get('dateCreated')), 'timestamp': int_or_none(track.get('dateCreated')),
'view_count': int_or_none(stats.get('listenedAll')), 'view_count': int_or_none(stats.get('listenedAll')),
'like_count': int_or_none(stats.get('favorited')), 'like_count': int_or_none(stats.get('favorited')),
@ -116,9 +134,9 @@ class JamendoIE(InfoExtractor):
} }
class JamendoAlbumIE(InfoExtractor): class JamendoAlbumIE(JamendoIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
_TEST = { _TESTS = [{
'url': 'https://www.jamendo.com/album/121486/duck-on-cover', 'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': { 'info_dict': {
'id': '121486', 'id': '121486',
@ -151,17 +169,7 @@ class JamendoAlbumIE(InfoExtractor):
'params': { 'params': {
'playlistend': 2 'playlistend': 2
} }
} }]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) album_id = self._match_id(url)
@ -169,7 +177,7 @@ class JamendoAlbumIE(InfoExtractor):
album_name = album.get('name') album_name = album.get('name')
entries = [] entries = []
for track in album.get('tracks', []): for track in (album.get('tracks') or []):
track_id = track.get('id') track_id = track.get('id')
if not track_id: if not track_id:
continue continue
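
Note on `_call_api`, which this diff moves from `JamendoAlbumIE` into `JamendoIE` so that both extractors share it: each request carries an `X-Jam-Call` header built from a SHA-1 digest of the API path concatenated with a random string. The header construction alone can be sketched as follows. The function name and the fixed `rand` argument are assumptions added so the example is reproducible.

```python
import hashlib
import random


def jam_call_header(path, rand=None):
    """Hypothetical helper mirroring the header format used in _call_api."""
    if rand is None:
        rand = str(random.random())
    digest = hashlib.sha1((path + rand).encode()).hexdigest()
    return '$%s*%s~' % (digest, rand)


# '/api/%ss' % 'track' gives the path below.
path = '/api/%ss' % 'track'
header = jam_call_header(path, rand='0.5')
```

Passing `rand` explicitly is only for illustration; the extractor draws a fresh random value for every call.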

@ -120,7 +120,7 @@ class KalturaIE(InfoExtractor):
def _extract_urls(webpage): def _extract_urls(webpage):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
finditer = ( finditer = (
re.finditer( list(re.finditer(
r"""(?xs) r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\( kWidget\.(?:thumb)?[Ee]mbed\(
\{.*? \{.*?
@ -128,8 +128,8 @@ class KalturaIE(InfoExtractor):
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*? (?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s* (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\}) (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage) """, webpage))
or re.finditer( or list(re.finditer(
r'''(?xs) r'''(?xs)
(?P<q1>["']) (?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)* (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@ -142,16 +142,16 @@ class KalturaIE(InfoExtractor):
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s* \[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
) )
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3) (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage) ''', webpage))
or re.finditer( or list(re.finditer(
r'''(?xs) r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["']) <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])\s*
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+) (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)* (?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+) [?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)* (?:(?!(?P=q1)).)*
(?P=q1) (?P=q1)
''', webpage) ''', webpage))
) )
urls = [] urls = []
for mobj in finditer: for mobj in finditer:
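
Note on the `list(re.finditer(...))` wrapping above: `re.finditer` returns an iterator object, which is truthy even when it will yield no matches, so chaining bare iterators with `or` never falls through to the later patterns. Converting each result to a list makes an empty result falsy. A minimal demonstration with simplified, made-up patterns:

```python
import re

page = '<iframe src="https://example.com/embed?entry_id=abc"></iframe>'

# Bare iterators: the first one is always chosen, even though it matches nothing.
lazy = re.finditer(r'kWidget', page) or re.finditer(r'entry_id=(\w+)', page)
lazy_ids = [m.group(0) for m in lazy]

# Lists: the empty first result is falsy, so the second pattern gets its turn.
eager = list(re.finditer(r'kWidget', page)) or list(re.finditer(r'entry_id=(\w+)', page))
eager_ids = [m.group(1) for m in eager]
```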

@ -120,6 +120,26 @@ class LBRYIE(LBRYBaseIE):
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212', 'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
'vcodec': 'none', 'vcodec': 'none',
} }
}, {
# HLS
'url': 'https://odysee.com/@gardeningincanada:b/plants-i-will-never-grow-again.-the:e',
'md5': 'fc82f45ea54915b1495dd7cb5cc1289f',
'info_dict': {
'id': 'e51671357333fe22ae88aad320bde2f6f96b1410',
'ext': 'mp4',
'title': 'PLANTS I WILL NEVER GROW AGAIN. THE BLACK LIST PLANTS FOR A CANADIAN GARDEN | Gardening in Canada 🍁',
'description': 'md5:9c539c6a03fb843956de61a4d5288d5e',
'timestamp': 1618254123,
'upload_date': '20210412',
'release_timestamp': 1618254002,
'release_date': '20210412',
'tags': list,
'duration': 554,
'channel': 'Gardening In Canada',
'channel_id': 'b8be0e93b423dad221abe29545fbe8ec36e806bc',
'channel_url': 'https://odysee.com/@gardeningincanada:b8be0e93b423dad221abe29545fbe8ec36e806bc',
'formats': 'mincount:3',
}
}, { }, {
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e', 'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
'only_matching': True, 'only_matching': True,
@ -163,10 +183,18 @@ class LBRYIE(LBRYBaseIE):
streaming_url = self._call_api_proxy( streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url'] 'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
info = self._parse_stream(result, url) info = self._parse_stream(result, url)
urlh = self._request_webpage(
streaming_url, display_id, note='Downloading streaming redirect url info')
if determine_ext(urlh.geturl()) == 'm3u8':
info['formats'] = self._extract_m3u8_formats(
urlh.geturl(), display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(info['formats'])
else:
info['url'] = streaming_url
info.update({ info.update({
'id': claim_id, 'id': claim_id,
'title': title, 'title': title,
'url': streaming_url,
}) })
return info return info

@ -4,7 +4,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import js_to_json from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
str_or_none,
)
class LineTVIE(InfoExtractor): class LineTVIE(InfoExtractor):
@ -88,3 +94,137 @@ class LineTVIE(InfoExtractor):
for thumbnail in video_info.get('thumbnails', {}).get('list', [])], for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
'view_count': video_info.get('meta', {}).get('count'), 'view_count': video_info.get('meta', {}).get('count'),
} }
class LineLiveBaseIE(InfoExtractor):
_API_BASE_URL = 'https://live-api.line-apps.com/web/v4.0/channel/'
def _parse_broadcast_item(self, item):
broadcast_id = compat_str(item['id'])
title = item['title']
is_live = item.get('isBroadcastingNow')
thumbnails = []
for thumbnail_id, thumbnail_url in (item.get('thumbnailURLs') or {}).items():
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
})
channel = item.get('channel') or {}
channel_id = str_or_none(channel.get('id'))
return {
'id': broadcast_id,
'title': self._live_title(title) if is_live else title,
'thumbnails': thumbnails,
'timestamp': int_or_none(item.get('createdAt')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': 'https://live.line.me/channels/' + channel_id if channel_id else None,
'duration': int_or_none(item.get('archiveDuration')),
'view_count': int_or_none(item.get('viewerCount')),
'comment_count': int_or_none(item.get('chatCount')),
'is_live': is_live,
}
class LineLiveIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<channel_id>\d+)/broadcast/(?P<id>\d+)'
_TESTS = [{
'url': 'https://live.line.me/channels/4867368/broadcast/16331360',
'md5': 'bc931f26bf1d4f971e3b0982b3fab4a3',
'info_dict': {
'id': '16331360',
'title': '振りコピ講座😙😙😙',
'ext': 'mp4',
'timestamp': 1617095132,
'upload_date': '20210330',
'channel': '白川ゆめか',
'channel_id': '4867368',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}, {
# archiveStatus == 'DELETED'
'url': 'https://live.line.me/channels/4778159/broadcast/16378488',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id, broadcast_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
self._API_BASE_URL + '%s/broadcast/%s' % (channel_id, broadcast_id),
broadcast_id)
item = broadcast['item']
info = self._parse_broadcast_item(item)
protocol = 'm3u8' if info['is_live'] else 'm3u8_native'
formats = []
for k, v in (broadcast.get(('live' if info['is_live'] else 'archived') + 'HLSURLs') or {}).items():
if not v:
continue
if k == 'abr':
formats.extend(self._extract_m3u8_formats(
v, broadcast_id, 'mp4', protocol,
m3u8_id='hls', fatal=False))
continue
f = {
'ext': 'mp4',
'format_id': 'hls-' + k,
'protocol': protocol,
'url': v,
}
if not k.isdigit():
f['vcodec'] = 'none'
formats.append(f)
if not formats:
archive_status = item.get('archiveStatus')
if archive_status and archive_status != 'ARCHIVED':
raise ExtractorError('this video has been ' + archive_status.lower(), expected=True)
self._sort_formats(formats)
info['formats'] = formats
return info
class LineLiveChannelIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<id>\d+)(?!/broadcast/\d+)(?:[/?&#]|$)'
_TEST = {
'url': 'https://live.line.me/channels/5893542',
'info_dict': {
'id': '5893542',
'title': 'いくらちゃん',
'description': 'md5:c3a4af801f43b2fac0b02294976580be',
},
'playlist_mincount': 29
}
def _archived_broadcasts_entries(self, archived_broadcasts, channel_id):
while True:
for row in (archived_broadcasts.get('rows') or []):
share_url = str_or_none(row.get('shareURL'))
if not share_url:
continue
info = self._parse_broadcast_item(row)
info.update({
'_type': 'url',
'url': share_url,
'ie_key': LineLiveIE.ie_key(),
})
yield info
if not archived_broadcasts.get('hasNextPage'):
return
archived_broadcasts = self._download_json(
self._API_BASE_URL + channel_id + '/archived_broadcasts',
channel_id, query={
'lastId': info['id'],
})
def _real_extract(self, url):
channel_id = self._match_id(url)
channel = self._download_json(self._API_BASE_URL + channel_id, channel_id)
return self.playlist_result(
self._archived_broadcasts_entries(channel.get('archivedBroadcasts') or {}, channel_id),
channel_id, channel.get('title'), channel.get('information'))
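The `lastId` pagination used by `_archived_broadcasts_entries` can be sketched outside the extractor. This is a minimal illustration, not the extractor's code: `iter_archived_rows` and the injected `fetch_page` callable are hypothetical names, and the sketch assumes each row carries an `id` and that paging should stop when a page yields no rows.

```python
def iter_archived_rows(first_page, fetch_page):
    """Yield rows page by page, asking for the next page by the last seen id.

    fetch_page is a hypothetical callable standing in for the API request;
    it receives the last row id and returns the next page as a dict.
    """
    page = first_page
    while True:
        last_id = None
        for row in (page.get('rows') or []):
            last_id = row.get('id')
            yield row
        # Stop when the API reports no further page, or when there is
        # no id to continue from (assumption: avoids an endless loop).
        if not page.get('hasNextPage') or last_id is None:
            return
        page = fetch_page(last_id)
```

Tracking `last_id` inside the loop keeps the generator from referring to a variable that was never assigned when a page has no usable rows.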

View File

@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MaoriTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.maoritelevision.com/shows/korero-mai/S01E054/korero-mai-series-1-episode-54',
'md5': '5ade8ef53851b6a132c051b1cd858899',
'info_dict': {
'id': '4774724855001',
'ext': 'mp4',
'title': 'Kōrero Mai, Series 1 Episode 54',
'upload_date': '20160226',
'timestamp': 1456455018,
'description': 'md5:59bde32fd066d637a1a55794c56d8dcb',
'uploader_id': '1614493167001',
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1614493167001/HJlhIQhQf_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(
r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'BrightcoveNew', brightcove_id)

View File

@ -15,33 +15,39 @@ from ..utils import (
class MedalTVIE(InfoExtractor): class MedalTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://medal.tv/clips/34934644/3Is9zyGMoBMr', 'url': 'https://medal.tv/clips/2mA60jWAGQCBH',
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa', 'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
'info_dict': { 'info_dict': {
'id': '34934644', 'id': '2mA60jWAGQCBH',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Quad Cold', 'title': 'Quad Cold',
'description': 'Medal,https://medal.tv/desktop/', 'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'MowgliSB', 'uploader': 'MowgliSB',
'timestamp': 1603165266, 'timestamp': 1603165266,
'upload_date': '20201020', 'upload_date': '20201020',
'uploader_id': 10619174, 'uploader_id': '10619174',
} }
}, { }, {
'url': 'https://medal.tv/clips/36787208', 'url': 'https://medal.tv/clips/2um24TWdty0NA',
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148', 'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
'info_dict': { 'info_dict': {
'id': '36787208', 'id': '2um24TWdty0NA',
'ext': 'mp4', 'ext': 'mp4',
'title': 'u tk me i tk u bigger', 'title': 'u tk me i tk u bigger',
'description': 'Medal,https://medal.tv/desktop/', 'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'Mimicc', 'uploader': 'Mimicc',
'timestamp': 1605580939, 'timestamp': 1605580939,
'upload_date': '20201117', 'upload_date': '20201117',
'uploader_id': 5156321, 'uploader_id': '5156321',
} }
}, {
'url': 'https://medal.tv/clips/37rMeFpryCC-9',
'only_matching': True,
}, {
'url': 'https://medal.tv/clips/2WRj40tpY_EU9',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,15 +1,91 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .nhl import NHLBaseIE import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
)
class MLBIE(NHLBaseIE): class MLBBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_video_data(display_id)
video_id = video['id']
title = video['title']
feed = self._get_feed(video)
formats = []
for playback in (feed.get('playbacks') or []):
playback_url = playback.get('url')
if not playback_url:
continue
name = playback.get('name')
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4',
'm3u8_native', m3u8_id=name, fatal=False))
else:
f = {
'format_id': name,
'url': playback_url,
}
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
if mobj:
f.update({
'height': int(mobj.group(3)),
'tbr': int(mobj.group(1)),
'width': int(mobj.group(2)),
})
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', playback_url)
if mobj:
f.update({
'fps': int(mobj.group(3)),
'height': int(mobj.group(2)),
'tbr': int(mobj.group(4)),
'width': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
thumbnails = []
for cut in (try_get(feed, lambda x: x['image']['cuts'], list) or []):
src = cut.get('src')
if not src:
continue
thumbnails.append({
'height': int_or_none(cut.get('height')),
'url': src,
'width': int_or_none(cut.get('width')),
})
language = (video.get('language') or 'EN').lower()
return {
'id': video_id,
'title': title,
'formats': formats,
'description': video.get('description'),
'duration': parse_duration(feed.get('duration')),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(video.get(self._TIMESTAMP_KEY)),
'subtitles': self._extract_mlb_subtitles(feed, language),
}
class MLBIE(MLBBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/ (?:[\da-z_-]+\.)*mlb\.com/
(?: (?:
(?: (?:
(?:[^/]+/)*c-| (?:[^/]+/)*video/[^/]+/c-|
(?: (?:
shared/video/embed/(?:embed|m-internal-embed)\.html| shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp| (?:[^/]+/)+(?:play|index)\.jsp|
@ -18,7 +94,6 @@ class MLBIE(NHLBaseIE):
(?P<id>\d+) (?P<id>\d+)
) )
''' '''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [ _TESTS = [
{ {
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933', 'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
@ -76,18 +151,6 @@ class MLBIE(NHLBaseIE):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429139220,
'upload_date': '20150415',
}
},
{ {
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694', 'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
'only_matching': True, 'only_matching': True,
@ -113,8 +176,92 @@ class MLBIE(NHLBaseIE):
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb', 'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
'only_matching': True, 'only_matching': True,
}, },
{
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True,
}
] ]
_TIMESTAMP_KEY = 'date'
@staticmethod
def _get_feed(video):
return video
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for keyword in (feed.get('keywordsAll') or []):
keyword_type = keyword.get('type')
if keyword_type and keyword_type.startswith('closed_captions_location_'):
cc_location = keyword.get('value')
if cc_location:
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
return self._download_json(
'http://content.mlb.com/mlb/item/id/v1/%s/details/web-v1.json' % display_id,
display_id)
class MLBVideoIE(MLBBaseIE):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/(?:[^/]+/)*video/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.mlb.com/mariners/video/ackley-s-spectacular-catch-c34698933',
'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': {
'id': 'c04a8863-f569-42e6-9f87-992393657614',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405995000,
'upload_date': '20140722',
'thumbnail': r're:^https?://.+',
},
}
_TIMESTAMP_KEY = 'timestamp'
@classmethod
def suitable(cls, url):
return False if MLBIE.suitable(url) else super(MLBVideoIE, cls).suitable(url)
@staticmethod
def _get_feed(video):
return video['feeds'][0]
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for cc_location in (feed.get('closedCaptions') or []):
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
# https://www.mlb.com/data-service/en/videos/[SLUG]
return self._download_json(
'https://fastball-gateway.mlb.com/graphql',
display_id, query={
'query': '''{
mediaPlayback(ids: "%s") {
description
feeds(types: CMS) {
closedCaptions
duration
image {
cuts {
width
height
src
}
}
playbacks {
name
url
}
}
id
timestamp
title
}
}''' % display_id,
})['data']['mediaPlayback'][0]
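The playback-name parsing in `MLBBaseIE._real_extract` can be tried in isolation. A minimal sketch, assuming a hypothetical helper `parse_playback_name` and a made-up playback name in the test input; only the regular expression is taken from the diff.

```python
import re


def parse_playback_name(name):
    """Hypothetical helper: read bitrate and dimensions from a playback name."""
    # Same pattern as in the diff: _<tbr>K_<width>X<height>
    mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
    if not mobj:
        # Names without the pattern carry no size information.
        return {}
    return {
        'tbr': int(mobj.group(1)),
        'width': int(mobj.group(2)),
        'height': int(mobj.group(3)),
    }
```

Returning an empty dict for non-matching names mirrors the extractor, which simply leaves those fields unset on the format.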

View File

@ -255,7 +255,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
@staticmethod @staticmethod
def _extract_child_with_type(parent, t): def _extract_child_with_type(parent, t):
return next(c for c in parent['children'] if c.get('type') == t) for c in parent['children']:
if c.get('type') == t:
return c
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
try: try:
@ -286,7 +288,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
data = self._parse_json(self._search_regex( data = self._parse_json(self._search_regex(
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None) r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
main_container = self._extract_child_with_type(data, 'MainContainer') main_container = self._extract_child_with_type(data, 'MainContainer')
video_player = self._extract_child_with_type(main_container, 'VideoPlayer') ab_testing = self._extract_child_with_type(main_container, 'ABTesting')
video_player = self._extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
mgid = video_player['props']['media']['video']['config']['uri'] mgid = video_player['props']['media']['video']['config']['uri']
return mgid return mgid
@ -320,7 +323,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')
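The change to `_extract_child_with_type` matters because the loop form returns `None` when no child matches, whereas `next()` on an exhausted generator raises `StopIteration`. That is what makes the `ab_testing or main_container` fallback possible. A minimal sketch with standalone functions; `find_video_player` and the sample data shapes are assumptions for illustration only.

```python
def extract_child_with_type(parent, t):
    """Return the first child of the given type, or None if there is none."""
    for c in parent['children']:
        if c.get('type') == t:
            return c
    return None


def find_video_player(data):
    """Hypothetical wrapper: look for VideoPlayer under ABTesting if present."""
    main_container = extract_child_with_type(data, 'MainContainer')
    ab_testing = extract_child_with_type(main_container, 'ABTesting')
    # Fall back to the main container when there is no ABTesting wrapper.
    return extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
```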

View File

@ -182,7 +182,7 @@ class ORFRadioIE(InfoExtractor):
duration = end - start if end and start else None duration = end - start if end and start else None
entries.append({ entries.append({
'id': loop_stream_id.replace('.mp3', ''), 'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id), 'url': 'https://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'title': title, 'title': title,
'description': clean_html(data.get('subtitle')), 'description': clean_html(data.get('subtitle')),
'duration': duration, 'duration': duration,

View File

@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
)
class PalcoMP3BaseIE(InfoExtractor):
_GQL_QUERY_TMPL = '''{
artist(slug: "%s") {
%s
}
}'''
_ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
%s
}'''
_MUSIC_FIELDS = '''duration
hls
mp3File
musicID
plays
title'''
def _call_api(self, artist_slug, artist_fields):
return self._download_json(
'https://www.palcomp3.com.br/graphql/', artist_slug, query={
'query': self._GQL_QUERY_TMPL % (artist_slug, artist_fields),
})['data']
def _parse_music(self, music):
music_id = compat_str(music['musicID'])
title = music['title']
formats = []
hls_url = music.get('hls')
if hls_url:
formats.append({
'url': hls_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
mp3_file = music.get('mp3File')
if mp3_file:
formats.append({
'url': mp3_file,
})
return {
'id': music_id,
'title': title,
'formats': formats,
'duration': int_or_none(music.get('duration')),
'view_count': int_or_none(music.get('plays')),
}
def _real_initialize(self):
self._ARTIST_FIELDS_TMPL = self._ARTIST_FIELDS_TMPL % self._MUSIC_FIELDS
def _real_extract(self, url):
artist_slug, music_slug = re.match(self._VALID_URL, url).groups()
artist_fields = self._ARTIST_FIELDS_TMPL % music_slug
music = self._call_api(artist_slug, artist_fields)['artist']['music']
return self._parse_music(music)
class PalcoMP3IE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:song'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/nossas-composicoes-cuida-bem-dela/',
'md5': '99fd6405b2d8fd589670f6db1ba3b358',
'info_dict': {
'id': '3162927',
'ext': 'mp3',
'title': 'Nossas Composições - CUIDA BEM DELA',
'duration': 210,
'view_count': int,
}
}]
@classmethod
def suitable(cls, url):
return False if PalcoMP3VideoIE.suitable(url) else super(PalcoMP3IE, cls).suitable(url)
class PalcoMP3ArtistIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:artist'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com.br/condedoforro/',
'info_dict': {
'id': '358396',
'title': 'Conde do Forró',
},
'playlist_mincount': 188,
}]
_ARTIST_FIELDS_TMPL = '''artistID
musics {
nodes {
%s
}
}
name'''
@classmethod
def suitable(cls, url):
return False if re.match(PalcoMP3IE._VALID_URL, url) else super(PalcoMP3ArtistIE, cls).suitable(url)
def _real_extract(self, url):
artist_slug = self._match_id(url)
artist = self._call_api(artist_slug, self._ARTIST_FIELDS_TMPL)['artist']
def entries():
for music in (try_get(artist, lambda x: x['musics']['nodes'], list) or []):
yield self._parse_music(music)
return self.playlist_result(
entries(), str_or_none(artist.get('artistID')), artist.get('name'))
class PalcoMP3VideoIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:video'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)/?#clipe'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/maiara-e-maraisa-voce-faz-falta-aqui-ao-vivo-em-vicosa-mg/#clipe',
'add_ie': ['Youtube'],
'info_dict': {
'id': '_pD1nR2qqPg',
'ext': 'mp4',
'title': 'Maiara e Maraisa - Você Faz Falta Aqui - DVD Ao Vivo Em Campo Grande',
'description': 'md5:7043342c09a224598e93546e98e49282',
'upload_date': '20161107',
'uploader_id': 'maiaramaraisaoficial',
'uploader': 'Maiara e Maraisa',
}
}]
_MUSIC_FIELDS = 'youtubeID'
def _parse_music(self, music):
youtube_id = music['youtubeID']
return self.url_result(youtube_id, 'Youtube', youtube_id)
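The PalcoMP3 extractors build their GraphQL query with two rounds of `%` formatting: the `%%s` in `_ARTIST_FIELDS_TMPL` survives the first round as `%s` and is filled with the music slug in the second. A minimal sketch of that idea, assuming a shortened field list and a hypothetical `build_query` function; the slugs in the usage are made up.

```python
GQL_QUERY_TMPL = '''{
  artist(slug: "%s") {
    %s
  }
}'''

# '%%s' is an escaped placeholder: it becomes '%s' after the first formatting.
ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
      %s
    }'''

MUSIC_FIELDS = 'title'  # shortened for illustration


def build_query(artist_slug, music_slug):
    """Hypothetical helper showing the two formatting stages."""
    # Stage 1: insert the field list; the slug placeholder is preserved.
    artist_fields_tmpl = ARTIST_FIELDS_TMPL % MUSIC_FIELDS
    # Stage 2: insert the music slug into the preserved placeholder.
    artist_fields = artist_fields_tmpl % music_slug
    return GQL_QUERY_TMPL % (artist_slug, artist_fields)
```

Splitting the stages this way lets a subclass swap the field list (as `PalcoMP3VideoIE` does with `youtubeID`) without touching the query skeleton.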

View File

@ -599,11 +599,13 @@ class PeerTubeIE(InfoExtractor):
else: else:
age_limit = None age_limit = None
webpage_url = 'https://%s/videos/watch/%s' % (host, video_id)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': urljoin(url, video.get('thumbnailPath')), 'thumbnail': urljoin(webpage_url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')), 'timestamp': unified_timestamp(video.get('publishedAt')),
'uploader': account_data('displayName', compat_str), 'uploader': account_data('displayName', compat_str),
'uploader_id': str_or_none(account_data('id', int)), 'uploader_id': str_or_none(account_data('id', int)),
@ -621,5 +623,6 @@ class PeerTubeIE(InfoExtractor):
'tags': try_get(video, lambda x: x['tags'], list), 'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories, 'categories': categories,
'formats': formats, 'formats': formats,
'subtitles': subtitles 'subtitles': subtitles,
'webpage_url': webpage_url,
} }

View File

@ -9,8 +9,9 @@ from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
merge_dicts, merge_dicts,
try_get,
unified_timestamp, unified_timestamp,
xpath_text, urljoin,
) )
@ -27,10 +28,11 @@ class PhoenixIE(ZDFBaseIE):
'title': 'Wohin führt der Protest in der Pandemie?', 'title': 'Wohin führt der Protest in der Pandemie?',
'description': 'md5:7d643fe7f565e53a24aac036b2122fbd', 'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
'duration': 1691, 'duration': 1691,
'timestamp': 1613906100, 'timestamp': 1613902500,
'upload_date': '20210221', 'upload_date': '20210221',
'uploader': 'Phoenix', 'uploader': 'Phoenix',
'channel': 'corona nachgehakt', 'series': 'corona nachgehakt',
'episode': 'Wohin führt der Protest in der Pandemie?',
}, },
}, { }, {
# Youtube embed # Youtube embed
@ -79,50 +81,53 @@ class PhoenixIE(ZDFBaseIE):
video_id = compat_str(video.get('basename') or video.get('content')) video_id = compat_str(video.get('basename') or video.get('content'))
details = self._download_xml( details = self._download_json(
'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php', 'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php',
video_id, 'Downloading details XML', query={ video_id, 'Downloading details JSON', query={
'ak': 'web', 'ak': 'web',
'ptmd': 'true', 'ptmd': 'true',
'id': video_id, 'id': video_id,
'profile': 'player2', 'profile': 'player2',
}) })
title = title or xpath_text( title = title or details['title']
details, './/information/title', 'title', fatal=True) content_id = details['tracking']['nielsen']['content']['assetid']
content_id = xpath_text(
details, './/video/details/basename', 'content id', fatal=True)
info = self._extract_ptmd( info = self._extract_ptmd(
'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id, 'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id,
content_id, None, url) content_id, None, url)
timestamp = unified_timestamp(xpath_text(details, './/details/airtime')) duration = int_or_none(try_get(
details, lambda x: x['tracking']['nielsen']['content']['length']))
timestamp = unified_timestamp(details.get('editorialDate'))
series = try_get(
details, lambda x: x['tracking']['nielsen']['content']['program'],
compat_str)
episode = title if details.get('contentType') == 'episode' else None
thumbnails = [] thumbnails = []
for node in details.findall('.//teaserimages/teaserimage'): teaser_images = try_get(details, lambda x: x['teaserImageRef']['layouts'], dict) or {}
thumbnail_url = node.text for thumbnail_key, thumbnail_url in teaser_images.items():
thumbnail_url = urljoin(url, thumbnail_url)
if not thumbnail_url: if not thumbnail_url:
continue continue
thumbnail = { thumbnail = {
'url': thumbnail_url, 'url': thumbnail_url,
} }
thumbnail_key = node.get('key') m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if thumbnail_key: if m:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key) thumbnail['width'] = int(m.group(1))
if m: thumbnail['height'] = int(m.group(2))
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail) thumbnails.append(thumbnail)
return merge_dicts(info, { return merge_dicts(info, {
'id': content_id, 'id': content_id,
'title': title, 'title': title,
'description': xpath_text(details, './/information/detail'), 'description': details.get('leadParagraph'),
'duration': int_or_none(xpath_text(details, './/details/lengthSec')), 'duration': duration,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'timestamp': timestamp, 'timestamp': timestamp,
'uploader': xpath_text(details, './/details/channel'), 'uploader': details.get('tvService'),
'uploader_id': xpath_text(details, './/details/originChannelId'), 'series': series,
'channel': xpath_text(details, './/details/originChannelTitle'), 'episode': episode,
}) })

View File

@ -1,22 +1,15 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
try_get,
update_url_query,
urlencode_postdata,
) )
class PicartoIE(InfoExtractor): class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?' _VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
_TEST = { _TEST = {
'url': 'https://picarto.tv/Setz', 'url': 'https://picarto.tv/Setz',
'info_dict': { 'info_dict': {
@ -34,65 +27,46 @@ class PicartoIE(InfoExtractor):
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url) return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) channel_id = self._match_id(url)
channel_id = mobj.group('id')
metadata = self._download_json( data = self._download_json(
'https://api.picarto.tv/v1/channel/name/' + channel_id, 'https://ptvintern.picarto.tv/ptvapi', channel_id, query={
channel_id) 'query': '''{
channel(name: "%s") {
adult
id
online
stream_name
title
}
getLoadBalancerUrl(channel_name: "%s") {
url
}
}''' % (channel_id, channel_id),
})['data']
metadata = data['channel']
if metadata.get('online') is False: if metadata.get('online') == 0:
raise ExtractorError('Stream is offline', expected=True) raise ExtractorError('Stream is offline', expected=True)
title = metadata['title']
cdn_data = self._download_json( cdn_data = self._download_json(
'https://picarto.tv/process/channel', channel_id, data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
data=urlencode_postdata({'loadbalancinginfo': channel_id}), channel_id, 'Downloading load balancing info')
note='Downloading load balancing info')
token = mobj.group('token') or 'public'
params = {
'con': int(time.time() * 1000),
'token': token,
}
prefered_edge = cdn_data.get('preferedEdge')
formats = [] formats = []
for source in (cdn_data.get('source') or []):
for edge in cdn_data['edges']: source_url = source.get('url')
edge_ep = edge.get('ep') if not source_url:
if not edge_ep or not isinstance(edge_ep, compat_str):
continue continue
edge_id = edge.get('id') source_type = source.get('type')
for tech in cdn_data['techs']: if source_type == 'html5/application/vnd.apple.mpegurl':
tech_label = tech.get('label') formats.extend(self._extract_m3u8_formats(
tech_type = tech.get('type') source_url, channel_id, 'mp4', m3u8_id='hls', fatal=False))
preference = 0 elif source_type == 'html5/video/mp4':
if edge_id == prefered_edge: formats.append({
preference += 1 'url': source_url,
format_id = [] })
if edge_id:
format_id.append(edge_id)
if tech_type == 'application/x-mpegurl' or tech_label == 'HLS':
format_id.append('hls')
formats.extend(self._extract_m3u8_formats(
update_url_query(
'https://%s/hls/%s/index.m3u8'
% (edge_ep, channel_id), params),
channel_id, 'mp4', preference=preference,
m3u8_id='-'.join(format_id), fatal=False))
continue
elif tech_type == 'video/mp4' or tech_label == 'MP4':
format_id.append('mp4')
formats.append({
'url': update_url_query(
'https://%s/mp4/%s.mp4' % (edge_ep, channel_id),
params),
'format_id': '-'.join(format_id),
'preference': preference,
})
else:
# rtmp format does not seem to work
continue
self._sort_formats(formats) self._sort_formats(formats)
mature = metadata.get('adult') mature = metadata.get('adult')
@ -103,10 +77,10 @@ class PicartoIE(InfoExtractor):
return { return {
'id': channel_id, 'id': channel_id,
'title': self._live_title(metadata.get('title') or channel_id), 'title': self._live_title(title.strip()),
'is_live': True, 'is_live': True,
'thumbnail': try_get(metadata, lambda x: x['thumbnails']['web']),
'channel': channel_id, 'channel': channel_id,
'channel_id': metadata.get('id'),
'channel_url': 'https://picarto.tv/%s' % channel_id, 'channel_url': 'https://picarto.tv/%s' % channel_id,
'age_limit': age_limit, 'age_limit': age_limit,
'formats': formats, 'formats': formats,

View File

@ -31,6 +31,7 @@ class PinterestBaseIE(InfoExtractor):
title = (data.get('title') or data.get('grid_title') or video_id).strip() title = (data.get('title') or data.get('grid_title') or video_id).strip()
urls = []
formats = [] formats = []
duration = None duration = None
if extract_formats: if extract_formats:
@ -38,8 +39,9 @@ class PinterestBaseIE(InfoExtractor):
if not isinstance(format_dict, dict): if not isinstance(format_dict, dict):
continue continue
format_url = url_or_none(format_dict.get('url')) format_url = url_or_none(format_dict.get('url'))
if not format_url: if not format_url or format_url in urls:
continue continue
urls.append(format_url)
duration = float_or_none(format_dict.get('duration'), scale=1000) duration = float_or_none(format_dict.get('duration'), scale=1000)
ext = determine_ext(format_url) ext = determine_ext(format_url)
if 'hls' in format_id.lower() or ext == 'm3u8': if 'hls' in format_id.lower() or ext == 'm3u8':

View File

@ -0,0 +1,65 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
smuggle_url,
try_get,
)
class PlayStuffIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?play\.stuff\.co\.nz/details/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://play.stuff.co.nz/details/608778ac1de1c4001a3fa09a',
'md5': 'c82d3669e5247c64bc382577843e5bd0',
'info_dict': {
'id': '6250584958001',
'ext': 'mp4',
'title': 'Episode 1: Rotorua/Mt Maunganui/Tauranga',
'description': 'md5:c154bafb9f0dd02d01fd4100fb1c1913',
'uploader_id': '6005208634001',
'timestamp': 1619491027,
'upload_date': '20210427',
},
'add_ie': ['BrightcoveNew'],
}, {
# geo restricted, bypassable
'url': 'https://play.stuff.co.nz/details/_6155660351001',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
state = self._parse_json(
self._search_regex(
r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage, 'state'),
video_id)
account_id = try_get(
state, lambda x: x['configurations']['accountId'],
compat_str) or '6005208634001'
player_id = try_get(
state, lambda x: x['configurations']['playerId'],
compat_str) or 'default'
entries = []
for item_id, video in state['items'].items():
if not isinstance(video, dict):
continue
asset_id = try_get(
video, lambda x: x['content']['attributes']['assetId'],
compat_str)
if not asset_id:
continue
entries.append(self.url_result(
smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, asset_id),
{'geo_countries': ['NZ']}),
'BrightcoveNew', video_id))
return self.playlist_result(entries, video_id)

View File

@ -393,7 +393,7 @@ query viewClip {
# To somewhat reduce the probability of these consequences # To somewhat reduce the probability of these consequences
# we will sleep random amount of time before each call to ViewClip. # we will sleep random amount of time before each call to ViewClip.
self._sleep( self._sleep(
random.randint(2, 5), display_id, random.randint(5, 10), display_id,
'%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling') '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
if not viewclip: if not viewclip:

View File

@@ -398,6 +398,16 @@ class PornHubIE(PornHubBaseIE):
 formats = []
 def add_format(format_url, height=None):
+    ext = determine_ext(format_url)
+    if ext == 'mpd':
+        formats.extend(self._extract_mpd_formats(
+            format_url, video_id, mpd_id='dash', fatal=False))
+        return
+    if ext == 'm3u8':
+        formats.extend(self._extract_m3u8_formats(
+            format_url, video_id, 'mp4', entry_protocol='m3u8_native',
+            m3u8_id='hls', fatal=False))
+        return
     tbr = None
     mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url)
     if mobj:
@@ -417,16 +427,6 @@ class PornHubIE(PornHubBaseIE):
         r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
     if upload_date:
         upload_date = upload_date.replace('/', '')
-ext = determine_ext(video_url)
-if ext == 'mpd':
-    formats.extend(self._extract_mpd_formats(
-        video_url, video_id, mpd_id='dash', fatal=False))
-    continue
-elif ext == 'm3u8':
-    formats.extend(self._extract_m3u8_formats(
-        video_url, video_id, 'mp4', entry_protocol='m3u8_native',
-        m3u8_id='hls', fatal=False))
-    continue
 if '/video/get_media' in video_url:
     medias = self._download_json(video_url, video_id, fatal=False)
     if isinstance(medias, list):

@@ -133,8 +133,10 @@ class RedBullEmbedIE(RedBullTVIE):
 rrn_id = self._match_id(url)
 asset_id = self._download_json(
     'https://edge-graphql.crepo-production.redbullaws.com/v1/graphql',
-    rrn_id, headers={'API-KEY': 'e90a1ff11335423998b100c929ecc866'},
-    query={
+    rrn_id, headers={
+        'Accept': 'application/json',
+        'API-KEY': 'e90a1ff11335423998b100c929ecc866',
+    }, query={
         'query': '''{
   resource(id: "%s", enforceGeoBlocking: false) {
     %s

@ -2,8 +2,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64 import base64
import io
import re import re
import time import sys
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
@ -14,56 +15,13 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
qualities,
remove_end, remove_end,
remove_start, remove_start,
sanitized_Request,
std_headers, std_headers,
) )
_bytes_to_chr = (lambda x: x) if sys.version_info[0] == 2 else (lambda x: map(chr, x))
def _decrypt_url(png):
encrypted_data = compat_b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = compat_struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8 + length])
data = [chr(b) for b in data if b != 0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index + 1:]
if url_data[0] == 'H' and url_data[3] == '%':
# remove useless HQ%% at the start
url_data = url_data[4:]
alphabet = []
e = 0
d = 0
for l in alphabet_data:
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data:
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
class RTVEALaCartaIE(InfoExtractor): class RTVEALaCartaIE(InfoExtractor):
@ -79,28 +37,31 @@ class RTVEALaCartaIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia', 'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
'duration': 5024.566, 'duration': 5024.566,
'series': 'Balonmano',
}, },
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, { }, {
'note': 'Live stream', 'note': 'Live stream',
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/', 'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
'info_dict': { 'info_dict': {
'id': '1694255', 'id': '1694255',
'ext': 'flv', 'ext': 'mp4',
'title': 'TODO', 'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
'skip_download': 'live stream',
}, },
'skip': 'The f4m manifest can\'t be used yet',
}, { }, {
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/', 'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
'md5': 'e55e162379ad587e9640eda4f7353c0f', 'md5': 'd850f3c8731ea53952ebab489cf81cbf',
'info_dict': { 'info_dict': {
'id': '4236788', 'id': '4236788',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Servir y proteger - Capítulo 104 ', 'title': 'Servir y proteger - Capítulo 104',
'duration': 3222.0, 'duration': 3222.0,
}, },
'params': { 'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve', 'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
'only_matching': True, 'only_matching': True,
@ -111,58 +72,102 @@ class RTVEALaCartaIE(InfoExtractor):
def _real_initialize(self): def _real_initialize(self):
user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8') user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8')
manager_info = self._download_json( self._manager = self._download_json(
'http://www.rtve.es/odin/loki/' + user_agent_b64, 'http://www.rtve.es/odin/loki/' + user_agent_b64,
None, 'Fetching manager info') None, 'Fetching manager info')['manager']
self._manager = manager_info['manager']
@staticmethod
def _decrypt_url(png):
encrypted_data = io.BytesIO(compat_b64decode(png)[8:])
while True:
length = compat_struct_unpack('!I', encrypted_data.read(4))[0]
chunk_type = encrypted_data.read(4)
if chunk_type == b'IEND':
break
data = encrypted_data.read(length)
if chunk_type == b'tEXt':
alphabet_data, text = data.split(b'\0')
quality, url_data = text.split(b'%%')
alphabet = []
e = 0
d = 0
for l in _bytes_to_chr(alphabet_data):
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in _bytes_to_chr(url_data):
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
yield quality.decode(), url
encrypted_data.read(4) # CRC
def _extract_png_formats(self, video_id):
png = self._download_webpage(
'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id),
video_id, 'Downloading url information', query={'q': 'v2'})
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
formats = []
for quality, video_url in self._decrypt_url(png):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, 'dash', fatal=False))
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': video_url,
})
self._sort_formats(formats)
return formats
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
info = self._download_json( info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id, 'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0] video_id)['page']['items'][0]
if info['state'] == 'DESPU': if info['state'] == 'DESPU':
raise ExtractorError('The video is no longer available', expected=True) raise ExtractorError('The video is no longer available', expected=True)
title = info['title'] title = info['title'].strip()
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id) formats = self._extract_png_formats(video_id)
png_request = sanitized_Request(png_url)
png_request.add_header('Referer', url)
png = self._download_webpage(png_request, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
ext = determine_ext(video_url)
formats = []
if not video_url.endswith('.f4m') and ext != 'm3u8':
if '?' not in video_url:
video_url = video_url.replace('resources/', 'auth/resources/')
video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'url': video_url,
})
self._sort_formats(formats)
subtitles = None subtitles = None
if info.get('sbtFile') is not None: sbt_file = info.get('sbtFile')
subtitles = self.extract_subtitles(video_id, info['sbtFile']) if sbt_file:
subtitles = self.extract_subtitles(video_id, sbt_file)
is_live = info.get('live') is True
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': self._live_title(title) if is_live else title,
'formats': formats, 'formats': formats,
'thumbnail': info.get('image'), 'thumbnail': info.get('image'),
'page_url': url,
'subtitles': subtitles, 'subtitles': subtitles,
'duration': float_or_none(info.get('duration'), scale=1000), 'duration': float_or_none(info.get('duration'), 1000),
'is_live': is_live,
'series': info.get('programTitle'),
} }
def _get_subtitles(self, video_id, sub_file): def _get_subtitles(self, video_id, sub_file):
@ -174,48 +179,26 @@ class RTVEALaCartaIE(InfoExtractor):
for s in subs) for s in subs)
class RTVEInfantilIE(InfoExtractor): class RTVEInfantilIE(RTVEALaCartaIE):
IE_NAME = 'rtve.es:infantil' IE_NAME = 'rtve.es:infantil'
IE_DESC = 'RTVE infantil' IE_DESC = 'RTVE infantil'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/(?P<show>[^/]*)/video/(?P<short_title>[^/]*)/(?P<id>[0-9]+)/' _VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/]+/video/[^/]+/(?P<id>[0-9]+)/'
_TESTS = [{ _TESTS = [{
'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/', 'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
'md5': '915319587b33720b8e0357caaa6617e6', 'md5': '5747454717aedf9f9fdf212d1bcfc48d',
'info_dict': { 'info_dict': {
'id': '3040283', 'id': '3040283',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Maneras de vivir', 'title': 'Maneras de vivir',
'thumbnail': 'http://www.rtve.es/resources/jpg/6/5/1426182947956.JPG', 'thumbnail': r're:https?://.+/1426182947956\.JPG',
'duration': 357.958, 'duration': 357.958,
}, },
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}] }]
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0]
webpage = self._download_webpage(url, video_id) class RTVELiveIE(RTVEALaCartaIE):
vidplayer_id = self._search_regex(
r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
png = self._download_webpage(png_url, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
return {
'id': video_id,
'ext': 'mp4',
'title': info['title'],
'url': video_url,
'thumbnail': info.get('image'),
'duration': float_or_none(info.get('duration'), scale=1000),
}
class RTVELiveIE(InfoExtractor):
IE_NAME = 'rtve.es:live' IE_NAME = 'rtve.es:live'
IE_DESC = 'RTVE.es live streams' IE_DESC = 'RTVE.es live streams'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)' _VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
@ -225,7 +208,7 @@ class RTVELiveIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'la-1', 'id': 'la-1',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$', 'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
}, },
'params': { 'params': {
'skip_download': 'live stream', 'skip_download': 'live stream',
@ -234,29 +217,22 @@ class RTVELiveIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
start_time = time.gmtime()
video_id = mobj.group('id') video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es') title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
title = remove_start(title, 'Estoy viendo ') title = remove_start(title, 'Estoy viendo ')
title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
vidplayer_id = self._search_regex( vidplayer_id = self._search_regex(
(r'playerId=player([0-9]+)', (r'playerId=player([0-9]+)',
r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)', r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
r'data-id=["\'](\d+)'), r'data-id=["\'](\d+)'),
webpage, 'internal video ID') webpage, 'internal video ID')
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
png = self._download_webpage(png_url, video_id, 'Downloading url information')
m3u8_url = _decrypt_url(png)
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': self._live_title(title),
'formats': formats, 'formats': self._extract_png_formats(vidplayer_id),
'is_live': True, 'is_live': True,
} }
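
The rewritten `_decrypt_url` walks the PNG chunk by chunk and decodes every `tEXt` chunk, instead of searching for the first `tEXt` marker. The following is a rough Python 3 sketch of the same idea on synthetic data. The helper names (`make_chunk`, `pick_alphabet`, `decode_url`, `iter_urls`) and the sample payload are made up for illustration; only the chunk layout and the two decoding loops are taken from the diff.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xffffffff
    return struct.pack('!I', len(data)) + chunk_type + data + struct.pack('!I', crc)


def pick_alphabet(alphabet_data):
    """Keep one character, then skip 1, 2, 3, 0, 1, ... characters in turn."""
    alphabet = []
    skip = step = 0
    for ch in alphabet_data:
        if skip == 0:
            alphabet.append(ch)
            skip = step = (step + 1) % 4
        else:
            skip -= 1
    return alphabet


def decode_url(alphabet, url_data):
    """Read two-digit alphabet indices whose digits are separated by filler."""
    url = ''
    have_tens = False
    skip = 3   # filler characters before the first ones digit
    count = 1
    index = 0
    for ch in url_data:
        if not have_tens:
            index = int(ch) * 10
            have_tens = True
        elif skip == 0:
            index += int(ch)
            url += alphabet[index]
            skip = (count + 3) % 4
            have_tens = False
            count += 1
        else:
            skip -= 1
    return url


def iter_urls(png_bytes):
    """Yield (quality, url) for every tEXt chunk until IEND is reached."""
    stream = io.BytesIO(png_bytes[len(PNG_SIGNATURE):])
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # truncated input
        length = struct.unpack('!I', header)[0]
        chunk_type = stream.read(4)
        if chunk_type == b'IEND':
            return
        data = stream.read(length)
        stream.read(4)  # CRC, not verified in this sketch
        if chunk_type == b'tEXt':
            alphabet_data, text = data.split(b'\0')
            quality, url_data = text.split(b'%%')
            yield quality.decode(), decode_url(
                pick_alphabet(alphabet_data.decode()), url_data.decode())
```

Yielding one pair per chunk is what lets `_extract_png_formats` rank several qualities, where the old helper returned a single URL.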

@@ -10,7 +10,7 @@ from ..utils import (
 class SBSIE(InfoExtractor):
     IE_DESC = 'sbs.com.au'
-    _VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=|/watch/)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
     _TESTS = [{
         # Original URL is handled by the generic IE which finds the iframe:
@@ -43,6 +43,9 @@ class SBSIE(InfoExtractor):
     }, {
         'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866',
         'only_matching': True,
+    }, {
+        'url': 'https://www.sbs.com.au/ondemand/watch/1698704451971',
+        'only_matching': True,
     }]
     def _real_extract(self, url):

@@ -2,12 +2,18 @@
 from __future__ import unicode_literals
 from .common import InfoExtractor
-from ..utils import js_to_json
+from ..utils import (
+    get_element_by_class,
+    int_or_none,
+    remove_start,
+    strip_or_none,
+    unified_strdate,
+)
 class ScreencastOMaticIE(InfoExtractor):
-    _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
-    _TEST = {
+    _VALID_URL = r'https?://screencast-o-matic\.com/(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
+    _TESTS = [{
         'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
         'md5': '483583cb80d92588f15ccbedd90f0c18',
         'info_dict': {
@@ -16,22 +22,30 @@ class ScreencastOMaticIE(InfoExtractor):
             'title': 'Welcome to 3-4 Philosophy @ DECV!',
             'thumbnail': r're:^https?://.*\.jpg$',
             'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
-            'duration': 369.163,
+            'duration': 369,
+            'upload_date': '20141216',
         }
-    }
+    }, {
+        'url': 'http://screencast-o-matic.com/player/c2lD3BeOPl',
+        'only_matching': True,
+    }, {
+        'url': 'http://screencast-o-matic.com/embed?ff=true&sc=cbV2r4Q5TL&fromPH=true&a=1',
+        'only_matching': True,
+    }]
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        jwplayer_data = self._parse_json(
-            self._search_regex(
-                r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
-            video_id, transform_source=js_to_json)
-        info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
-        info_dict.update({
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
+        webpage = self._download_webpage(
+            'https://screencast-o-matic.com/player/' + video_id, video_id)
+        info = self._parse_html5_media_entries(url, webpage, video_id)[0]
+        info.update({
+            'id': video_id,
+            'title': get_element_by_class('overlayTitle', webpage),
+            'description': strip_or_none(get_element_by_class('overlayDescription', webpage)) or None,
+            'duration': int_or_none(self._search_regex(
+                r'player\.duration\s*=\s*function\(\)\s*{\s*return\s+(\d+);\s*};',
+                webpage, 'duration', default=None)),
+            'upload_date': unified_strdate(remove_start(
+                get_element_by_class('overlayPublished', webpage), 'Published: ')),
         })
-        return info_dict
+        return info
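
The new ScreencastOMatic code accepts `watch`, `player` and `embed` URLs but always fetches the `/player/` page, then reads the duration from an inline script. A small sketch of both steps with plain `re` follows; the function names and the sample script text used to exercise them are assumptions for illustration, while the two patterns are copied from the hunk above.

```python
import re

VALID_URL = r'https?://screencast-o-matic\.com/(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
DURATION_RE = r'player\.duration\s*=\s*function\(\)\s*{\s*return\s+(\d+);\s*};'


def player_url(url):
    """Map any supported URL form onto the /player/ page the extractor fetches."""
    mobj = re.match(VALID_URL, url)
    if not mobj:
        return None
    return 'https://screencast-o-matic.com/player/' + mobj.group('id')


def parse_duration(webpage):
    """Return the duration in whole seconds, or None when the script is absent."""
    mobj = re.search(DURATION_RE, webpage)
    return int(mobj.group(1)) if mobj else None
```

Because the script exposes an integer, the expected test duration changes from `369.163` to `369`.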

@@ -21,6 +21,7 @@ from ..utils import (
 class ShahidBaseIE(AWSIE):
     _AWS_PROXY_HOST = 'api2.shahid.net'
     _AWS_API_KEY = '2RRtuMHx95aNI1Kvtn2rChEuwsCogUd4samGPjLh'
+    _VALID_URL_BASE = r'https?://shahid\.mbc\.net/[a-z]{2}/'
     def _handle_error(self, e):
         fail_data = self._parse_json(
@@ -49,7 +50,7 @@ class ShahidBaseIE(AWSIE):
 class ShahidIE(ShahidBaseIE):
     _NETRC_MACHINE = 'shahid'
-    _VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
+    _VALID_URL = ShahidBaseIE._VALID_URL_BASE + r'(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
     _TESTS = [{
         'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924',
         'info_dict': {
@@ -73,6 +74,9 @@ class ShahidIE(ShahidBaseIE):
         # shahid plus subscriber only
         'url': 'https://shahid.mbc.net/ar/series/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/episode-90511',
         'only_matching': True
+    }, {
+        'url': 'https://shahid.mbc.net/en/shows/Ramez-Fi-Al-Shallal-season-1-episode-1/episode-359319',
+        'only_matching': True
     }]
     def _real_initialize(self):
@@ -168,7 +172,7 @@ class ShahidIE(ShahidBaseIE):
 class ShahidShowIE(ShahidBaseIE):
-    _VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:show|serie)s/[^/]+/(?:show|series)-(?P<id>\d+)'
+    _VALID_URL = ShahidBaseIE._VALID_URL_BASE + r'(?:show|serie)s/[^/]+/(?:show|series)-(?P<id>\d+)'
     _TESTS = [{
         'url': 'https://shahid.mbc.net/ar/shows/%D8%B1%D8%A7%D9%85%D8%B2-%D9%82%D8%B1%D8%B4-%D8%A7%D9%84%D8%A8%D8%AD%D8%B1/show-79187',
         'info_dict': {
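
The Shahid change replaces the hard-coded `/ar/` path segment with a shared two-letter locale prefix. A quick way to see the effect is to compose the same patterns outside the extractor classes; the module-level names below are illustrative only.

```python
import re

VALID_URL_BASE = r'https?://shahid\.mbc\.net/[a-z]{2}/'
VIDEO_URL = VALID_URL_BASE + r'(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
SHOW_URL = VALID_URL_BASE + r'(?:show|serie)s/[^/]+/(?:show|series)-(?P<id>\d+)'


def match_video(url):
    """Return (type, id) for an episode, clip or movie URL, else None."""
    mobj = re.match(VIDEO_URL, url)
    return mobj.group('type', 'id') if mobj else None


def match_show(url):
    """Return the show ID for a show or series URL, else None."""
    mobj = re.match(SHOW_URL, url)
    return mobj.group('id') if mobj else None
```

With the shared prefix, the English URL added to `_TESTS` matches the same pattern as the Arabic ones.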

@@ -86,10 +86,10 @@ class SharedIE(SharedBaseIE):
 class VivoIE(SharedBaseIE):
     IE_DESC = 'vivo.sx'
-    _VALID_URL = r'https?://vivo\.sx/(?P<id>[\da-z]{10})'
+    _VALID_URL = r'https?://vivo\.s[xt]/(?P<id>[\da-z]{10})'
     _FILE_NOT_FOUND = '>The file you have requested does not exists or has been removed'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://vivo.sx/d7ddda0e78',
         'md5': '15b3af41be0b4fe01f4df075c2678b2c',
         'info_dict': {
@@ -98,7 +98,10 @@ class VivoIE(SharedBaseIE):
             'title': 'Chicken',
             'filesize': 515659,
         },
-    }
+    }, {
+        'url': 'http://vivo.st/d7ddda0e78',
+        'only_matching': True,
+    }]
     def _extract_title(self, webpage):
         title = self._html_search_regex(

@@ -6,9 +6,9 @@ from .mtv import MTVServicesInfoExtractor
 class SouthParkIE(MTVServicesInfoExtractor):
     IE_NAME = 'southpark.cc.com'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark(?:\.cc|studios)\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
-    _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
+    _FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
     _TESTS = [{
         'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured',
@@ -23,8 +23,20 @@ class SouthParkIE(MTVServicesInfoExtractor):
     }, {
         'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
         'only_matching': True,
+    }, {
+        'url': 'https://www.southparkstudios.com/episodes/h4o269/south-park-stunning-and-brave-season-19-ep-1',
+        'only_matching': True,
     }]
+    def _get_feed_query(self, uri):
+        return {
+            'accountOverride': 'intl.mtvi.com',
+            'arcEp': 'shared.southpark.global',
+            'ep': '90877963',
+            'imageEp': 'shared.southpark.global',
+            'mgid': uri,
+        }
 class SouthParkEsIE(SouthParkIE):
     IE_NAME = 'southpark.cc.com:español'

@ -1,82 +1,105 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import ( from ..utils import (
clean_html,
float_or_none,
int_or_none,
parse_iso8601, parse_iso8601,
sanitized_Request, strip_or_none,
try_get,
) )
class SportDeutschlandIE(InfoExtractor): class SportDeutschlandIE(InfoExtractor):
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])' _VALID_URL = r'https?://sportdeutschland\.tv/(?P<id>(?:[^/]+/)?[^?#/&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0', 'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': { 'info_dict': {
'id': 're-live-deutsche-meisterschaften-2020-halbfinals', 'id': '5318cac0275701382770543d7edaf0a0',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals', 'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals - Teil 1',
'categories': ['Badminton-Deutschland'], 'duration': 16106.36,
'view_count': int,
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': int,
'upload_date': '20200201',
'description': 're:.*', # meaningless description for THIS video
}, },
'params': {
'noplaylist': True,
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
'id': 'c6e2fdd01f63013854c47054d2ab776f',
'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals',
'description': 'md5:5263ff4c31c04bb780c9f91130b48530',
'duration': 31397,
},
'playlist_count': 2,
}, {
'url': 'https://sportdeutschland.tv/freeride-world-tour-2021-fieberbrunn-oesterreich',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
video_id = mobj.group('id') data = self._download_json(
sport_id = mobj.group('sport') 'https://backend.sportdeutschland.tv/api/permalinks/' + display_id,
display_id, query={'access_token': 'true'})
api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',
'Referer': url,
})
data = self._download_json(req, video_id)
asset = data['asset'] asset = data['asset']
categories = [data['section']['title']] title = (asset.get('title') or asset['label']).strip()
asset_id = asset.get('id') or asset.get('uuid')
formats = [] info = {
smil_url = asset['video'] 'id': asset_id,
if '.smil' in smil_url: 'title': title,
m3u8_url = smil_url.replace('.smil', '.m3u8') 'description': clean_html(asset.get('body') or asset.get('description')) or asset.get('teaser'),
formats.extend( 'duration': int_or_none(asset.get('seconds')),
self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4'))
smil_doc = self._download_xml(
smil_url, video_id, note='Downloading SMIL metadata')
base_url_el = smil_doc.find('./head/meta')
if base_url_el:
base_url = base_url_el.attrib['base']
formats.extend([{
'format_id': 'rmtp',
'url': base_url if base_url_el else n.attrib['src'],
'play_path': n.attrib['src'],
'ext': 'flv',
'preference': -100,
'format_note': 'Seems to fail at example stream',
} for n in smil_doc.findall('./body/video')])
else:
formats.append({'url': smil_url})
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': asset['title'],
'thumbnail': asset.get('image'),
'description': asset.get('teaser'),
'duration': asset.get('duration'),
'categories': categories,
'view_count': asset.get('views'),
'rtmp_live': asset.get('live'),
'timestamp': parse_iso8601(asset.get('date')),
} }
videos = asset.get('videos') or []
if len(videos) > 1:
playlist_id = compat_parse_qs(compat_urllib_parse_urlparse(url).query).get('playlistId', [None])[0]
if playlist_id:
if self._downloader.params.get('noplaylist'):
videos = [videos[int(playlist_id)]]
self.to_screen('Downloading just a single video because of --no-playlist')
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % asset_id)
def entries():
for i, video in enumerate(videos, 1):
video_id = video.get('uuid')
video_url = video.get('url')
if not (video_id and video_url):
continue
formats = self._extract_m3u8_formats(
video_url.replace('.smil', '.m3u8'), video_id, 'mp4', fatal=False)
if not formats:
continue
yield {
'id': video_id,
'formats': formats,
'title': title + ' - ' + (video.get('label') or 'Teil %d' % i),
'duration': float_or_none(video.get('duration')),
}
info.update({
'_type': 'multi_video',
'entries': entries(),
})
else:
formats = self._extract_m3u8_formats(
videos[0]['url'].replace('.smil', '.m3u8'), asset_id, 'mp4')
section_title = strip_or_none(try_get(data, lambda x: x['section']['title']))
info.update({
'formats': formats,
'display_id': asset.get('permalink'),
'thumbnail': try_get(asset, lambda x: x['images'][0]),
'categories': [section_title] if section_title else None,
'view_count': int_or_none(asset.get('views')),
'is_live': asset.get('is_live') is True,
'timestamp': parse_iso8601(asset.get('date') or asset.get('published_at')),
})
return info
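
For multi-part assets the new SportDeutschland code reads `playlistId` from the page URL and, when `--no-playlist` is set, keeps only the part at that index. The selection step can be sketched in isolation as below. `select_videos` and its `noplaylist` parameter are hypothetical names; in the extractor the flag comes from `self._downloader.params`, and the query is parsed with the `compat_*` helpers rather than `urllib.parse`.

```python
from urllib.parse import parse_qs, urlparse


def select_videos(url, videos, noplaylist=False):
    """Return the parts to download for an asset with one or more videos."""
    if len(videos) > 1:
        # parse_qs maps each key to a list of values; take the first one.
        playlist_id = parse_qs(urlparse(url).query).get('playlistId', [None])[0]
        if playlist_id and noplaylist:
            return [videos[int(playlist_id)]]
    return videos
```

As in the diff, a non-numeric or out-of-range `playlistId` would raise here; whether that deserves a guard is left open.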

@@ -146,18 +146,19 @@ class SVTPlayIE(SVTPlayBaseIE):
             )
             (?P<svt_id>[^/?#&]+)|
             https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)
+            (?:.*?(?:modalId|id)=(?P<modal_id>[\da-zA-Z-]+))?
         )
         '''
     _TESTS = [{
-        'url': 'https://www.svtplay.se/video/26194546/det-har-ar-himlen',
+        'url': 'https://www.svtplay.se/video/30479064',
         'md5': '2382036fd6f8c994856c323fe51c426e',
         'info_dict': {
-            'id': 'jNwpV9P',
+            'id': '8zVbDPA',
             'ext': 'mp4',
-            'title': 'Det här är himlen',
-            'timestamp': 1586044800,
-            'upload_date': '20200405',
-            'duration': 3515,
+            'title': 'Designdrömmar i Stenungsund',
+            'timestamp': 1615770000,
+            'upload_date': '20210315',
+            'duration': 3519,
             'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
             'age_limit': 0,
             'subtitles': {
@@ -173,6 +174,12 @@ class SVTPlayIE(SVTPlayBaseIE):
             # AssertionError: Expected test_SVTPlay_jNwpV9P.mp4 to be at least 9.77KiB, but it's only 864.00B
             'skip_download': True,
         },
+    }, {
+        'url': 'https://www.svtplay.se/video/30479064/husdrommar/husdrommar-sasong-8-designdrommar-i-stenungsund?modalId=8zVbDPA',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.svtplay.se/video/30684086/rapport/rapport-24-apr-18-00-7?id=e72gVpa',
+        'only_matching': True,
     }, {
         # geo restricted to Sweden
         'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
@@ -219,7 +226,8 @@ class SVTPlayIE(SVTPlayBaseIE):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id, svt_id = mobj.group('id', 'svt_id')
+        video_id = mobj.group('id')
+        svt_id = mobj.group('svt_id') or mobj.group('modal_id')
         if svt_id:
             return self._extract_by_video_id(svt_id)
@@ -254,6 +262,7 @@ class SVTPlayIE(SVTPlayBaseIE):
         if not svt_id:
             svt_id = self._search_regex(
                 (r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+                 r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/%s/[^"\']*\b(?:modalId|id)=([\da-zA-Z-]+)' % re.escape(video_id),
                  r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
                  r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)',
                  r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',

@@ -107,9 +107,12 @@ class TikTokIE(TikTokBaseIE):
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        data = self._parse_json(self._search_regex(
+        page_props = self._parse_json(self._search_regex(
             r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
-            webpage, 'data'), video_id)['props']['pageProps']['itemInfo']['itemStruct']
+            webpage, 'data'), video_id)['props']['pageProps']
+        data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
+        if not data and page_props.get('statusCode') == 10216:
+            raise ExtractorError('This video is private', expected=True)
         return self._extract_video(data, video_id)
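
The TikTok hunk stops indexing straight into `itemInfo.itemStruct` and reports private videos when the page carries status code 10216. The control flow can be tried out without youtube-dl roughly as follows; `PrivateVideoError` and `pick_item_struct` are hypothetical stand-ins for `ExtractorError(..., expected=True)` and the inline code.

```python
class PrivateVideoError(Exception):
    """Hypothetical stand-in for ExtractorError(..., expected=True)."""


def pick_item_struct(page_props):
    """Return the itemStruct dict, or raise when the page marks the video private."""
    try:
        data = page_props['itemInfo']['itemStruct']
    except (KeyError, TypeError):
        data = None
    if not isinstance(data, dict):
        data = None
    # 10216 is the status code the hunk above treats as "private video".
    if not data and page_props.get('statusCode') == 10216:
        raise PrivateVideoError('This video is private')
    return data
```

Note that a missing struct with any other status code still yields `None`, so callers need to tolerate that value.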

View File

@ -74,6 +74,12 @@ class TV2DKIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
entries = [] entries = []
def add_entry(partner_id, kaltura_id):
entries.append(self.url_result(
'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
video_id=kaltura_id))
for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage): for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage):
video = extract_attributes(video_el) video = extract_attributes(video_el)
kaltura_id = video.get('data-entryid') kaltura_id = video.get('data-entryid')
@ -82,9 +88,14 @@ class TV2DKIE(InfoExtractor):
partner_id = video.get('data-partnerid') partner_id = video.get('data-partnerid')
if not partner_id: if not partner_id:
continue continue
entries.append(self.url_result( add_entry(partner_id, kaltura_id)
'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura', if not entries:
video_id=kaltura_id)) kaltura_id = self._search_regex(
r'entry_id\s*:\s*["\']([0-9a-z_]+)', webpage, 'kaltura id')
partner_id = self._search_regex(
(r'\\u002Fp\\u002F(\d+)\\u002F', r'/p/(\d+)/'), webpage,
'partner id')
add_entry(partner_id, kaltura_id)
return self.playlist_result(entries) return self.playlist_result(entries)

View File

@ -25,6 +25,10 @@ class TVerIE(InfoExtractor):
}, { }, {
'url': 'https://tver.jp/episode/79622438', 'url': 'https://tver.jp/episode/79622438',
'only_matching': True, 'only_matching': True,
}, {
# subtitle = ' '
'url': 'https://tver.jp/corner/f0068870',
'only_matching': True,
}] }]
_TOKEN = None _TOKEN = None
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
@ -40,28 +44,18 @@ class TVerIE(InfoExtractor):
query={'token': self._TOKEN})['main'] query={'token': self._TOKEN})['main']
p_id = main['publisher_id'] p_id = main['publisher_id']
service = remove_start(main['service'], 'ts_') service = remove_start(main['service'], 'ts_')
info = {
r_id = main['reference_id']
if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
r_id = 'ref:' + r_id
bc_url = smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
{'geo_countries': ['JP']})
return {
'_type': 'url_transparent', '_type': 'url_transparent',
'description': try_get(main, lambda x: x['note'][0]['text'], compat_str), 'description': try_get(main, lambda x: x['note'][0]['text'], compat_str),
'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])), 'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])),
'url': bc_url,
'ie_key': 'BrightcoveNew',
} }
if service == 'cx':
info.update({
'title': main.get('subtitle') or main['title'],
'url': 'https://i.fod.fujitv.co.jp/plus7/web/%s/%s.html' % (p_id[:4], p_id),
'ie_key': 'FujiTVFODPlus7',
})
else:
r_id = main['reference_id']
if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
r_id = 'ref:' + r_id
bc_url = smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
{'geo_countries': ['JP']})
info.update({
'url': bc_url,
'ie_key': 'BrightcoveNew',
})
return info

View File

@ -19,6 +19,7 @@ from ..utils import (
strip_or_none, strip_or_none,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_or_none,
xpath_text, xpath_text,
) )
@ -52,6 +53,9 @@ class TwitterBaseIE(InfoExtractor):
return [f] return [f]
def _extract_formats_from_vmap_url(self, vmap_url, video_id): def _extract_formats_from_vmap_url(self, vmap_url, video_id):
vmap_url = url_or_none(vmap_url)
if not vmap_url:
return []
vmap_data = self._download_xml(vmap_url, video_id) vmap_data = self._download_xml(vmap_url, video_id)
formats = [] formats = []
urls = [] urls = []

View File

@ -23,6 +23,8 @@ class VGTVIE(XstreamIE):
'fvn.no/fvntv': 'fvntv', 'fvn.no/fvntv': 'fvntv',
'aftenposten.no/webtv': 'aptv', 'aftenposten.no/webtv': 'aptv',
'ap.vgtv.no/webtv': 'aptv', 'ap.vgtv.no/webtv': 'aptv',
'tv.aftonbladet.se': 'abtv',
# obsolete URL schemas, kept in order to save one HTTP redirect
'tv.aftonbladet.se/abtv': 'abtv', 'tv.aftonbladet.se/abtv': 'abtv',
'www.aftonbladet.se/tv': 'abtv', 'www.aftonbladet.se/tv': 'abtv',
} }
@ -140,6 +142,10 @@ class VGTVIE(XstreamIE):
'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk', 'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
'only_matching': True, 'only_matching': True,
}, },
{
'url': 'https://tv.aftonbladet.se/video/36015/vulkanutbrott-i-rymden-nu-slapper-nasa-bilderna',
'only_matching': True,
},
{ {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015', 'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'only_matching': True, 'only_matching': True,

View File

@ -3,7 +3,6 @@ from __future__ import unicode_literals
import base64 import base64
import functools import functools
import json
import re import re
import itertools import itertools
@ -17,14 +16,14 @@ from ..compat import (
from ..utils import ( from ..utils import (
clean_html, clean_html,
determine_ext, determine_ext,
dict_get,
ExtractorError, ExtractorError,
get_element_by_class,
js_to_json, js_to_json,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
OnDemandPagedList, OnDemandPagedList,
parse_filesize, parse_filesize,
RegexNotFoundError, parse_iso8601,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
std_headers, std_headers,
@ -74,25 +73,28 @@ class VimeoBaseInfoExtractor(InfoExtractor):
expected=True) expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _verify_video_password(self, url, video_id, webpage): def _get_video_password(self):
password = self._downloader.params.get('videopassword') password = self._downloader.params.get('videopassword')
if password is None: if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True) raise ExtractorError(
token, vuid = self._extract_xsrft_and_vuid(webpage) 'This video is protected by a password, use the --video-password option',
data = urlencode_postdata({ expected=True)
'password': password, return password
'token': token,
}) def _verify_video_password(self, url, video_id, password, token, vuid):
if url.startswith('http://'): if url.startswith('http://'):
# vimeo only supports https now, but the user can give an http url # vimeo only supports https now, but the user can give an http url
url = url.replace('http://', 'https://') url = url.replace('http://', 'https://')
password_request = sanitized_Request(url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
password_request.add_header('Referer', url)
self._set_vimeo_cookie('vuid', vuid) self._set_vimeo_cookie('vuid', vuid)
return self._download_webpage( return self._download_webpage(
password_request, video_id, url + '/password', video_id, 'Verifying the password',
'Verifying the password', 'Wrong password') 'Wrong password', data=urlencode_postdata({
'password': password,
'token': token,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': url,
})
def _extract_xsrft_and_vuid(self, webpage): def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex( xsrft = self._search_regex(
@ -123,10 +125,11 @@ class VimeoBaseInfoExtractor(InfoExtractor):
video_title = video_data['title'] video_title = video_data['title']
live_event = video_data.get('live_event') or {} live_event = video_data.get('live_event') or {}
is_live = live_event.get('status') == 'started' is_live = live_event.get('status') == 'started'
request = config.get('request') or {}
formats = [] formats = []
config_files = video_data.get('files') or config['request'].get('files', {}) config_files = video_data.get('files') or request.get('files') or {}
for f in config_files.get('progressive', []): for f in (config_files.get('progressive') or []):
video_url = f.get('url') video_url = f.get('url')
if not video_url: if not video_url:
continue continue
@ -142,7 +145,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
# TODO: fix handling of 308 status code returned for live archive manifest requests # TODO: fix handling of 308 status code returned for live archive manifest requests
sep_pattern = r'/sep/video/' sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'): for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items(): for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items():
manifest_url = cdn_data.get('url') manifest_url = cdn_data.get('url')
if not manifest_url: if not manifest_url:
continue continue
@ -188,17 +191,15 @@ class VimeoBaseInfoExtractor(InfoExtractor):
f['preference'] = -40 f['preference'] = -40
subtitles = {} subtitles = {}
text_tracks = config['request'].get('text_tracks') for tt in (request.get('text_tracks') or []):
if text_tracks: subtitles[tt['lang']] = [{
for tt in text_tracks: 'ext': 'vtt',
subtitles[tt['lang']] = [{ 'url': urljoin('https://vimeo.com', tt['url']),
'ext': 'vtt', }]
'url': urljoin('https://vimeo.com', tt['url']),
}]
thumbnails = [] thumbnails = []
if not is_live: if not is_live:
for key, thumb in video_data.get('thumbs', {}).items(): for key, thumb in (video_data.get('thumbs') or {}).items():
thumbnails.append({ thumbnails.append({
'id': key, 'id': key,
'width': int_or_none(key), 'width': int_or_none(key),
@ -278,7 +279,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
)? )?
(?:videos?/)? (?:videos?/)?
(?P<id>[0-9]+) (?P<id>[0-9]+)
(?:/[\da-f]+)? (?:/(?P<unlisted_hash>[\da-f]{10}))?
/?(?:[?&].*)?(?:[#].*)?$ /?(?:[?&].*)?(?:[#].*)?$
''' '''
IE_NAME = 'vimeo' IE_NAME = 'vimeo'
@ -318,6 +319,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 1595, 'duration': 1595,
'upload_date': '20130610', 'upload_date': '20130610',
'timestamp': 1370893156, 'timestamp': 1370893156,
'license': 'by',
}, },
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
@ -331,9 +333,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
'id': '54469442', 'id': '54469442',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012', 'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
'uploader': 'The BLN & Business of Software', 'uploader': 'Business of Software',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
'uploader_id': 'theblnbusinessofsoftware', 'uploader_id': 'businessofsoftware',
'duration': 3610, 'duration': 3610,
'description': None, 'description': None,
}, },
@ -396,6 +398,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'staff', 'uploader_id': 'staff',
'uploader': 'Vimeo Staff', 'uploader': 'Vimeo Staff',
'duration': 62, 'duration': 62,
'subtitles': {
'de': [{'ext': 'vtt'}],
'en': [{'ext': 'vtt'}],
'es': [{'ext': 'vtt'}],
'fr': [{'ext': 'vtt'}],
},
} }
}, },
{ {
@ -468,6 +476,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Unable to download JSON metadata'], 'expected_warnings': ['Unable to download JSON metadata'],
'skip': 'this page is no longer available.',
}, },
{ {
'url': 'http://player.vimeo.com/video/68375962', 'url': 'http://player.vimeo.com/video/68375962',
@ -550,9 +559,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
return urls[0] if urls else None return urls[0] if urls else None
def _verify_player_video_password(self, url, video_id, headers): def _verify_player_video_password(self, url, video_id, headers):
password = self._downloader.params.get('videopassword') password = self._get_video_password()
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
data = urlencode_postdata({ data = urlencode_postdata({
'password': base64.b64encode(password.encode()), 'password': base64.b64encode(password.encode()),
}) })
@ -569,6 +576,37 @@ class VimeoIE(VimeoBaseInfoExtractor):
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
def _extract_from_api(self, video_id, unlisted_hash=None):
token = self._download_json(
'https://vimeo.com/_rv/jwt', video_id, headers={
'X-Requested-With': 'XMLHttpRequest'
})['token']
api_url = 'https://api.vimeo.com/videos/' + video_id
if unlisted_hash:
api_url += ':' + unlisted_hash
video = self._download_json(
api_url, video_id, headers={
'Authorization': 'jwt ' + token,
}, query={
'fields': 'config_url,created_time,description,license,metadata.connections.comments.total,metadata.connections.likes.total,release_time,stats.plays',
})
info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id)
self._vimeo_sort_formats(info['formats'])
get_timestamp = lambda x: parse_iso8601(video.get(x + '_time'))
info.update({
'description': video.get('description'),
'license': video.get('license'),
'release_timestamp': get_timestamp('release'),
'timestamp': get_timestamp('created'),
'view_count': int_or_none(try_get(video, lambda x: x['stats']['plays'])),
})
connections = try_get(
video, lambda x: x['metadata']['connections'], dict) or {}
for k in ('comment', 'like'):
info[k + '_count'] = int_or_none(try_get(connections, lambda x: x[k + 's']['total']))
return info
def _real_extract(self, url): def _real_extract(self, url):
url, data = unsmuggle_url(url, {}) url, data = unsmuggle_url(url, {})
headers = std_headers.copy() headers = std_headers.copy()
@ -577,22 +615,19 @@ class VimeoIE(VimeoBaseInfoExtractor):
if 'Referer' not in headers: if 'Referer' not in headers:
headers['Referer'] = url headers['Referer'] = url
channel_id = self._search_regex( mobj = re.match(self._VALID_URL, url).groupdict()
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None) video_id, unlisted_hash = mobj['id'], mobj.get('unlisted_hash')
if unlisted_hash:
return self._extract_from_api(video_id, unlisted_hash)
# Extract ID from URL
video_id = self._match_id(url)
orig_url = url orig_url = url
is_pro = 'vimeopro.com/' in url is_pro = 'vimeopro.com/' in url
is_player = '://player.vimeo.com/video/' in url
if is_pro: if is_pro:
# some videos require portfolio_id to be present in player url # some videos require portfolio_id to be present in player url
# https://github.com/ytdl-org/youtube-dl/issues/20070 # https://github.com/ytdl-org/youtube-dl/issues/20070
url = self._extract_url(url, self._download_webpage(url, video_id)) url = self._extract_url(url, self._download_webpage(url, video_id))
if not url: if not url:
url = 'https://vimeo.com/' + video_id url = 'https://vimeo.com/' + video_id
elif is_player:
url = 'https://player.vimeo.com/video/' + video_id
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')): elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
url = 'https://vimeo.com/' + video_id url = 'https://vimeo.com/' + video_id
@ -612,14 +647,25 @@ class VimeoIE(VimeoBaseInfoExtractor):
expected=True) expected=True)
raise raise
# Now we begin extracting as much information as we can from what we if '://player.vimeo.com/video/' in url:
# retrieved. First we extract the information common to all extractors, config = self._parse_json(self._search_regex(
# and latter we extract those that are Vimeo specific. r'\bconfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
self.report_extraction(video_id) if config.get('view') == 4:
config = self._verify_player_video_password(
redirect_url, video_id, headers)
info = self._parse_config(config, video_id)
self._vimeo_sort_formats(info['formats'])
return info
if re.search(r'<form[^>]+?id="pw_form"', webpage):
video_password = self._get_video_password()
token, vuid = self._extract_xsrft_and_vuid(webpage)
webpage = self._verify_video_password(
redirect_url, video_id, video_password, token, vuid)
vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None) vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
if vimeo_config: if vimeo_config:
seed_status = vimeo_config.get('seed_status', {}) seed_status = vimeo_config.get('seed_status') or {}
if seed_status.get('state') == 'failed': if seed_status.get('state') == 'failed':
raise ExtractorError( raise ExtractorError(
'%s said: %s' % (self.IE_NAME, seed_status['title']), '%s said: %s' % (self.IE_NAME, seed_status['title']),
@ -628,67 +674,40 @@ class VimeoIE(VimeoBaseInfoExtractor):
cc_license = None cc_license = None
timestamp = None timestamp = None
video_description = None video_description = None
info_dict = {}
# Extract the config JSON channel_id = self._search_regex(
try: r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
try: if channel_id:
config_url = self._html_search_regex( config_url = self._html_search_regex(
r' data-config-url="(.+?)"', webpage, r'\bdata-config-url="([^"]+)"', webpage, 'config URL')
'config URL', default=None) video_description = clean_html(get_element_by_class('description', webpage))
if not config_url: info_dict.update({
# Sometimes new react-based page is served instead of old one that require 'channel_id': channel_id,
# different config URL extraction approach (see 'channel_url': 'https://vimeo.com/channels/' + channel_id,
# https://github.com/ytdl-org/youtube-dl/pull/7209) })
page_config = self._parse_json(self._search_regex(
r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
webpage, 'page config'), video_id)
config_url = page_config['player']['config_url']
cc_license = page_config.get('cc_license')
timestamp = try_get(
page_config, lambda x: x['clip']['uploaded_on'],
compat_str)
video_description = clean_html(dict_get(
page_config, ('description', 'description_html_escaped')))
config = self._download_json(config_url, video_id)
except RegexNotFoundError:
# For pro videos or player.vimeo.com urls
# We try to find out to which variable is assigned the config dic
m_variable_name = re.search(r'(\w)\.video\.id', webpage)
if m_variable_name is not None:
config_re = [r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1))]
else:
config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
config = self._search_regex(config_re, webpage, 'info section',
flags=re.DOTALL)
config = json.loads(config)
except Exception as e:
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')
if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
if '_video_password_verified' in data:
raise ExtractorError('video password verification failed!')
self._verify_video_password(redirect_url, video_id, webpage)
return self._real_extract(
smuggle_url(redirect_url, {'_video_password_verified': 'verified'}))
else:
raise ExtractorError('Unable to extract info section',
cause=e)
else: else:
if config.get('view') == 4: page_config = self._parse_json(self._search_regex(
config = self._verify_player_video_password(redirect_url, video_id, headers) r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
webpage, 'page config', default='{}'), video_id, fatal=False)
if not page_config:
return self._extract_from_api(video_id)
config_url = page_config['player']['config_url']
cc_license = page_config.get('cc_license')
clip = page_config.get('clip') or {}
timestamp = clip.get('uploaded_on')
video_description = clean_html(
clip.get('description') or page_config.get('description_html_escaped'))
config = self._download_json(config_url, video_id)
video = config.get('video') or {} video = config.get('video') or {}
vod = video.get('vod') or {} vod = video.get('vod') or {}
def is_rented(): def is_rented():
if '>You rented this title.<' in webpage: if '>You rented this title.<' in webpage:
return True return True
if config.get('user', {}).get('purchased'): if try_get(config, lambda x: x['user']['purchased']):
return True return True
for purchase_option in vod.get('purchase_options', []): for purchase_option in (vod.get('purchase_options') or []):
if purchase_option.get('purchased'): if purchase_option.get('purchased'):
return True return True
label = purchase_option.get('label_string') label = purchase_option.get('label_string')
@ -703,14 +722,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
'https://player.vimeo.com/player/%s' % feature_id, 'https://player.vimeo.com/player/%s' % feature_id,
{'force_feature_id': True}), 'Vimeo') {'force_feature_id': True}), 'Vimeo')
# Extract video description
if not video_description:
video_description = self._html_search_regex(
r'(?s)<div\s+class="[^"]*description[^"]*"[^>]*>(.*?)</div>',
webpage, 'description', default=None)
if not video_description: if not video_description:
video_description = self._html_search_meta( video_description = self._html_search_meta(
'description', webpage, default=None) ['description', 'og:description', 'twitter:description'],
webpage, default=None)
if not video_description and is_pro: if not video_description and is_pro:
orig_webpage = self._download_webpage( orig_webpage = self._download_webpage(
orig_url, video_id, orig_url, video_id,
@ -719,25 +734,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
if orig_webpage: if orig_webpage:
video_description = self._html_search_meta( video_description = self._html_search_meta(
'description', orig_webpage, default=None) 'description', orig_webpage, default=None)
if not video_description and not is_player: if not video_description:
self._downloader.report_warning('Cannot find video description') self._downloader.report_warning('Cannot find video description')
# Extract upload date
if not timestamp: if not timestamp:
timestamp = self._search_regex( timestamp = self._search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, r'<time[^>]+datetime="([^"]+)"', webpage,
'timestamp', default=None) 'timestamp', default=None)
try:
view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, 'view count'))
like_count = int(self._search_regex(r'UserLikes:(\d+)', webpage, 'like count'))
comment_count = int(self._search_regex(r'UserComments:(\d+)', webpage, 'comment count'))
except RegexNotFoundError:
# This info is only available in vimeo.com/{id} urls
view_count = None
like_count = None
comment_count = None
formats = [] formats = []
source_format = self._extract_original_format( source_format = self._extract_original_format(
@ -756,29 +760,20 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1', r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license') webpage, 'license', default=None, group='license')
channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None info_dict.update({
info_dict = {
'formats': formats, 'formats': formats,
'timestamp': unified_timestamp(timestamp), 'timestamp': unified_timestamp(timestamp),
'description': video_description, 'description': video_description,
'webpage_url': url, 'webpage_url': url,
'view_count': view_count,
'like_count': like_count,
'comment_count': comment_count,
'license': cc_license, 'license': cc_license,
'channel_id': channel_id, })
'channel_url': channel_url,
}
info_dict = merge_dicts(info_dict, info_dict_config, json_ld) return merge_dicts(info_dict, info_dict_config, json_ld)
return info_dict
class VimeoOndemandIE(VimeoIE): class VimeoOndemandIE(VimeoIE):
IE_NAME = 'vimeo:ondemand' IE_NAME = 'vimeo:ondemand'
_VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/([^/]+/)?(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?:[^/]+/)?(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# ondemand video not available via https://vimeo.com/id # ondemand video not available via https://vimeo.com/id
'url': 'https://vimeo.com/ondemand/20704', 'url': 'https://vimeo.com/ondemand/20704',
@ -939,11 +934,15 @@ class VimeoAlbumIE(VimeoBaseInfoExtractor):
} }
if hashed_pass: if hashed_pass:
query['_hashed_pass'] = hashed_pass query['_hashed_pass'] = hashed_pass
videos = self._download_json( try:
'https://api.vimeo.com/albums/%s/videos' % album_id, videos = self._download_json(
album_id, 'Downloading page %d' % api_page, query=query, headers={ 'https://api.vimeo.com/albums/%s/videos' % album_id,
'Authorization': 'jwt ' + authorization, album_id, 'Downloading page %d' % api_page, query=query, headers={
})['data'] 'Authorization': 'jwt ' + authorization,
})['data']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
return
raise  # re-raise any error other than HTTP 400; otherwise `videos` is unbound in the loop below
for video in videos: for video in videos:
link = video.get('link') link = video.get('link')
if not link: if not link:
@ -1058,10 +1057,23 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
page_url, video_id = re.match(self._VALID_URL, url).groups() page_url, video_id = re.match(self._VALID_URL, url).groups()
clip_data = self._download_json( data = self._download_json(
page_url.replace('/review/', '/review/data/'), page_url.replace('/review/', '/review/data/'), video_id)
video_id)['clipData'] if data.get('isLocked') is True:
config_url = clip_data['configUrl'] video_password = self._get_video_password()
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', video_id)
webpage = self._verify_video_password(
'https://vimeo.com/' + video_id, video_id,
video_password, viewer['xsrft'], viewer['vuid'])
clip_page_config = self._parse_json(self._search_regex(
r'window\.vimeo\.clip_page_config\s*=\s*({.+?});',
webpage, 'clip page config'), video_id)
config_url = clip_page_config['player']['config_url']
clip_data = clip_page_config.get('clip') or {}
else:
clip_data = data['clipData']
config_url = clip_data['configUrl']
config = self._download_json(config_url, video_id) config = self._download_json(config_url, video_id)
info_dict = self._parse_config(config, video_id) info_dict = self._parse_config(config, video_id)
source_format = self._extract_original_format( source_format = self._extract_original_format(

View File

@ -300,6 +300,13 @@ class VKIE(VKBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
@staticmethod
def _extract_sibnet_urls(webpage):
# https://help.sibnet.ru/?sibnet_video_embed
return [unescapeHTML(mobj.group('url')) for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//video\.sibnet\.ru/shell\.php\?.*?\bvideoid=\d+.*?)\1',
webpage)]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid') video_id = mobj.group('videoid')
@ -408,6 +415,10 @@ class VKIE(VKBaseIE):
if odnoklassniki_url: if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key()) return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
sibnet_urls = self._extract_sibnet_urls(info_page)
if sibnet_urls:
return self.url_result(sibnet_urls[0])
m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page) m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page)
if m_opts: if m_opts:
m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1)) m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1))

View File

@ -106,7 +106,7 @@ class VLiveIE(VLiveBaseIE):
raise ExtractorError('Unable to log in', expected=True) raise ExtractorError('Unable to log in', expected=True)
def _call_api(self, path_template, video_id, fields=None): def _call_api(self, path_template, video_id, fields=None):
query = {'appId': self._APP_ID, 'gcc': 'KR'} query = {'appId': self._APP_ID, 'gcc': 'KR', 'platformType': 'PC'}
if fields: if fields:
query['fields'] = fields query['fields'] = fields
try: try:

View File

@ -182,17 +182,20 @@ class VVVVIDIE(InfoExtractor):
if not embed_code: if not embed_code:
continue continue
embed_code = ds(embed_code) embed_code = ds(embed_code)
if video_type in ('video/rcs', 'video/kenc'): if video_type == 'video/kenc':
if video_type == 'video/kenc': embed_code = re.sub(r'https?(://[^/]+)/z/', r'https\1/i/', embed_code).replace('/manifest.f4m', '/master.m3u8')
kenc = self._download_json( kenc = self._download_json(
'https://www.vvvvid.it/kenc', video_id, query={ 'https://www.vvvvid.it/kenc', video_id, query={
'action': 'kt', 'action': 'kt',
'conn_id': self._conn_id, 'conn_id': self._conn_id,
'url': embed_code, 'url': embed_code,
}, fatal=False) or {} }, fatal=False) or {}
kenc_message = kenc.get('message') kenc_message = kenc.get('message')
if kenc_message: if kenc_message:
embed_code += '?' + ds(kenc_message) embed_code += '?' + ds(kenc_message)
formats.extend(self._extract_m3u8_formats(
embed_code, video_id, 'mp4', m3u8_id='hls', fatal=False))
elif video_type == 'video/rcs':
formats.extend(self._extract_akamai_formats(embed_code, video_id)) formats.extend(self._extract_akamai_formats(embed_code, video_id))
elif video_type == 'video/youtube': elif video_type == 'video/youtube':
info.update({ info.update({

View File

@ -58,6 +58,7 @@ class XFileShareIE(InfoExtractor):
(r'vidlocker\.xyz', 'VidLocker'), (r'vidlocker\.xyz', 'VidLocker'),
(r'vidshare\.tv', 'VidShare'), (r'vidshare\.tv', 'VidShare'),
(r'vup\.to', 'VUp'), (r'vup\.to', 'VUp'),
(r'wolfstream\.tv', 'WolfStream'),
(r'xvideosharing\.com', 'XVideoSharing'), (r'xvideosharing\.com', 'XVideoSharing'),
) )
@ -82,6 +83,9 @@ class XFileShareIE(InfoExtractor):
}, { }, {
'url': 'https://aparat.cam/n4d6dh0wvlpr', 'url': 'https://aparat.cam/n4d6dh0wvlpr',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://wolfstream.tv/nthme29v9u2x',
'only_matching': True,
}] }]
@staticmethod @staticmethod

View File

@ -11,6 +11,7 @@ from ..utils import (
parse_duration, parse_duration,
sanitized_Request, sanitized_Request,
str_to_int, str_to_int,
url_or_none,
) )
@ -87,10 +88,10 @@ class XTubeIE(InfoExtractor):
'Cookie': 'age_verified=1; cookiesAccepted=1', 'Cookie': 'age_verified=1; cookiesAccepted=1',
}) })
title, thumbnail, duration = [None] * 3 title, thumbnail, duration, sources, media_definition = [None] * 5
config = self._parse_json(self._search_regex( config = self._parse_json(self._search_regex(
r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf)', webpage, 'config', r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf|playerWrapper)', webpage, 'config',
default='{}'), video_id, transform_source=js_to_json, fatal=False) default='{}'), video_id, transform_source=js_to_json, fatal=False)
if config: if config:
config = config.get('mainRoll') config = config.get('mainRoll')
@ -99,20 +100,52 @@ class XTubeIE(InfoExtractor):
thumbnail = config.get('poster') thumbnail = config.get('poster')
duration = int_or_none(config.get('duration')) duration = int_or_none(config.get('duration'))
sources = config.get('sources') or config.get('format') sources = config.get('sources') or config.get('format')
media_definition = config.get('mediaDefinition')
if not isinstance(sources, dict): if not isinstance(sources, dict) and not media_definition:
sources = self._parse_json(self._search_regex( sources = self._parse_json(self._search_regex(
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),', r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id, webpage, 'sources', group='sources'), video_id,
transform_source=js_to_json) transform_source=js_to_json)
formats = [] formats = []
for format_id, format_url in sources.items(): format_urls = set()
formats.append({
'url': format_url, if isinstance(sources, dict):
'format_id': format_id, for format_id, format_url in sources.items():
'height': int_or_none(format_id), format_url = url_or_none(format_url)
}) if not format_url:
continue
if format_url in format_urls:
continue
format_urls.add(format_url)
formats.append({
'url': format_url,
'format_id': format_id,
'height': int_or_none(format_id),
})
if isinstance(media_definition, list):
for media in media_definition:
video_url = url_or_none(media.get('videoUrl'))
if not video_url:
continue
if video_url in format_urls:
continue
format_urls.add(video_url)
format_id = media.get('format')
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif format_id == 'mp4':
height = int_or_none(media.get('quality'))
formats.append({
'url': video_url,
'format_id': '%s-%d' % (format_id, height) if height else format_id,
'height': height,
})
self._remove_duplicate_formats(formats) self._remove_duplicate_formats(formats)
self._sort_formats(formats) self._sort_formats(formats)
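
The hunk above collects formats from two places (the legacy `sources` dict and the newer `mediaDefinition` list) and uses a shared set of URLs to avoid emitting the same stream twice. A minimal standalone sketch of that idea follows. The names `collect_formats` and `to_int` are hypothetical, the URL check is a simplified stand-in for `url_or_none`, and HLS entries are skipped because expanding them needs the extractor's own m3u8 helper.

```python
def to_int(value):
    """Best-effort integer conversion; returns None on failure (stand-in for int_or_none)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def collect_formats(sources, media_definition):
    """Merge formats from a `sources` dict and a `mediaDefinition` list, skipping repeated URLs."""
    formats = []
    seen_urls = set()

    def add(url, format_id, height):
        # Simplified URL validation (assumption: only absolute http(s) URLs are usable)
        if not isinstance(url, str) or not url.startswith(('http://', 'https://')):
            return
        if url in seen_urls:
            return
        seen_urls.add(url)
        formats.append({'url': url, 'format_id': format_id, 'height': height})

    if isinstance(sources, dict):
        for format_id, url in sources.items():
            add(url, format_id, to_int(format_id))

    if isinstance(media_definition, list):
        for media in media_definition:
            if not isinstance(media, dict):
                continue
            kind = media.get('format')
            if kind != 'mp4':
                # HLS handling is intentionally left out of this sketch
                continue
            height = to_int(media.get('quality'))
            format_id = '%s-%d' % (kind, height) if height else kind
            add(media.get('videoUrl'), format_id, height)

    return formats
```

Sharing one `seen_urls` set across both loops is what keeps a URL listed in both structures from being reported twice.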

@@ -154,7 +154,7 @@ class YoukuIE(InfoExtractor):
         # request basic data
         basic_data_params = {
             'vid': video_id,
-            'ccode': '0590',
+            'ccode': '0532',
             'client_ip': '192.168.1.1',
             'utid': cna,
             'client_ts': time.time() / 1000,

@@ -24,6 +24,7 @@ from ..jsinterp import JSInterpreter
 from ..utils import (
     ExtractorError,
     clean_html,
+    dict_get,
     float_or_none,
     int_or_none,
     mimetype2ext,
@ -45,6 +46,10 @@ from ..utils import (
) )
def parse_qs(url):
return compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
class YoutubeBaseInfoExtractor(InfoExtractor): class YoutubeBaseInfoExtractor(InfoExtractor):
"""Provide base functions for Youtube extractors""" """Provide base functions for Youtube extractors"""
_LOGIN_URL = 'https://accounts.google.com/ServiceLogin' _LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
@ -60,11 +65,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)' _PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)'
def _ids_to_results(self, ids):
return [
self.url_result(vid_id, 'Youtube', video_id=vid_id)
for vid_id in ids]
def _login(self): def _login(self):
""" """
Attempt to log in to YouTube. Attempt to log in to YouTube.
@ -248,7 +248,23 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return True return True
def _initialize_consent(self):
cookies = self._get_cookies('https://www.youtube.com/')
if cookies.get('__Secure-3PSID'):
return
consent_id = None
consent = cookies.get('CONSENT')
if consent:
if 'YES' in consent.value:
return
consent_id = self._search_regex(
r'PENDING\+(\d+)', consent.value, 'consent', default=None)
if not consent_id:
consent_id = random.randint(100, 999)
self._set_cookie('.youtube.com', 'CONSENT', 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id)
def _real_initialize(self): def _real_initialize(self):
self._initialize_consent()
if self._downloader is None: if self._downloader is None:
return return
if not self._login(): if not self._login():
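
The `_initialize_consent` method added above decides whether a `CONSENT` cookie has to be set and, if so, which value to use. The sketch below isolates only that decision as a pure function. `build_consent_value` is a hypothetical name, the cookie jar lookup and the `__Secure-3PSID` short-circuit are left out, and the value template is copied from the hunk.

```python
import random
import re


def build_consent_value(current_value=None):
    """Return the CONSENT cookie value to set, or None if consent is already recorded."""
    consent_id = None
    if current_value:
        if 'YES' in current_value:
            # Consent already given; nothing to do
            return None
        match = re.search(r'PENDING\+(\d+)', current_value)
        if match:
            # Reuse the numeric ID from a pending consent cookie
            consent_id = match.group(1)
    if not consent_id:
        # No usable ID available, so fall back to a random three-digit one
        consent_id = random.randint(100, 999)
    return 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id
```

Keeping the decision separate from the cookie jar makes the three cases (already accepted, pending, absent) easy to exercise in isolation.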
@ -289,7 +305,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return self._parse_json( return self._parse_json(
self._search_regex( self._search_regex(
r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg', r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
default='{}'), video_id, fatal=False) default='{}'), video_id, fatal=False) or {}
def _extract_video(self, renderer): def _extract_video(self, renderer):
video_id = renderer['videoId'] video_id = renderer['videoId']
@ -312,7 +328,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
(lambda x: x['ownerText']['runs'][0]['text'], (lambda x: x['ownerText']['runs'][0]['text'],
lambda x: x['shortBylineText']['runs'][0]['text']), compat_str) lambda x: x['shortBylineText']['runs'][0]['text']), compat_str)
return { return {
'_type': 'url_transparent', '_type': 'url',
'ie_key': YoutubeIE.ie_key(), 'ie_key': YoutubeIE.ie_key(),
'id': video_id, 'id': video_id,
'url': video_id, 'url': video_id,
@ -338,21 +354,28 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'(?:www\.)?invidious\.mastodon\.host', r'(?:www\.)?invidious\.mastodon\.host',
r'(?:www\.)?invidious\.zapashcanon\.fr', r'(?:www\.)?invidious\.zapashcanon\.fr',
r'(?:www\.)?invidious\.kavin\.rocks', r'(?:www\.)?invidious\.kavin\.rocks',
r'(?:www\.)?invidious\.tinfoil-hat\.net',
r'(?:www\.)?invidious\.himiko\.cloud',
r'(?:www\.)?invidious\.reallyancient\.tech',
r'(?:www\.)?invidious\.tube', r'(?:www\.)?invidious\.tube',
r'(?:www\.)?invidiou\.site', r'(?:www\.)?invidiou\.site',
r'(?:www\.)?invidious\.site', r'(?:www\.)?invidious\.site',
r'(?:www\.)?invidious\.xyz', r'(?:www\.)?invidious\.xyz',
r'(?:www\.)?invidious\.nixnet\.xyz', r'(?:www\.)?invidious\.nixnet\.xyz',
r'(?:www\.)?invidious\.048596\.xyz',
r'(?:www\.)?invidious\.drycat\.fr', r'(?:www\.)?invidious\.drycat\.fr',
r'(?:www\.)?inv\.skyn3t\.in',
r'(?:www\.)?tube\.poal\.co', r'(?:www\.)?tube\.poal\.co',
r'(?:www\.)?tube\.connect\.cafe', r'(?:www\.)?tube\.connect\.cafe',
r'(?:www\.)?vid\.wxzm\.sx', r'(?:www\.)?vid\.wxzm\.sx',
r'(?:www\.)?vid\.mint\.lgbt', r'(?:www\.)?vid\.mint\.lgbt',
r'(?:www\.)?vid\.puffyan\.us',
r'(?:www\.)?yewtu\.be', r'(?:www\.)?yewtu\.be',
r'(?:www\.)?yt\.elukerio\.org', r'(?:www\.)?yt\.elukerio\.org',
r'(?:www\.)?yt\.lelux\.fi', r'(?:www\.)?yt\.lelux\.fi',
r'(?:www\.)?invidious\.ggc-project\.de', r'(?:www\.)?invidious\.ggc-project\.de',
r'(?:www\.)?yt\.maisputain\.ovh', r'(?:www\.)?yt\.maisputain\.ovh',
r'(?:www\.)?ytprivate\.com',
r'(?:www\.)?invidious\.13ad\.de', r'(?:www\.)?invidious\.13ad\.de',
r'(?:www\.)?invidious\.toot\.koeln', r'(?:www\.)?invidious\.toot\.koeln',
r'(?:www\.)?invidious\.fdn\.fr', r'(?:www\.)?invidious\.fdn\.fr',
@ -396,16 +419,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId= |(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=
) )
)? # all until now is optional -> you can pass the naked ID )? # all until now is optional -> you can pass the naked ID
(?P<id>[0-9A-Za-z_-]{11}) # here is it! the YouTube video ID (?P<id>[0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
(?!.*?\blist=
(?:
%(playlist_id)s| # combined list/video URLs are handled by the playlist IE
WL # WL are handled by the watch later IE
)
)
(?(1).+)? # if we found the ID, everything can follow (?(1).+)? # if we found the ID, everything can follow
$""" % { $""" % {
'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE,
'invidious': '|'.join(_INVIDIOUS_SITES), 'invidious': '|'.join(_INVIDIOUS_SITES),
} }
_PLAYER_INFO_RE = ( _PLAYER_INFO_RE = (
@ -791,6 +807,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}, },
'skip': 'This video does not exist.', 'skip': 'This video does not exist.',
}, },
{
# Video with incomplete 'yt:stretch=16:'
'url': 'https://www.youtube.com/watch?v=FRhJzUSJbGI',
'only_matching': True,
},
{ {
# Video licensed under Creative Commons # Video licensed under Creative Commons
'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA', 'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
@ -1067,6 +1088,23 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtube.com/watch?v=nGC3D_FkCmg', 'url': 'https://www.youtube.com/watch?v=nGC3D_FkCmg',
'only_matching': True, 'only_matching': True,
}, },
{
# restricted location, https://github.com/ytdl-org/youtube-dl/issues/28685
'url': 'cBvYw8_A0vQ',
'info_dict': {
'id': 'cBvYw8_A0vQ',
'ext': 'mp4',
'title': '4K Ueno Okachimachi Street Scenes 上野御徒町歩き',
'description': 'md5:ea770e474b7cd6722b4c95b833c03630',
'upload_date': '20201120',
'uploader': 'Walk around Japan',
'uploader_id': 'UC3o_t8PzBmXf5S9b7GLx1Mw',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UC3o_t8PzBmXf5S9b7GLx1Mw',
},
'params': {
'skip_download': True,
},
},
] ]
_formats = { _formats = {
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'}, '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@ -1174,6 +1212,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'}, '397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
} }
@classmethod
def suitable(cls, url):
# Hack for lazy extractors until more generic solution is implemented
# (see #28780)
from .youtube import parse_qs
qs = parse_qs(url)
if qs.get('list', [None])[0]:
return False
return super(YoutubeIE, cls).suitable(url)
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
super(YoutubeIE, self).__init__(*args, **kwargs) super(YoutubeIE, self).__init__(*args, **kwargs)
self._code_cache = {} self._code_cache = {}
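
With the negative lookahead removed from the URL pattern, the `suitable()` override above rejects any URL that carries a `list` query parameter, and `YoutubePlaylistIE.suitable()` later in this diff rejects any URL that carries `v`. The sketch below shows only that query-string part of the routing. It assumes Python 3's `urllib.parse` in place of `compat_urlparse`, and the two `*_accepts` helpers are hypothetical stand-ins that ignore the regex match and the `YoutubeTabIE` check the real methods also perform.

```python
from urllib.parse import parse_qs, urlparse


def query_params(url):
    """Parse the query string of a URL into a dict mapping names to lists of values."""
    return parse_qs(urlparse(url).query)


def video_extractor_accepts(url):
    # Stand-in for the query check in YoutubeIE.suitable(): a list= parameter disqualifies the URL
    return not query_params(url).get('list', [None])[0]


def playlist_extractor_accepts(url):
    # Stand-in for the query check in YoutubePlaylistIE.suitable(): a v= parameter disqualifies the URL
    return not query_params(url).get('v', [None])[0]
```

A URL carrying both parameters is rejected by both helpers, which matches the diff's intent of leaving combined video/playlist URLs to the tab extractor.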
@@ -1431,7 +1479,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         base_url = self.http_scheme() + '//www.youtube.com/'
         webpage_url = base_url + 'watch?v=' + video_id
         webpage = self._download_webpage(
-            webpage_url + '&bpctr=9999999999', video_id, fatal=False)
+            webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False)
 
         player_response = None
         if webpage:
@ -1468,7 +1516,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def get_text(x): def get_text(x):
if not x: if not x:
return return
return x.get('simpleText') or ''.join([r['text'] for r in x['runs']]) text = x.get('simpleText')
if text and isinstance(text, compat_str):
return text
runs = x.get('runs')
if not isinstance(runs, list):
return
return ''.join([r['text'] for r in runs if isinstance(r.get('text'), compat_str)])
search_meta = ( search_meta = (
lambda x: self._html_search_meta(x, webpage, default=None)) \ lambda x: self._html_search_meta(x, webpage, default=None)) \
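
The rewritten `get_text` above no longer assumes that a text node has a `runs` list whenever `simpleText` is missing. A standalone sketch of the same defensive logic follows. It uses `str` where the diff uses `compat_str`, and it additionally skips run entries that are not dicts, which is an assumption beyond what the hunk checks.

```python
def get_text(node):
    """Extract display text from a node shaped like {'simpleText': ...} or {'runs': [{'text': ...}]}."""
    if not node:
        return None
    text = node.get('simpleText')
    if text and isinstance(text, str):
        return text
    runs = node.get('runs')
    if not isinstance(runs, list):
        return None
    # Join only well-formed runs; malformed entries are ignored instead of raising
    return ''.join(
        run['text'] for run in runs
        if isinstance(run, dict) and isinstance(run.get('text'), str))
```

The old one-liner raised `KeyError` when neither key was present; this version returns `None` in that case.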
@ -1617,7 +1671,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
f['format_id'] = itag f['format_id'] = itag
formats.append(f) formats.append(f)
if self._downloader.params.get('youtube_include_dash_manifest'): if self._downloader.params.get('youtube_include_dash_manifest', True):
dash_manifest_url = streaming_data.get('dashManifestUrl') dash_manifest_url = streaming_data.get('dashManifestUrl')
if dash_manifest_url: if dash_manifest_url:
for f in self._extract_mpd_formats( for f in self._extract_mpd_formats(
@ -1666,13 +1720,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
for m in re.finditer(self._meta_regex('og:video:tag'), webpage)] for m in re.finditer(self._meta_regex('og:video:tag'), webpage)]
for keyword in keywords: for keyword in keywords:
if keyword.startswith('yt:stretch='): if keyword.startswith('yt:stretch='):
w, h = keyword.split('=')[1].split(':') mobj = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
w, h = int(w), int(h) if mobj:
if w > 0 and h > 0: # NB: float is intentional for forcing float division
ratio = w / h w, h = (float(v) for v in mobj.groups())
for f in formats: if w > 0 and h > 0:
if f.get('vcodec') != 'none': ratio = w / h
f['stretched_ratio'] = ratio for f in formats:
if f.get('vcodec') != 'none':
f['stretched_ratio'] = ratio
break
thumbnails = [] thumbnails = []
for container in (video_details, microformat): for container in (video_details, microformat):
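
The `yt:stretch` change above replaces a `split(':')` unpack, which fails on an incomplete keyword such as `yt:stretch=16:`, with a regex that matches only when both numbers are present. A minimal sketch of that parsing step, with the hypothetical helper name `parse_stretch_ratio`:

```python
import re


def parse_stretch_ratio(keyword):
    """Return the width/height ratio from a 'yt:stretch=W:H' keyword, or None if it is unusable."""
    if not keyword.startswith('yt:stretch='):
        return None
    match = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
    if not match:
        # Covers incomplete values such as 'yt:stretch=16:'
        return None
    # Floats keep the division fractional on interpreters where / truncates integers
    width, height = (float(value) for value in match.groups())
    if width > 0 and height > 0:
        return width / height
    return None
```

Returning `None` for malformed or zero-valued input mirrors the hunk, where `stretched_ratio` is simply not set in those cases.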
@ -1895,7 +1952,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
info['channel'] = get_text(try_get( info['channel'] = get_text(try_get(
vsir, vsir,
lambda x: x['owner']['videoOwnerRenderer']['title'], lambda x: x['owner']['videoOwnerRenderer']['title'],
compat_str)) dict))
rows = try_get( rows = try_get(
vsir, vsir,
lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'], lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'],
@ -1942,7 +1999,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
invidio\.us invidio\.us
)/ )/
(?: (?:
(?:channel|c|user|feed)/| (?:channel|c|user|feed|hashtag)/|
(?:playlist|watch)\?.*?\blist=| (?:playlist|watch)\?.*?\blist=|
(?!(?:watch|embed|v|e)\b) (?!(?:watch|embed|v|e)\b)
) )
@ -1968,6 +2025,15 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'title': 'Игорь Клейнер - Playlists', 'title': 'Игорь Клейнер - Playlists',
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2', 'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
}, },
}, {
# playlists, series
'url': 'https://www.youtube.com/c/3blue1brown/playlists?view=50&sort=dd&shelf_id=3',
'playlist_mincount': 5,
'info_dict': {
'id': 'UCYO_jab_esuFRV4b17AJtAw',
'title': '3Blue1Brown - Playlists',
'description': 'md5:e1384e8a133307dd10edee76e875d62f',
},
}, { }, {
# playlists, singlepage # playlists, singlepage
'url': 'https://www.youtube.com/user/ThirstForScience/playlists', 'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
@ -2228,6 +2294,16 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
}, { }, {
'url': 'https://www.youtube.com/TheYoungTurks/live', 'url': 'https://www.youtube.com/TheYoungTurks/live',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.youtube.com/hashtag/cctv9',
'info_dict': {
'id': 'cctv9',
'title': '#cctv9',
},
'playlist_mincount': 350,
}, {
'url': 'https://www.youtube.com/watch?list=PLW4dVinRY435CBE_JD3t-0SRXKfnZHS1P&feature=youtu.be&v=M9cJMXmQ_ZU',
'only_matching': True,
}] }]
@classmethod @classmethod
@ -2250,10 +2326,13 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
@staticmethod @staticmethod
def _extract_grid_item_renderer(item): def _extract_grid_item_renderer(item):
for item_kind in ('Playlist', 'Video', 'Channel'): assert isinstance(item, dict)
renderer = item.get('grid%sRenderer' % item_kind) for key, renderer in item.items():
if renderer: if not key.startswith('grid') or not key.endswith('Renderer'):
return renderer continue
if not isinstance(renderer, dict):
continue
return renderer
def _grid_entries(self, grid_renderer): def _grid_entries(self, grid_renderer):
for item in grid_renderer['items']: for item in grid_renderer['items']:
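
The new `_extract_grid_item_renderer` above stops enumerating known renderer kinds and instead accepts any key of the form `grid...Renderer` whose value is a dict. A standalone sketch of that lookup follows. It returns `None` for non-dict input where the diff uses an `assert`, which is a deliberate simplification.

```python
def extract_grid_item_renderer(item):
    """Return the first dict value stored under a key shaped like 'grid<Kind>Renderer', else None."""
    if not isinstance(item, dict):
        return None
    for key, renderer in item.items():
        if not key.startswith('grid') or not key.endswith('Renderer'):
            continue
        if not isinstance(renderer, dict):
            continue
        return renderer
    return None
```

Matching on the key shape lets renderer kinds that were not in the old hard-coded tuple (`Playlist`, `Video`, `Channel`) pass through without another code change.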
@ -2263,7 +2342,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
if not isinstance(renderer, dict): if not isinstance(renderer, dict):
continue continue
title = try_get( title = try_get(
renderer, lambda x: x['title']['runs'][0]['text'], compat_str) renderer, (lambda x: x['title']['runs'][0]['text'],
lambda x: x['title']['simpleText']), compat_str)
# playlist # playlist
playlist_id = renderer.get('playlistId') playlist_id = renderer.get('playlistId')
if playlist_id: if playlist_id:
@ -2271,10 +2351,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'https://www.youtube.com/playlist?list=%s' % playlist_id, 'https://www.youtube.com/playlist?list=%s' % playlist_id,
ie=YoutubeTabIE.ie_key(), video_id=playlist_id, ie=YoutubeTabIE.ie_key(), video_id=playlist_id,
video_title=title) video_title=title)
continue
# video # video
video_id = renderer.get('videoId') video_id = renderer.get('videoId')
if video_id: if video_id:
yield self._extract_video(renderer) yield self._extract_video(renderer)
continue
# channel # channel
channel_id = renderer.get('channelId') channel_id = renderer.get('channelId')
if channel_id: if channel_id:
@ -2283,6 +2365,17 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
yield self.url_result( yield self.url_result(
'https://www.youtube.com/channel/%s' % channel_id, 'https://www.youtube.com/channel/%s' % channel_id,
ie=YoutubeTabIE.ie_key(), video_title=title) ie=YoutubeTabIE.ie_key(), video_title=title)
continue
# generic endpoint URL support
ep_url = urljoin('https://www.youtube.com/', try_get(
renderer, lambda x: x['navigationEndpoint']['commandMetadata']['webCommandMetadata']['url'],
compat_str))
if ep_url:
for ie in (YoutubeTabIE, YoutubePlaylistIE, YoutubeIE):
if ie.suitable(ep_url):
yield self.url_result(
ep_url, ie=ie.ie_key(), video_id=ie._match_id(ep_url), video_title=title)
break
def _shelf_entries_from_content(self, shelf_renderer): def _shelf_entries_from_content(self, shelf_renderer):
content = shelf_renderer.get('content') content = shelf_renderer.get('content')
@ -2375,6 +2468,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
for entry in self._post_thread_entries(renderer): for entry in self._post_thread_entries(renderer):
yield entry yield entry
def _rich_grid_entries(self, contents):
for content in contents:
video_renderer = try_get(content, lambda x: x['richItemRenderer']['content']['videoRenderer'], dict)
if video_renderer:
entry = self._video_entry(video_renderer)
if entry:
yield entry
@staticmethod @staticmethod
def _build_continuation_query(continuation, ctp=None): def _build_continuation_query(continuation, ctp=None):
query = { query = {
@ -2420,81 +2521,97 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
ctp = continuation_ep.get('clickTrackingParams') ctp = continuation_ep.get('clickTrackingParams')
return YoutubeTabIE._build_continuation_query(continuation, ctp) return YoutubeTabIE._build_continuation_query(continuation, ctp)
def _entries(self, tab, identity_token): def _entries(self, tab, item_id, webpage):
tab_content = try_get(tab, lambda x: x['content'], dict) tab_content = try_get(tab, lambda x: x['content'], dict)
if not tab_content: if not tab_content:
return return
slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict) slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict)
if not slr_renderer: if slr_renderer:
return is_channels_tab = tab.get('title') == 'Channels'
is_channels_tab = tab.get('title') == 'Channels' continuation = None
continuation = None slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or []
slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or [] for slr_content in slr_contents:
for slr_content in slr_contents: if not isinstance(slr_content, dict):
if not isinstance(slr_content, dict):
continue
is_renderer = try_get(slr_content, lambda x: x['itemSectionRenderer'], dict)
if not is_renderer:
continue
isr_contents = try_get(is_renderer, lambda x: x['contents'], list) or []
for isr_content in isr_contents:
if not isinstance(isr_content, dict):
continue continue
renderer = isr_content.get('playlistVideoListRenderer') is_renderer = try_get(slr_content, lambda x: x['itemSectionRenderer'], dict)
if renderer: if not is_renderer:
for entry in self._playlist_entries(renderer):
yield entry
continuation = self._extract_continuation(renderer)
continue continue
renderer = isr_content.get('gridRenderer') isr_contents = try_get(is_renderer, lambda x: x['contents'], list) or []
if renderer: for isr_content in isr_contents:
for entry in self._grid_entries(renderer): if not isinstance(isr_content, dict):
yield entry continue
continuation = self._extract_continuation(renderer) renderer = isr_content.get('playlistVideoListRenderer')
continue if renderer:
renderer = isr_content.get('shelfRenderer') for entry in self._playlist_entries(renderer):
if renderer: yield entry
for entry in self._shelf_entries(renderer, not is_channels_tab): continuation = self._extract_continuation(renderer)
yield entry continue
continue renderer = isr_content.get('gridRenderer')
renderer = isr_content.get('backstagePostThreadRenderer') if renderer:
if renderer: for entry in self._grid_entries(renderer):
for entry in self._post_thread_entries(renderer): yield entry
yield entry continuation = self._extract_continuation(renderer)
continuation = self._extract_continuation(renderer) continue
continue renderer = isr_content.get('shelfRenderer')
renderer = isr_content.get('videoRenderer') if renderer:
if renderer: for entry in self._shelf_entries(renderer, not is_channels_tab):
entry = self._video_entry(renderer) yield entry
if entry: continue
yield entry renderer = isr_content.get('backstagePostThreadRenderer')
if renderer:
for entry in self._post_thread_entries(renderer):
yield entry
continuation = self._extract_continuation(renderer)
continue
renderer = isr_content.get('videoRenderer')
if renderer:
entry = self._video_entry(renderer)
if entry:
yield entry
if not continuation:
continuation = self._extract_continuation(is_renderer)
if not continuation: if not continuation:
continuation = self._extract_continuation(is_renderer) continuation = self._extract_continuation(slr_renderer)
else:
rich_grid_renderer = tab_content.get('richGridRenderer')
if not rich_grid_renderer:
return
for entry in self._rich_grid_entries(rich_grid_renderer.get('contents') or []):
yield entry
continuation = self._extract_continuation(rich_grid_renderer)
if not continuation: ytcfg = self._extract_ytcfg(item_id, webpage)
continuation = self._extract_continuation(slr_renderer) client_version = try_get(
ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str) or '2.20210407.08.00'
headers = { headers = {
'x-youtube-client-name': '1', 'x-youtube-client-name': '1',
'x-youtube-client-version': '2.20201112.04.01', 'x-youtube-client-version': client_version,
'content-type': 'application/json', 'content-type': 'application/json',
} }
context = try_get(ytcfg, lambda x: x['INNERTUBE_CONTEXT'], dict) or {
'client': {
'clientName': 'WEB',
'clientVersion': client_version,
}
}
visitor_data = try_get(context, lambda x: x['client']['visitorData'], compat_str)
identity_token = self._extract_identity_token(ytcfg, webpage)
if identity_token: if identity_token:
headers['x-youtube-identity-token'] = identity_token headers['x-youtube-identity-token'] = identity_token
data = { data = {
'context': { 'context': context,
'client': {
'clientName': 'WEB',
'clientVersion': '2.20201021.03.00',
}
},
} }
for page_num in itertools.count(1): for page_num in itertools.count(1):
if not continuation: if not continuation:
break break
if visitor_data:
headers['x-goog-visitor-id'] = visitor_data
data['continuation'] = continuation['continuation'] data['continuation'] = continuation['continuation']
data['clickTracking'] = { data['clickTracking'] = {
'clickTrackingParams': continuation['itct'] 'clickTrackingParams': continuation['itct']
@ -2519,6 +2636,9 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
if not response: if not response:
break break
visitor_data = try_get(
response, lambda x: x['responseContext']['visitorData'], compat_str) or visitor_data
continuation_contents = try_get( continuation_contents = try_get(
response, lambda x: x['continuationContents'], dict) response, lambda x: x['continuationContents'], dict)
if continuation_contents: if continuation_contents:
@ -2541,13 +2661,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
continuation = self._extract_continuation(continuation_renderer) continuation = self._extract_continuation(continuation_renderer)
continue continue
on_response_received = dict_get(response, ('onResponseReceivedActions', 'onResponseReceivedEndpoints'))
continuation_items = try_get( continuation_items = try_get(
response, lambda x: x['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'], list) on_response_received, lambda x: x[0]['appendContinuationItemsAction']['continuationItems'], list)
if continuation_items: if continuation_items:
continuation_item = continuation_items[0] continuation_item = continuation_items[0]
if not isinstance(continuation_item, dict): if not isinstance(continuation_item, dict):
continue continue
renderer = continuation_item.get('gridVideoRenderer') renderer = self._extract_grid_item_renderer(continuation_item)
if renderer: if renderer:
grid_renderer = {'items': continuation_items} grid_renderer = {'items': continuation_items}
for entry in self._grid_entries(grid_renderer): for entry in self._grid_entries(grid_renderer):
@ -2561,6 +2682,19 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
yield entry yield entry
continuation = self._extract_continuation(video_list_renderer) continuation = self._extract_continuation(video_list_renderer)
continue continue
renderer = continuation_item.get('backstagePostThreadRenderer')
if renderer:
continuation_renderer = {'contents': continuation_items}
for entry in self._post_thread_continuation_entries(continuation_renderer):
yield entry
continuation = self._extract_continuation(continuation_renderer)
continue
renderer = continuation_item.get('richItemRenderer')
if renderer:
for entry in self._rich_grid_entries(continuation_items):
yield entry
continuation = self._extract_continuation({'contents': continuation_items})
continue
break break
@ -2613,11 +2747,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
alerts.append(text) alerts.append(text)
return '\n'.join(alerts) return '\n'.join(alerts)
def _extract_from_tabs(self, item_id, webpage, data, tabs, identity_token): def _extract_from_tabs(self, item_id, webpage, data, tabs):
selected_tab = self._extract_selected_tab(tabs) selected_tab = self._extract_selected_tab(tabs)
renderer = try_get( renderer = try_get(
data, lambda x: x['metadata']['channelMetadataRenderer'], dict) data, lambda x: x['metadata']['channelMetadataRenderer'], dict)
playlist_id = title = description = None playlist_id = item_id
title = description = None
if renderer: if renderer:
channel_title = renderer.get('title') or item_id channel_title = renderer.get('title') or item_id
tab_title = selected_tab.get('title') tab_title = selected_tab.get('title')
@ -2626,14 +2761,18 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
title += ' - %s' % tab_title title += ' - %s' % tab_title
description = renderer.get('description') description = renderer.get('description')
playlist_id = renderer.get('externalId') playlist_id = renderer.get('externalId')
renderer = try_get( else:
data, lambda x: x['metadata']['playlistMetadataRenderer'], dict) renderer = try_get(
if renderer: data, lambda x: x['metadata']['playlistMetadataRenderer'], dict)
title = renderer.get('title') if renderer:
description = None title = renderer.get('title')
playlist_id = item_id else:
renderer = try_get(
data, lambda x: x['header']['hashtagHeaderRenderer'], dict)
if renderer:
title = try_get(renderer, lambda x: x['hashtag']['simpleText'])
playlist = self.playlist_result( playlist = self.playlist_result(
self._entries(selected_tab, identity_token), self._entries(selected_tab, item_id, webpage),
playlist_id=playlist_id, playlist_title=title, playlist_id=playlist_id, playlist_title=title,
playlist_description=description) playlist_description=description)
playlist.update(self._extract_uploader(data)) playlist.update(self._extract_uploader(data))
@ -2657,8 +2796,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
self._playlist_entries(playlist), playlist_id=playlist_id, self._playlist_entries(playlist), playlist_id=playlist_id,
playlist_title=title) playlist_title=title)
def _extract_identity_token(self, webpage, item_id): def _extract_identity_token(self, ytcfg, webpage):
ytcfg = self._extract_ytcfg(item_id, webpage)
if ytcfg: if ytcfg:
token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str) token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
if token: if token:
@ -2672,7 +2810,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
url = compat_urlparse.urlunparse( url = compat_urlparse.urlunparse(
compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com')) compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
# Handle both video/playlist URLs # Handle both video/playlist URLs
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = parse_qs(url)
video_id = qs.get('v', [None])[0] video_id = qs.get('v', [None])[0]
playlist_id = qs.get('list', [None])[0] playlist_id = qs.get('list', [None])[0]
if video_id and playlist_id: if video_id and playlist_id:
@ -2681,12 +2819,11 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id) return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id)) self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
webpage = self._download_webpage(url, item_id) webpage = self._download_webpage(url, item_id)
identity_token = self._extract_identity_token(webpage, item_id)
data = self._extract_yt_initial_data(item_id, webpage) data = self._extract_yt_initial_data(item_id, webpage)
tabs = try_get( tabs = try_get(
data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list) data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list)
if tabs: if tabs:
return self._extract_from_tabs(item_id, webpage, data, tabs, identity_token) return self._extract_from_tabs(item_id, webpage, data, tabs)
playlist = try_get( playlist = try_get(
data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict) data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
if playlist: if playlist:
@ -2769,12 +2906,19 @@ class YoutubePlaylistIE(InfoExtractor):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if YoutubeTabIE.suitable(url) else super( if YoutubeTabIE.suitable(url):
YoutubePlaylistIE, cls).suitable(url) return False
# Hack for lazy extractors until more generic solution is implemented
# (see #28780)
from .youtube import parse_qs
qs = parse_qs(url)
if qs.get('v', [None])[0]:
return False
return super(YoutubePlaylistIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = parse_qs(url)
if not qs: if not qs:
qs = {'list': playlist_id} qs = {'list': playlist_id}
return self.url_result( return self.url_result(

@ -1,93 +1,94 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
update_url_query,
) )
class ZingMp3BaseInfoExtractor(InfoExtractor): class ZingMp3BaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:mp3\.zing|zingmp3)\.vn/(?:%s)/[^/]+/(?P<id>\w+)\.html'
_GEO_COUNTRIES = ['VN']
def _extract_item(self, item, page_type, fatal=True): def _extract_item(self, item, fatal):
error_message = item.get('msg') item_id = item['id']
if error_message: title = item.get('name') or item['title']
if not fatal:
return
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message),
expected=True)
formats = [] formats = []
for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])): for k, v in (item.get('source') or {}).items():
if not source_url or source_url == 'require vip': if not v:
continue continue
if not re.match(r'https?://', source_url): if k in ('mp4', 'hls'):
source_url = '//' + source_url for res, video_url in v.items():
source_url = self._proto_relative_url(source_url, 'http:') if not video_url:
quality_num = int_or_none(quality) continue
f = { if k == 'hls':
'format_id': quality, formats.extend(self._extract_m3u8_formats(
'url': source_url, video_url, item_id, 'mp4',
} 'm3u8_native', m3u8_id=k, fatal=False))
if page_type == 'video': elif k == 'mp4':
f.update({ formats.append({
'height': quality_num, 'format_id': 'mp4-' + res,
'ext': 'mp4', 'url': video_url,
}) 'height': int_or_none(self._search_regex(
r'^(\d+)p', res, 'resolution', default=None)),
})
else: else:
f.update({ formats.append({
'abr': quality_num,
'ext': 'mp3', 'ext': 'mp3',
'format_id': k,
'tbr': int_or_none(k),
'url': self._proto_relative_url(v),
'vcodec': 'none',
}) })
formats.append(f) if not formats:
if not fatal:
return
msg = item['msg']
if msg == 'Sorry, this content is not available in your country.':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError(msg, expected=True)
self._sort_formats(formats)
cover = item.get('cover') subtitles = None
lyric = item.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}],
}
album = item.get('album') or {}
return { return {
'title': (item.get('name') or item.get('title')).strip(), 'id': item_id,
'title': title,
'formats': formats, 'formats': formats,
'thumbnail': 'http:/' + cover if cover else None, 'thumbnail': item.get('thumbnail'),
'artist': item.get('artist'), 'subtitles': subtitles,
'duration': int_or_none(item.get('duration')),
'track': title,
'artist': item.get('artists_names'),
'album': album.get('name') or album.get('title'),
'album_artist': album.get('artists_names'),
} }
def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None): def _real_extract(self, url):
player_json = self._download_json(player_json_url, id, 'Downloading Player JSON') page_id = self._match_id(url)
items = player_json['data'] webpage = self._download_webpage(
if 'item' in items: url.replace('://zingmp3.vn/', '://mp3.zing.vn/'),
items = items['item'] page_id, query={'play_song': 1})
data_path = self._search_regex(
if len(items) == 1: r'data-xml="([^"]+)', webpage, 'data path')
# one single song return self._process_data(self._download_json(
data = self._extract_item(items[0], page_type) 'https://mp3.zing.vn/xhr' + data_path, page_id)['data'])
data['id'] = id
return data
else:
# playlist of songs
entries = []
for i, item in enumerate(items, 1):
entry = self._extract_item(item, page_type, fatal=False)
if not entry:
continue
entry['id'] = '%s-%d' % (id, i)
entries.append(entry)
return {
'_type': 'playlist',
'id': id,
'title': playlist_title,
'entries': entries,
}
class ZingMp3IE(ZingMp3BaseInfoExtractor): class ZingMp3IE(ZingMp3BaseIE):
_VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html' _VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'bai-hat|video-clip'
_TESTS = [{ _TESTS = [{
'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html', 'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'md5': 'ead7ae13693b3205cbc89536a077daed', 'md5': 'ead7ae13693b3205cbc89536a077daed',
@ -95,49 +96,66 @@ class ZingMp3IE(ZingMp3BaseInfoExtractor):
'id': 'ZWZB9WAB', 'id': 'ZWZB9WAB',
'title': 'Xa Mãi Xa', 'title': 'Xa Mãi Xa',
'ext': 'mp3', 'ext': 'mp3',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.+\.jpg',
'subtitles': {
'origin': [{
'ext': 'lrc',
}]
},
'duration': 255,
'track': 'Xa Mãi Xa',
'artist': 'Bảo Thy',
'album': 'Special Album',
'album_artist': 'Bảo Thy',
}, },
}, { }, {
'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html', 'url': 'https://mp3.zing.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
'md5': '870295a9cd8045c0e15663565902618d', 'md5': 'e9c972b693aa88301ef981c8151c4343',
'info_dict': { 'info_dict': {
'id': 'ZW6BAEA0', 'id': 'ZO8ZF7C7',
'title': 'Let It Go (Frozen OST)', 'title': 'Sương Hoa Đưa Lối',
'ext': 'mp4', 'ext': 'mp4',
'thumbnail': r're:^https?://.+\.jpg',
'duration': 207,
'track': 'Sương Hoa Đưa Lối',
'artist': 'K-ICM, RYO',
}, },
}, { }, {
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html', 'url': 'https://zingmp3.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
},
'playlist_count': 10,
'skip': 'removed at the request of the owner',
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True, 'only_matching': True,
}] }]
IE_NAME = 'zingmp3' IE_NAME = 'zingmp3'
IE_DESC = 'mp3.zing.vn' IE_DESC = 'mp3.zing.vn'
def _real_extract(self, url): def _process_data(self, data):
page_id = self._match_id(url) return self._extract_item(data, True)
webpage = self._download_webpage(url, page_id)
player_json_url = self._search_regex([ class ZingMp3AlbumIE(ZingMp3BaseIE):
r'data-xml="([^"]+)', _VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'album|playlist'
r'&amp;xmlURL=([^&]+)&' _TESTS = [{
], webpage, 'player xml url') 'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái',
},
'playlist_count': 10,
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True,
}, {
'url': 'https://zingmp3.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'only_matching': True,
}]
IE_NAME = 'zingmp3:album'
playlist_title = None def _process_data(self, data):
page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type') def entries():
if page_type == 'video': for item in (data.get('items') or []):
player_json_url = update_url_query(player_json_url, {'format': 'json'}) entry = self._extract_item(item, False)
else: if entry:
player_json_url = player_json_url.replace('/xml/', '/html5xml/') yield entry
if page_type == 'album': info = data.get('info') or {}
playlist_title = self._og_search_title(webpage) return self.playlist_result(
entries(), info.get('id'), info.get('name') or info.get('title'))
return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)
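Review note: the new `_extract_item` walks the `source` mapping and emits one format per non-empty entry. The standalone sketch below mirrors that flattening without any youtube-dl helpers. `build_formats` and the local `int_or_none` stand-in are hypothetical names used for illustration only; the HLS branch is skipped because the extractor resolves manifests over the network, and URLs are left as-is rather than passed through `_proto_relative_url`.

```python
import re


def int_or_none(value):
    # Simplified stand-in for youtube_dl.utils.int_or_none (assumption).
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def build_formats(source):
    """Flatten a Zing-style 'source' mapping into a list of format dicts."""
    formats = []
    for kind, value in (source or {}).items():
        if not value:
            # Empty entries mean the quality is unavailable.
            continue
        if kind == 'mp4':
            # Video: a nested mapping of resolution label -> URL.
            for res, video_url in value.items():
                if not video_url:
                    continue
                m = re.match(r'(\d+)p', res)
                formats.append({
                    'format_id': 'mp4-' + res,
                    'url': video_url,
                    'height': int(m.group(1)) if m else None,
                })
        elif kind == 'hls':
            # The extractor expands these via _extract_m3u8_formats;
            # omitted here because it needs network access.
            continue
        else:
            # Audio: the key is the bitrate label, the value is the URL.
            formats.append({
                'format_id': kind,
                'url': value,
                'ext': 'mp3',
                'tbr': int_or_none(kind),
                'vcodec': 'none',
            })
    return formats
```

Keys that are not numeric (for example a hypothetical `lossless` label) simply get `tbr` set to `None`, which matches how `int_or_none` is used in the diff.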


@ -0,0 +1,68 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
parse_filesize,
urlencode_postdata,
)
class ZoomIE(InfoExtractor):
IE_NAME = 'zoom'
_VALID_URL = r'(?P<base_url>https?://(?:[^.]+\.)?zoom\.us/)rec(?:ording)?/(?:play|share)/(?P<id>[A-Za-z0-9_.-]+)'
_TEST = {
'url': 'https://economist.zoom.us/rec/play/dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
'md5': 'ab445e8c911fddc4f9adc842c2c5d434',
'info_dict': {
'id': 'dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
'ext': 'mp4',
'title': 'China\'s "two sessions" and the new five-year plan',
}
}
def _real_extract(self, url):
base_url, play_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, play_id)
try:
form = self._form_hidden_inputs('password_form', webpage)
except ExtractorError:
form = None
if form:
password = self._downloader.params.get('videopassword')
if not password:
raise ExtractorError(
'This video is protected by a passcode, use the --video-password option', expected=True)
is_meeting = form.get('useWhichPasswd') == 'meeting'
validation = self._download_json(
base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else ''),
play_id, 'Validating passcode', 'Wrong passcode', data=urlencode_postdata({
'id': form[('meet' if is_meeting else 'file') + 'Id'],
'passwd': password,
'action': form.get('action'),
}))
if not validation.get('status'):
raise ExtractorError(validation['errorMessage'], expected=True)
webpage = self._download_webpage(url, play_id)
data = self._parse_json(self._search_regex(
r'(?s)window\.__data__\s*=\s*({.+?});',
webpage, 'data'), play_id, js_to_json)
return {
'id': play_id,
'title': data['topic'],
'url': data['viewMp4Url'],
'width': int_or_none(data.get('viewResolvtionsWidth')),
'height': int_or_none(data.get('viewResolvtionsHeight')),
'http_headers': {
'Referer': base_url,
},
'filesize_approx': parse_filesize(data.get('fileSize')),
}
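Review note: the passcode branch of `ZoomIE._real_extract` picks both the validation endpoint and the form field holding the ID from the hidden `useWhichPasswd` input. The sketch below isolates that selection as a pure function so it can be checked without a network call. `passcode_request` is a hypothetical helper, and the form values in the usage lines are made up.

```python
def passcode_request(base_url, form, password):
    """Return (endpoint, payload) for the passcode validation request."""
    is_meeting = form.get('useWhichPasswd') == 'meeting'
    # Meeting passcodes are checked at rec/validate_meet_passwd,
    # file passcodes at rec/validate_passwd.
    endpoint = base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else '')
    payload = {
        # The ID lives in 'meetId' or 'fileId' depending on the mode.
        'id': form[('meet' if is_meeting else 'file') + 'Id'],
        'passwd': password,
        'action': form.get('action'),
    }
    return endpoint, payload


# Usage with made-up form values:
endpoint, payload = passcode_request(
    'https://example.zoom.us/',
    {'useWhichPasswd': 'meeting', 'meetId': 'abc123', 'action': 'viewdetailpage'},
    'secret')
print(endpoint)
```

In the extractor the payload is then passed through `urlencode_postdata` and sent with `_download_json`; the sketch stops before that step.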


@ -768,7 +768,7 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='rm_cachedir', action='store_true', dest='rm_cachedir',
help='Delete all filesystem cache files') help='Delete all filesystem cache files')
thumbnail = optparse.OptionGroup(parser, 'Thumbnail images') thumbnail = optparse.OptionGroup(parser, 'Thumbnail Options')
thumbnail.add_option( thumbnail.add_option(
'--write-thumbnail', '--write-thumbnail',
action='store_true', dest='writethumbnail', default=False, action='store_true', dest='writethumbnail', default=False,


@ -39,6 +39,7 @@ import zlib
from .compat import ( from .compat import (
compat_HTMLParseError, compat_HTMLParseError,
compat_HTMLParser, compat_HTMLParser,
compat_HTTPError,
compat_basestring, compat_basestring,
compat_chr, compat_chr,
compat_cookiejar, compat_cookiejar,
@ -2879,12 +2880,60 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler): class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
if sys.version_info[0] < 3: """YoutubeDL redirect handler
def redirect_request(self, req, fp, code, msg, headers, newurl):
# On python 2 urlh.geturl() may sometimes return redirect URL The code is based on HTTPRedirectHandler implementation from CPython [1].
# as byte string instead of unicode. This workaround allows
# to force it always return unicode. This redirect handler solves two issues:
return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl)) - ensures redirect URL is always unicode under python 2
- introduces support for experimental HTTP response status code
308 Permanent Redirect [2] used by some sites [3]
1. https://github.com/python/cpython/blob/master/Lib/urllib/request.py
2. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/308
3. https://github.com/ytdl-org/youtube-dl/issues/28768
"""
http_error_301 = http_error_303 = http_error_307 = http_error_308 = compat_urllib_request.HTTPRedirectHandler.http_error_302
def redirect_request(self, req, fp, code, msg, headers, newurl):
"""Return a Request or None in response to a redirect.
This is called by the http_error_30x methods when a
redirection response is received. If a redirection should
take place, return a new Request to allow http_error_30x to
perform the redirect. Otherwise, raise HTTPError if no-one
else should try to handle this url. Return None if you can't
but another Handler might.
"""
m = req.get_method()
if (not (code in (301, 302, 303, 307, 308) and m in ("GET", "HEAD")
or code in (301, 302, 303) and m == "POST")):
raise compat_HTTPError(req.full_url, code, msg, headers, fp)
# Strictly (according to RFC 2616), 301 or 302 in response to
# a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib.request, in this case). In practice,
# essentially all clients do redirect in this case, so we do
# the same.
# On Python 2, urlh.geturl() may sometimes return the redirect URL
# as a byte string instead of unicode. This workaround forces it
# to always be unicode.
if sys.version_info[0] < 3:
newurl = compat_str(newurl)
# Be lenient with URIs containing a space. This is mainly
# redundant with the more complete encoding done in http_error_302(),
# but it is kept for compatibility with other callers.
newurl = newurl.replace(' ', '%20')
CONTENT_HEADERS = ("content-length", "content-type")
# NB: don't use dict comprehension for python 2.6 compatibility
newheaders = dict((k, v) for k, v in req.headers.items()
if k.lower() not in CONTENT_HEADERS)
return compat_urllib_request.Request(
newurl, headers=newheaders, origin_req_host=req.origin_req_host,
unverifiable=True)
def extract_timezone(date_str): def extract_timezone(date_str):
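Review note: the rewritten `redirect_request` makes two decisions, whether a given status/method pair is redirected at all, and which request headers survive the redirect. The sketch below restates both as pure functions, using the same conditions as the diff. `should_follow_redirect` and `strip_content_headers` are hypothetical names, not part of youtube-dl.

```python
CONTENT_HEADERS = ('content-length', 'content-type')


def should_follow_redirect(code, method):
    """Mirror the status/method check from redirect_request."""
    # GET and HEAD follow every supported redirect code, including 308.
    # POST follows only 301, 302 and 303 (and is reissued without a body).
    return (code in (301, 302, 303, 307, 308) and method in ('GET', 'HEAD')
            or code in (301, 302, 303) and method == 'POST')


def strip_content_headers(headers):
    """Drop body-describing headers, since the new request has no body."""
    # dict() over a generator keeps Python 2.6 compatibility, as in the diff.
    return dict((k, v) for k, v in headers.items()
                if k.lower() not in CONTENT_HEADERS)


print(should_follow_redirect(308, 'GET'))
print(should_follow_redirect(308, 'POST'))
```

Under these conditions a 307 or 308 response to a POST is not followed; the handler raises `compat_HTTPError` instead.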


@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2021.03.03' __version__ = '2021.05.16'