Merge branch 'master' into msirabella/youtube_translated

This commit is contained in:
Marco Sirabella 2021-06-24 15:52:35 -07:00 committed by GitHub
commit 3a8423a083
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
78 changed files with 2077 additions and 1085 deletions

View File

@ -18,7 +18,7 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.25. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a broken site support - [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2021.03.25** - [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones - [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.03.25 [debug] youtube-dl version 2021.06.06
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.25. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights. - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a new site support request - [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2021.03.25** - [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights - [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones - [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@ -18,13 +18,13 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.25. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes (like this [x])
--> -->
- [ ] I'm reporting a site feature request - [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2021.03.25** - [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones - [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@ -18,7 +18,7 @@ title: ''
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.25. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
--> -->
- [ ] I'm reporting a broken site support issue - [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2021.03.25** - [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones - [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.03.25 [debug] youtube-dl version 2021.06.06
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -19,13 +19,13 @@ labels: 'request'
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.03.25. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates. - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes (like this [x])
--> -->
- [ ] I'm reporting a feature request - [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2021.03.25** - [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've searched the bugtracker for similar feature requests including closed ones - [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@ -49,11 +49,18 @@ jobs:
- name: Install Jython - name: Install Jython
if: ${{ matrix.python-impl == 'jython' }} if: ${{ matrix.python-impl == 'jython' }}
run: | run: |
wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
java -jar jython-installer.jar -s -d "$HOME/jython" java -jar jython-installer.jar -s -d "$HOME/jython"
echo "$HOME/jython/bin" >> $GITHUB_PATH echo "$HOME/jython/bin" >> $GITHUB_PATH
- name: Install nose - name: Install nose
if: ${{ matrix.python-impl != 'jython' }}
run: pip install nose run: pip install nose
- name: Install nose (Jython)
if: ${{ matrix.python-impl == 'jython' }}
# Working around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
run: |
wget https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl
pip install nose-1.3.7-py2-none-any.whl
- name: Run tests - name: Run tests
continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }} continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
env: env:

132
ChangeLog
View File

@ -1,3 +1,135 @@
version 2021.06.06
Extractors
* [facebook] Improve login required detection
* [youporn] Fix formats and view count extraction (#29216)
* [orf:tvthek] Fix thumbnails extraction (#29217)
* [formula1] Fix extraction (#29206)
* [ard] Relax URL regular expression and fix video ids (#22724, #29091)
+ [ustream] Detect https embeds (#29133)
* [ted] Prefer own formats over external sources (#29142)
* [twitch:clips] Improve extraction (#29149)
+ [twitch:clips] Add access token query to download URLs (#29136)
* [youtube] Fix get_video_info request (#29086, #29165)
* [vimeo] Fix vimeo pro embed extraction (#29126)
* [redbulltv] Fix embed data extraction (#28770)
* [shahid] Relax URL regular expression (#28772, #28930)
version 2021.05.16
Core
* [options] Fix thumbnail option group name (#29042)
* [YoutubeDL] Improve extract_info doc (#28946)
Extractors
+ [playstuff] Add support for play.stuff.co.nz (#28901, #28931)
* [eroprofile] Fix extraction (#23200, #23626, #29008)
+ [vivo] Add support for vivo.st (#29009)
+ [generic] Add support for og:audio (#28311, #29015)
* [phoenix] Fix extraction (#29057)
+ [generic] Add support for sibnet embeds
+ [vk] Add support for sibnet embeds (#9500)
+ [generic] Add Referer header for direct videojs download URLs (#2879,
#20217, #29053)
* [orf:radio] Switch download URLs to HTTPS (#29012, #29046)
- [blinkx] Remove extractor (#28941)
* [medaltv] Relax URL regular expression (#28884)
+ [funimation] Add support for optional lang code in URLs (#28950)
+ [gdcvault] Add support for HTML5 videos
* [dispeak] Improve FLV extraction (#13513, #28970)
* [kaltura] Improve iframe extraction (#28969)
* [kaltura] Make embed code alternatives actually work
* [cda] Improve extraction (#28709, #28937)
* [twitter] Improve formats extraction from vmap URL (#28909)
* [xtube] Fix formats extraction (#28870)
* [svtplay] Improve extraction (#28507, #28876)
* [tv2dk] Fix extraction (#28888)
version 2021.04.26
Extractors
+ [xfileshare] Add support for wolfstream.tv (#28858)
* [francetvinfo] Improve video id extraction (#28792)
* [medaltv] Fix extraction (#28807)
* [tver] Redirect all downloads to Brightcove (#28849)
* [go] Improve video id extraction (#25207, #25216, #26058)
* [youtube] Fix lazy extractors (#28780)
+ [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774)
* [cbsnews] Fix extraction for python <3.6 (#23359)
version 2021.04.17
Core
+ [utils] Add support for experimental HTTP response status code
308 Permanent Redirect (#27877, #28768)
Extractors
+ [lbry] Add support for HLS videos (#27877, #28768)
* [youtube] Fix stretched ratio calculation
* [youtube] Improve stretch extraction (#28769)
* [youtube:tab] Improve grid extraction (#28725)
+ [youtube:tab] Detect series playlist on playlists page (#28723)
+ [youtube] Add more invidious instances (#28706)
* [pluralsight] Extend anti-throttling timeout (#28712)
* [youtube] Improve URL to extractor routing (#27572, #28335, #28742)
+ [maoritv] Add support for maoritelevision.com (#24552)
+ [youtube:tab] Pass innertube context and x-goog-visitor-id header along with
continuation requests (#28702)
* [mtv] Fix Viacom A/B Testing Video Player extraction (#28703)
+ [pornhub] Extract DASH and HLS formats from get_media end point (#28698)
* [cbssports] Fix extraction (#28682)
* [jamendo] Fix track extraction (#28686)
* [curiositystream] Fix format extraction (#26845, #28668)
version 2021.04.07
Core
* [extractor/common] Use compat_cookies_SimpleCookie for _get_cookies
+ [compat] Introduce compat_cookies_SimpleCookie
* [extractor/common] Improve JSON-LD author extraction
* [extractor/common] Fix _get_cookies on python 2 (#20673, #23256, #20326,
#28640)
Extractors
* [youtube] Fix extraction of videos with restricted location (#28685)
+ [line] Add support for live.line.me (#17205, #28658)
* [vimeo] Improve extraction (#28591)
* [youku] Update ccode (#17852, #28447, #28460, #28648)
* [youtube] Prefer direct entry metadata over entry metadata from playlist
(#28619, #28636)
* [screencastomatic] Fix extraction (#11976, #24489)
+ [palcomp3] Add support for palcomp3.com (#13120)
+ [arnes] Add support for video.arnes.si (#28483)
+ [youtube:tab] Add support for hashtags (#28308)
version 2021.04.01
Extractors
* [youtube] Setup CONSENT cookie when needed (#28604)
* [vimeo] Fix password protected review extraction (#27591)
* [youtube] Improve age-restricted video extraction (#28578)
version 2021.03.31
Extractors
* [vlive] Fix inkey request (#28589)
* [francetvinfo] Improve video id extraction (#28584)
+ [instagram] Extract duration (#28469)
* [instagram] Improve title extraction (#28469)
+ [sbs] Add support for ondemand watch URLs (#28566)
* [youtube] Fix video's channel extraction (#28562)
* [picarto] Fix live stream extraction (#28532)
* [vimeo] Fix unlisted video extraction (#28414)
* [youtube:tab] Fix playlist/community continuation items extraction (#28266)
* [ard] Improve clip id extraction (#22724, #28528)
version 2021.03.25 version 2021.03.25
Extractors Extractors

View File

@ -287,7 +287,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--no-cache-dir Disable filesystem caching --no-cache-dir Disable filesystem caching
--rm-cache-dir Delete all filesystem cache files --rm-cache-dir Delete all filesystem cache files
## Thumbnail images: ## Thumbnail Options:
--write-thumbnail Write thumbnail image to disk --write-thumbnail Write thumbnail image to disk
--write-all-thumbnails Write all thumbnail image formats to --write-all-thumbnails Write all thumbnail image formats to
disk disk
@ -893,7 +893,7 @@ Since June 2012 ([#342](https://github.com/ytdl-org/youtube-dl/issues/342)) yout
### The exe throws an error due to missing `MSVCR100.dll` ### The exe throws an error due to missing `MSVCR100.dll`
To run the exe you need to install first the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555). To run the exe you need to install first the [Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package (x86)](https://download.microsoft.com/download/1/6/5/165255E7-1014-4D0A-B094-B6A430A6BFFC/vcredist_x86.exe).
### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files? ### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?

View File

@ -3,6 +3,7 @@
- **20min** - **20min**
- **220.ro** - **220.ro**
- **23video** - **23video**
- **247sports**
- **24video** - **24video**
- **3qsdn**: 3Q SDN - **3qsdn**: 3Q SDN
- **3sat** - **3sat**
@ -118,7 +119,6 @@
- **BitChuteChannel** - **BitChuteChannel**
- **BleacherReport** - **BleacherReport**
- **BleacherReportCMS** - **BleacherReportCMS**
- **blinkx**
- **Bloomberg** - **Bloomberg**
- **BokeCC** - **BokeCC**
- **BongaCams** - **BongaCams**
@ -160,7 +160,8 @@
- **cbsnews**: CBS News - **cbsnews**: CBS News
- **cbsnews:embed** - **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos - **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports** - **cbssports**
- **cbssports:embed**
- **CCMA** - **CCMA**
- **CCTV**: 央视网 - **CCTV**: 央视网
- **CDA** - **CDA**
@ -463,6 +464,8 @@
- **limelight** - **limelight**
- **limelight:channel** - **limelight:channel**
- **limelight:channel_list** - **limelight:channel_list**
- **LineLive**
- **LineLiveChannel**
- **LineTV** - **LineTV**
- **linkedin:learning** - **linkedin:learning**
- **linkedin:learning:course** - **linkedin:learning:course**
@ -488,6 +491,7 @@
- **mangomolo:live** - **mangomolo:live**
- **mangomolo:video** - **mangomolo:video**
- **ManyVids** - **ManyVids**
- **MaoriTV**
- **Markiza** - **Markiza**
- **MarkizaPage** - **MarkizaPage**
- **massengeschmack.tv** - **massengeschmack.tv**
@ -679,6 +683,9 @@
- **OutsideTV** - **OutsideTV**
- **PacktPub** - **PacktPub**
- **PacktPubCourse** - **PacktPubCourse**
- **PalcoMP3:artist**
- **PalcoMP3:song**
- **PalcoMP3:video**
- **pandora.tv**: 판도라TV - **pandora.tv**: 판도라TV
- **ParamountNetwork** - **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
@ -705,6 +712,7 @@
- **play.fm** - **play.fm**
- **player.sky.it** - **player.sky.it**
- **PlayPlusTV** - **PlayPlusTV**
- **PlayStuff**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -1059,6 +1067,7 @@
- **Vidbit** - **Vidbit**
- **Viddler** - **Viddler**
- **Videa** - **Videa**
- **video.arnes.si**: Arnes Video
- **video.google:search**: Google Video search - **video.google:search**: Google Video search
- **video.sky.it** - **video.sky.it**
- **video.sky.it:live** - **video.sky.it:live**
@ -1153,7 +1162,7 @@
- **WWE** - **WWE**
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing - **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, WolfStream, XVideoSharing
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **XHamsterUser** - **XHamsterUser**

View File

@ -70,15 +70,6 @@ class TestAllURLsMatching(unittest.TestCase):
# self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url']) # self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url']) # self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_facebook_matching(self): def test_facebook_matching(self):
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268')) self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793')) self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793'))

View File

@ -39,6 +39,16 @@ class TestExecution(unittest.TestCase):
_, stderr = p.communicate() _, stderr = p.communicate()
self.assertFalse(stderr) self.assertFalse(stderr)
def test_lazy_extractors(self):
try:
subprocess.check_call([sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, 'test/test_all_urls.py'], cwd=rootDir, stdout=_DEV_NULL)
finally:
try:
os.remove('youtube_dl/extractor/lazy_extractors.py')
except (IOError, OSError):
pass
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

26
test/test_youtube_misc.py Normal file
View File

@ -0,0 +1,26 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeIE
class TestYoutubeMisc(unittest.TestCase):
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
if __name__ == '__main__':
unittest.main()

View File

@ -773,11 +773,20 @@ class YoutubeDL(object):
def extract_info(self, url, download=True, ie_key=None, extra_info={}, def extract_info(self, url, download=True, ie_key=None, extra_info={},
process=True, force_generic_extractor=False): process=True, force_generic_extractor=False):
''' """
Returns a list with a dictionary for each video we find. Return a list with a dictionary for each video extracted.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result Arguments:
''' url -- URL to extract
Keyword arguments:
download -- whether to download videos during extraction
ie_key -- extractor key hint
extra_info -- dictionary containing the extra values to add to each result
process -- whether to resolve all unresolved references (URLs, playlist items),
must be True for download to work.
force_generic_extractor -- force using the generic extractor
"""
if not ie_key and force_generic_extractor: if not ie_key and force_generic_extractor:
ie_key = 'Generic' ie_key = 'Generic'

View File

@ -73,6 +73,15 @@ try:
except ImportError: # Python 2 except ImportError: # Python 2
import Cookie as compat_cookies import Cookie as compat_cookies
if sys.version_info[0] == 2:
class compat_cookies_SimpleCookie(compat_cookies.SimpleCookie):
def load(self, rawdata):
if isinstance(rawdata, compat_str):
rawdata = str(rawdata)
return super(compat_cookies_SimpleCookie, self).load(rawdata)
else:
compat_cookies_SimpleCookie = compat_cookies.SimpleCookie
try: try:
import html.entities as compat_html_entities import html.entities as compat_html_entities
except ImportError: # Python 2 except ImportError: # Python 2
@ -3000,6 +3009,7 @@ __all__ = [
'compat_cookiejar', 'compat_cookiejar',
'compat_cookiejar_Cookie', 'compat_cookiejar_Cookie',
'compat_cookies', 'compat_cookies',
'compat_cookies_SimpleCookie',
'compat_ctypes_WINFUNCTYPE', 'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element', 'compat_etree_Element',
'compat_etree_fromstring', 'compat_etree_fromstring',

View File

@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor): class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)' _VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TEST = { _TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3', 'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723', 'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': { 'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3', 'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v', 'ext': 'm4v',
@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710', 'upload_date': '20150710',
'timestamp': 1436545535, 'timestamp': 1436545535,
}, },
} }, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id) video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp')) timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count')) like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return { return {
'id': video_id, 'id': video_id,

View File

@ -249,14 +249,14 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?:video-?)?(?P<id>[0-9]+))\.html' _VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/(?:[^/?#&]+/)+(?P<id>[^/?#&]+))\.html'
_TESTS = [{ _TESTS = [{
# available till 7.01.2022 # available till 7.01.2022
'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html', 'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html',
'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1', 'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1',
'info_dict': { 'info_dict': {
'display_id': 'maischberger-die-woche', 'id': 'maischberger-die-woche-video100',
'id': '100', 'display_id': 'maischberger-die-woche-video100',
'ext': 'mp4', 'ext': 'mp4',
'duration': 3687.0, 'duration': 3687.0,
'title': 'maischberger. die woche vom 7. Januar 2021', 'title': 'maischberger. die woche vom 7. Januar 2021',
@ -264,16 +264,25 @@ class ARDIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, { }, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html', 'url': 'https://www.daserste.de/information/politik-weltgeschehen/morgenmagazin/videosextern/dominik-kahun-aus-der-nhl-direkt-zur-weltmeisterschaft-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/information/nachrichten-wetter/tagesthemen/videosextern/tagesthemen-17736.html',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html', 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/serie/in-aller-freundschaft-die-jungen-aerzte/Drehpause-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/film/filmmittwoch-im-ersten/videos/making-ofwendezeit-video-100.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id') display_id = mobj.group('id')
player_url = mobj.group('mainurl') + '~playerXml.xml' player_url = mobj.group('mainurl') + '~playerXml.xml'
doc = self._download_xml(player_url, display_id) doc = self._download_xml(player_url, display_id)
@ -324,7 +333,7 @@ class ARDIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': mobj.group('id'), 'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats, 'formats': formats,
'display_id': display_id, 'display_id': display_id,
'title': video_node.find('./title').text, 'title': video_node.find('./title').text,

View File

@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}

View File

@ -11,6 +11,7 @@ from ..compat import (
compat_etree_Element, compat_etree_Element,
compat_HTTPError, compat_HTTPError,
compat_parse_qs, compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
) )
@ -25,8 +26,10 @@ from ..utils import (
js_to_json, js_to_json,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
strip_or_none,
try_get, try_get,
unescapeHTML, unescapeHTML,
unified_timestamp,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
@ -761,8 +764,17 @@ class BBCIE(BBCCoUkIE):
'only_matching': True, 'only_matching': True,
}, { }, {
# custom redirection to www.bbc.com # custom redirection to www.bbc.com
# also, video with window.__INITIAL_DATA__
'url': 'http://www.bbc.co.uk/news/science-environment-33661876', 'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True, 'info_dict': {
'id': 'p02xzws1',
'ext': 'mp4',
'title': "Pluto may have 'nitrogen glaciers'",
'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1437785037,
'upload_date': '20150725',
},
}, { }, {
# single video article embedded with data-media-vpid # single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187', 'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
@ -1164,12 +1176,29 @@ class BBCIE(BBCCoUkIE):
continue continue
formats, subtitles = self._download_media_selector(item_id) formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats) self._sort_formats(formats)
item_desc = None
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
if blocks:
summary = []
for block in blocks:
text = try_get(block, lambda x: x['model']['text'], compat_str)
if text:
summary.append(text)
if summary:
item_desc = '\n\n'.join(summary)
item_time = None
for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
if try_get(meta, lambda x: x['label']) == 'Published':
item_time = unified_timestamp(meta.get('timestamp'))
break
entries.append({ entries.append({
'id': item_id, 'id': item_id,
'title': item_title, 'title': item_title,
'thumbnail': item.get('holdingImageUrl'), 'thumbnail': item.get('holdingImageUrl'),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'timestamp': item_time,
'description': strip_or_none(item_desc),
}) })
for resp in (initial_data.get('data') or {}).values(): for resp in (initial_data.get('data') or {}).values():
name = resp.get('name') name = resp.get('name')

View File

@ -233,7 +233,7 @@ class BiliBiliIE(InfoExtractor):
webpage) webpage)
if uploader_mobj: if uploader_mobj:
info.update({ info.update({
'uploader': uploader_mobj.group('name'), 'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'), 'uploader_id': uploader_mobj.group('id'),
}) })
if not info.get('uploader'): if not info.get('uploader'):

View File

@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}

View File

@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
def _real_extract(self, url): def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode( item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))), compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0] -zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews') return self._extract_video_info(item['mpxRefId'], 'cbsnews')

View File

@ -1,38 +1,113 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .cbs import CBSBaseIE import re
# from .cbs import CBSBaseIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
)
class CBSSportsIE(CBSBaseIE): # class CBSSportsEmbedIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)' class CBSSportsEmbedIE(InfoExtractor):
IE_NAME = 'cbssports:embed'
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
(?:
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
pcid%3D(?P<pcid>\d+)
)'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/', 'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
'info_dict': { 'only_matching': True,
'id': '1214315075735',
'ext': 'mp4',
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
'timestamp': 1524111457,
'upload_date': '20180419',
'uploader': 'CBSI-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, { }, {
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/', 'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
'only_matching': True, 'only_matching': True,
}] }]
def _extract_video_info(self, filter_query, video_id): # def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id) # return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
uuid, pcid = re.match(self._VALID_URL, url).groups()
query = {'id': uuid} if uuid else {'pcid': pcid}
video = self._download_json(
'https://www.cbssports.com/api/content/video/',
uuid or pcid, query=query)[0]
video_id = video['id']
title = video['title']
metadata = video.get('metaData') or {}
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
formats = self._extract_m3u8_formats(
metadata['files'][0]['url'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
self._sort_formats(formats)
image = video.get('image')
thumbnails = None
if image:
image_path = image.get('path')
if image_path:
thumbnails = [{
'url': image_path,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
'filesize': int_or_none(image.get('size')),
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video.get('description'),
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
'duration': int_or_none(metadata.get('duration')),
}
class CBSSportsBaseIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( iframe_url = self._search_regex(
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'], r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
webpage, 'video id') webpage, 'embed url')
return self._extract_video_info('byId=%s' % video_id, video_id) return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
class CBSSportsIE(CBSSportsBaseIE):
IE_NAME = 'cbssports'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
'info_dict': {
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
'ext': 'mp4',
'title': 'Cover 3: Stanford Spring Gleaning',
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
'timestamp': 1617218398,
'upload_date': '20210331',
'duration': 502,
},
}]
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
IE_NAME = '247sports'
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
'info_dict': {
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
'ext': 'mp4',
'title': '2021 QB Jake Garcia senior highlights through five games',
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
'timestamp': 1607114223,
'upload_date': '20201204',
'duration': 208,
},
}]

View File

@ -133,6 +133,8 @@ class CDAIE(InfoExtractor):
'age_limit': 18 if need_confirm_age else 0, 'age_limit': 18 if need_confirm_age else 0,
} }
info = self._search_json_ld(webpage, video_id, default={})
# Source: https://www.cda.pl/js/player.js?t=1606154898 # Source: https://www.cda.pl/js/player.js?t=1606154898
def decrypt_file(a): def decrypt_file(a):
for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'): for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
@ -197,7 +199,7 @@ class CDAIE(InfoExtractor):
handler = self._download_webpage handler = self._download_webpage
webpage = handler( webpage = handler(
self._BASE_URL + href, video_id, urljoin(self._BASE_URL, href), video_id,
'Downloading %s version information' % resolution, fatal=False) 'Downloading %s version information' % resolution, fatal=False)
if not webpage: if not webpage:
# Manually report warning because empty page is returned when # Manually report warning because empty page is returned when
@ -209,6 +211,4 @@ class CDAIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts(info_dict, info) return merge_dicts(info_dict, info)

View File

@ -17,7 +17,7 @@ import math
from ..compat import ( from ..compat import (
compat_cookiejar_Cookie, compat_cookiejar_Cookie,
compat_cookies, compat_cookies_SimpleCookie,
compat_etree_Element, compat_etree_Element,
compat_etree_fromstring, compat_etree_fromstring,
compat_getpass, compat_getpass,
@ -1275,6 +1275,7 @@ class InfoExtractor(object):
def extract_video_object(e): def extract_video_object(e):
assert e['@type'] == 'VideoObject' assert e['@type'] == 'VideoObject'
author = e.get('author')
info.update({ info.update({
'url': url_or_none(e.get('contentUrl')), 'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')), 'title': unescapeHTML(e.get('name')),
@ -1282,7 +1283,11 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')), 'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')), 'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')), 'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')), # author can be an instance of 'Organization' or 'Person' types.
# both types can have 'name' property(inherited from 'Thing' type). [1]
# however some websites are using 'Text' type instead.
# 1. https://schema.org/VideoObject
'uploader': author.get('name') if isinstance(author, dict) else author if isinstance(author, compat_str) else None,
'filesize': float_or_none(e.get('contentSize')), 'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')), 'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')), 'width': int_or_none(e.get('width')),
@ -2896,10 +2901,10 @@ class InfoExtractor(object):
self._downloader.cookiejar.set_cookie(cookie) self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url): def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """ """ Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url) req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req) self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie')) return compat_cookies_SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie): def _apply_first_set_cookie_header(self, url_handle, cookie):
""" """

View File

@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
raise ExtractorError( raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True) '%s said: %s' % (self.IE_NAME, error), expected=True)
def _call_api(self, path, video_id): def _call_api(self, path, video_id, query=None):
headers = {} headers = {}
if self._auth_token: if self._auth_token:
headers['X-Auth-Token'] = self._auth_token headers['X-Auth-Token'] = self._auth_token
result = self._download_json( result = self._download_json(
self._API_BASE_URL + path, video_id, headers=headers) self._API_BASE_URL + path, video_id, headers=headers, query=query)
self._handle_errors(result) self._handle_errors(result)
return result['data'] return result['data']
@ -52,62 +52,75 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)' _VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'https://app.curiositystream.com/video/2', 'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': { 'info_dict': {
'id': '2', 'id': '2',
'ext': 'mp4', 'ext': 'mp4',
'title': 'How Did You Develop The Internet?', 'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.', 'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
} },
'params': {
'format': 'bestvideo',
# m3u8 download
'skip_download': True,
},
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title']
formats = [] formats = []
for encoding in media.get('encodings', []): for encoding_format in ('m3u8', 'mpd'):
m3u8_url = encoding.get('master_playlist_url') media = self._call_api('media/' + video_id, video_id, query={
if m3u8_url: 'encodingsNew': 'true',
formats.extend(self._extract_m3u8_formats( 'encodingsFormat': encoding_format,
m3u8_url, video_id, 'mp4', 'm3u8_native', })
m3u8_id='hls', fatal=False)) for encoding in media.get('encodings', []):
encoding_url = encoding.get('url') playlist_url = encoding.get('master_playlist_url')
file_url = encoding.get('file_url') if encoding_format == 'm3u8':
if not encoding_url and not file_url: # use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
continue formats.extend(self._extract_m3u8_formats(
f = { playlist_url, video_id, 'mp4',
'width': int_or_none(encoding.get('width')), m3u8_id='hls', fatal=False))
'height': int_or_none(encoding.get('height')), elif encoding_format == 'mpd':
'vbr': int_or_none(encoding.get('video_bitrate')), formats.extend(self._extract_mpd_formats(
'abr': int_or_none(encoding.get('audio_bitrate')), playlist_url, video_id, mpd_id='dash', fatal=False))
'filesize': int_or_none(encoding.get('size_in_bytes')), encoding_url = encoding.get('url')
'vcodec': encoding.get('video_codec'), file_url = encoding.get('file_url')
'acodec': encoding.get('audio_codec'), if not encoding_url and not file_url:
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
continue continue
fmt = f.copy() f = {
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url) 'width': int_or_none(encoding.get('width')),
if rtmp: 'height': int_or_none(encoding.get('height')),
fmt.update({ 'vbr': int_or_none(encoding.get('video_bitrate')),
'url': rtmp.group('url'), 'abr': int_or_none(encoding.get('audio_bitrate')),
'play_path': rtmp.group('playpath'), 'filesize': int_or_none(encoding.get('size_in_bytes')),
'app': rtmp.group('app'), 'vcodec': encoding.get('video_codec'),
'ext': 'flv', 'acodec': encoding.get('audio_codec'),
'format_id': 'rtmp', 'container': encoding.get('container_type'),
}) }
else: for f_url in (encoding_url, file_url):
fmt.update({ if not f_url:
'url': f_url, continue
'format_id': 'http', fmt = f.copy()
}) rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
formats.append(fmt) if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
self._sort_formats(formats) self._sort_formats(formats)
title = media['title']
subtitles = {} subtitles = {}
for closed_caption in media.get('closed_captions', []): for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file') sub_url = closed_caption.get('file')
@ -132,7 +145,7 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE): class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection' IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)' _VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collections?|series)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://app.curiositystream.com/collection/2', 'url': 'https://app.curiositystream.com/collection/2',
'info_dict': { 'info_dict': {
@ -140,10 +153,13 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'title': 'Curious Minds: The Internet', 'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?', 'description': 'How is the internet shaping our lives in the 21st Century?',
}, },
'playlist_mincount': 17, 'playlist_mincount': 16,
}, { }, {
'url': 'https://curiositystream.com/series/2', 'url': 'https://curiositystream.com/series/2',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://curiositystream.com/collections/36',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1013700/Advanced-Material # From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml', 'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True, 'only_matching': True,
}, {
# From https://gdcvault.com/play/1016624, empty speakerVideo
'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
'info_dict': {
'id': '201210-822101_1349794556671DDDD',
'ext': 'flv',
'title': 'Pre-launch - Preparing to Take the Plunge',
},
}, {
# From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
'only_matching': True,
}] }]
def _parse_mp4(self, metadata): def _parse_mp4(self, metadata):
@ -84,26 +96,20 @@ class DigitallySpeakingIE(InfoExtractor):
'vcodec': 'none', 'vcodec': 'none',
'format_id': audio.get('code'), 'format_id': audio.get('code'),
}) })
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True) for video_key, format_id, preference in (
formats.append({ ('slide', 'slides', -2), ('speaker', 'speaker', -1)):
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, video_path = xpath_text(metadata, './%sVideo' % video_key)
'play_path': remove_end(slide_video_path, '.flv'), if not video_path:
'ext': 'flv', continue
'format_note': 'slide deck video', formats.append({
'quality': -2, 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'preference': -2, 'play_path': remove_end(video_path, '.flv'),
'format_id': 'slides', 'ext': 'flv',
}) 'format_note': '%s video' % video_key,
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True) 'quality': preference,
formats.append({ 'preference': preference,
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, 'format_id': format_id,
'play_path': remove_end(speaker_video_path, '.flv'), })
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats return formats
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -22,16 +22,19 @@ class EggheadBaseIE(InfoExtractor):
class EggheadCourseIE(EggheadBaseIE): class EggheadCourseIE(EggheadBaseIE):
IE_DESC = 'egghead.io course' IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course' IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)' _VALID_URL = r'https://(?:app\.)?egghead\.io/(?:course|playlist)s/(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript', 'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29, 'playlist_count': 29,
'info_dict': { 'info_dict': {
'id': '72', 'id': '432655',
'title': 'Professor Frisby Introduces Composable Functional JavaScript', 'title': 'Professor Frisby Introduces Composable Functional JavaScript',
'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$', 'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
}, },
} }, {
'url': 'https://app.egghead.io/playlists/professor-frisby-introduces-composable-functional-javascript',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
@ -65,7 +68,7 @@ class EggheadCourseIE(EggheadBaseIE):
class EggheadLessonIE(EggheadBaseIE): class EggheadLessonIE(EggheadBaseIE):
IE_DESC = 'egghead.io lesson' IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson' IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)' _VALID_URL = r'https://(?:app\.)?egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box', 'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': { 'info_dict': {
@ -88,6 +91,9 @@ class EggheadLessonIE(EggheadBaseIE):
}, { }, {
'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application', 'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://app.egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -6,7 +6,7 @@ from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode from ..compat import compat_urllib_parse_urlencode
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
unescapeHTML merge_dicts,
) )
@ -24,7 +24,8 @@ class EroProfileIE(InfoExtractor):
'title': 'sexy babe softcore', 'title': 'sexy babe softcore',
'thumbnail': r're:https?://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18, 'age_limit': 18,
} },
'skip': 'Video not found',
}, { }, {
'url': 'http://www.eroprofile.com/m/videos/view/Try-It-On-Pee_cut_2-wmv-4shared-com-file-sharing-download-movie-file', 'url': 'http://www.eroprofile.com/m/videos/view/Try-It-On-Pee_cut_2-wmv-4shared-com-file-sharing-download-movie-file',
'md5': '1baa9602ede46ce904c431f5418d8916', 'md5': '1baa9602ede46ce904c431f5418d8916',
@ -77,19 +78,15 @@ class EroProfileIE(InfoExtractor):
[r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'], [r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'],
webpage, 'video id', default=None) webpage, 'video id', default=None)
video_url = unescapeHTML(self._search_regex(
r'<source src="([^"]+)', webpage, 'video url'))
title = self._html_search_regex( title = self._html_search_regex(
r'Title:</th><td>([^<]+)</td>', webpage, 'title') (r'Title:</th><td>([^<]+)</td>', r'<h1[^>]*>(.+?)</h1>'),
thumbnail = self._search_regex( webpage, 'title')
r'onclick="showVideoPlayer\(\)"><img src="([^"]+)',
webpage, 'thumbnail', fatal=False)
return { info = self._parse_html5_media_entries(url, webpage, video_id)[0]
return merge_dicts(info, {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'url': video_url,
'title': title, 'title': title,
'thumbnail': thumbnail,
'age_limit': 18, 'age_limit': 18,
} })

View File

@ -72,6 +72,7 @@ from .arte import (
ArteTVEmbedIE, ArteTVEmbedIE,
ArteTVPlaylistIE, ArteTVPlaylistIE,
) )
from .arnes import ArnesIE
from .asiancrush import ( from .asiancrush import (
AsianCrushIE, AsianCrushIE,
AsianCrushPlaylistIE, AsianCrushPlaylistIE,
@ -131,7 +132,6 @@ from .bleacherreport import (
BleacherReportIE, BleacherReportIE,
BleacherReportCMSIE, BleacherReportCMSIE,
) )
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE from .bloomberg import BloombergIE
from .bokecc import BokeCCIE from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE from .bongacams import BongaCamsIE
@ -190,7 +190,11 @@ from .cbsnews import (
CBSNewsIE, CBSNewsIE,
CBSNewsLiveVideoIE, CBSNewsLiveVideoIE,
) )
from .cbssports import CBSSportsIE from .cbssports import (
CBSSportsEmbedIE,
CBSSportsIE,
TwentyFourSevenSportsIE,
)
from .ccc import ( from .ccc import (
CCCIE, CCCIE,
CCCPlaylistIE, CCCPlaylistIE,
@ -594,7 +598,11 @@ from .limelight import (
LimelightChannelIE, LimelightChannelIE,
LimelightChannelListIE, LimelightChannelListIE,
) )
from .line import LineTVIE from .line import (
LineTVIE,
LineLiveIE,
LineLiveChannelIE,
)
from .linkedin import ( from .linkedin import (
LinkedInLearningIE, LinkedInLearningIE,
LinkedInLearningCourseIE, LinkedInLearningCourseIE,
@ -602,10 +610,6 @@ from .linkedin import (
from .linuxacademy import LinuxAcademyIE from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE from .litv import LiTVIE
from .livejournal import LiveJournalIE from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import ( from .livestream import (
LivestreamIE, LivestreamIE,
LivestreamOriginalIE, LivestreamOriginalIE,
@ -631,6 +635,7 @@ from .mangomolo import (
MangomoloLiveIE, MangomoloLiveIE,
) )
from .manyvids import ManyVidsIE from .manyvids import ManyVidsIE
from .maoritv import MaoriTVIE
from .markiza import ( from .markiza import (
MarkizaIE, MarkizaIE,
MarkizaPageIE, MarkizaPageIE,
@ -878,6 +883,11 @@ from .packtpub import (
PacktPubIE, PacktPubIE,
PacktPubCourseIE, PacktPubCourseIE,
) )
from .palcomp3 import (
PalcoMP3IE,
PalcoMP3ArtistIE,
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
@ -911,6 +921,7 @@ from .platzi import (
from .playfm import PlayFMIE from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE from .plays import PlaysTVIE
from .playstuff import PlayStuffIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
from .playvid import PlayvidIE from .playvid import PlayvidIE
from .playwire import PlaywireIE from .playwire import PlaywireIE

View File

@ -521,7 +521,10 @@ class FacebookIE(InfoExtractor):
raise ExtractorError( raise ExtractorError(
'The video is not available, Facebook said: "%s"' % m_msg.group(1), 'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True) expected=True)
elif '>You must log in to continue' in webpage: elif any(p in webpage for p in (
'>You must log in to continue',
'id="login_form"',
'id="loginbutton"')):
self.raise_login_required() self.raise_login_required()
if not video_data and '/watchparty/' in url: if not video_data and '/watchparty/' in url:

View File

@ -5,29 +5,23 @@ from .common import InfoExtractor
class Formula1IE(InfoExtractor): class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html' _VALID_URL = r'https?://(?:www\.)?formula1\.com/en/latest/video\.[^.]+\.(?P<id>\d+)\.html'
_TESTS = [{ _TEST = {
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html', 'url': 'https://www.formula1.com/en/latest/video.race-highlights-spain-2016.6060988138001.html',
'md5': '8c79e54be72078b26b89e0e111c0502b', 'md5': 'be7d3a8c2f804eb2ab2aa5d941c359f8',
'info_dict': { 'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV', 'id': '6060988138001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Race highlights - Spain 2016', 'title': 'Race highlights - Spain 2016',
'timestamp': 1463332814,
'upload_date': '20160515',
'uploader_id': '6057949432001',
}, },
'params': { 'add_ie': ['BrightcoveNew'],
# m3u8 download }
'skip_download': True, BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/6057949432001/S1WMrhjlh_default/index.html?videoId=%s'
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) bc_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_embed_code = self._search_regex(
r'data-videoid="([^"]+)"', webpage, 'ooyala embed code')
return self.url_result( return self.url_result(
'ooyala:%s' % ooyala_embed_code, 'Ooyala', ooyala_embed_code) self.BRIGHTCOVE_URL_TEMPLATE % bc_id, 'BrightcoveNew', bc_id)

View File

@ -383,6 +383,10 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
}, { }, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin', 'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True, 'only_matching': True,
}, {
# "<figure id=" pattern (#28792)
'url': 'https://www.francetvinfo.fr/culture/patrimoine/incendie-de-notre-dame-de-paris/notre-dame-de-paris-de-l-incendie-de-la-cathedrale-a-sa-reconstruction_4372291.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -399,7 +403,8 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)', (r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)', r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'), r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id') webpage, 'video id')
return self._make_url_result(video_id) return self._make_url_result(video_id)

View File

@ -16,7 +16,7 @@ from ..utils import (
class FunimationIE(InfoExtractor): class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation' _NETRC_MACHINE = 'funimation'
_TOKEN = None _TOKEN = None
@ -51,6 +51,10 @@ class FunimationIE(InfoExtractor):
}, { }, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/', 'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True, 'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}] }]
def _login(self): def _login(self):

View File

@ -6,6 +6,7 @@ from .common import InfoExtractor
from .kaltura import KalturaIE from .kaltura import KalturaIE
from ..utils import ( from ..utils import (
HEADRequest, HEADRequest,
remove_start,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
urlencode_postdata, urlencode_postdata,
@ -102,6 +103,26 @@ class GDCVaultIE(InfoExtractor):
'format': 'mp4-408', 'format': 'mp4-408',
}, },
}, },
{
# Kaltura embed, whitespace between quote and embedded URL in iframe's src
'url': 'https://www.gdcvault.com/play/1025699',
'info_dict': {
'id': '0_zagynv0a',
'ext': 'mp4',
'title': 'Tech Toolbox',
'upload_date': '20190408',
'uploader_id': 'joe@blazestreaming.com',
'timestamp': 1554764629,
},
'params': {
'skip_download': True,
},
},
{
# HTML5 video
'url': 'http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru',
'only_matching': True,
},
] ]
def _login(self, webpage_url, display_id): def _login(self, webpage_url, display_id):
@ -175,7 +196,18 @@ class GDCVaultIE(InfoExtractor):
xml_name = self._html_search_regex( xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>', r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename') start_page, 'xml filename', default=None)
if not xml_name:
info = self._parse_html5_media_entries(url, start_page, video_id)[0]
info.update({
'title': remove_start(self._search_regex(
r'>Session Name:\s*<.*?>\s*<td>(.+?)</td>', start_page,
'title', default=None) or self._og_search_title(
start_page, default=None), 'GDC Vault - '),
'id': video_id,
'display_id': display_id,
})
return info
embed_url = '%s/xml/%s' % (xml_root, xml_name) embed_url = '%s/xml/%s' % (xml_root, xml_name)
ie_key = 'DigitallySpeaking' ie_key = 'DigitallySpeaking'

View File

@ -84,7 +84,6 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .arkena import ArkenaIE from .arkena import ArkenaIE
from .instagram import InstagramIE from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
@ -126,6 +125,7 @@ from .viqeo import ViqeoIE
from .expressen import ExpressenIE from .expressen import ExpressenIE
from .zype import ZypeIE from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE from .odnoklassniki import OdnoklassnikiIE
from .vk import VKIE
from .kinja import KinjaEmbedIE from .kinja import KinjaEmbedIE
from .arcpublishing import ArcPublishingIE from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE from .medialaan import MedialaanIE
@ -1628,31 +1628,6 @@ class GenericIE(InfoExtractor):
'upload_date': '20160409', 'upload_date': '20160409',
}, },
}, },
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs # Duplicated embedded video URLs
{ {
'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443', 'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
@ -2248,6 +2223,11 @@ class GenericIE(InfoExtractor):
}, },
'playlist_mincount': 52, 'playlist_mincount': 52,
}, },
{
# Sibnet embed (https://help.sibnet.ru/?sibnet_video_embed)
'url': 'https://phpbb3.x-tk.ru/bbcode-video-sibnet-t24.html',
'only_matching': True,
},
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@ -2777,6 +2757,11 @@ class GenericIE(InfoExtractor):
if odnoklassniki_url: if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key()) return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
# Look for sibnet embedded player
sibnet_urls = VKIE._extract_sibnet_urls(webpage)
if sibnet_urls:
return self.playlist_from_matches(sibnet_urls, video_id, video_title)
# Look for embedded ivi player # Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage) mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None: if mobj is not None:
@ -3168,11 +3153,6 @@ class GenericIE(InfoExtractor):
return self.url_result( return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key()) self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_urls = LiveLeakIE._extract_urls(webpage)
if liveleak_urls:
return self.playlist_from_matches(liveleak_urls, video_id, video_title)
# Look for 3Q SDN embeds # Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage) threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
if threeqsdn_url: if threeqsdn_url:
@ -3400,6 +3380,9 @@ class GenericIE(InfoExtractor):
'url': src, 'url': src,
'ext': (mimetype2ext(src_type) 'ext': (mimetype2ext(src_type)
or ext if ext in KNOWN_EXTENSIONS else 'mp4'), or ext if ext in KNOWN_EXTENSIONS else 'mp4'),
'http_headers': {
'Referer': full_response.geturl(),
},
}) })
if formats: if formats:
self._sort_formats(formats) self._sort_formats(formats)
@ -3468,7 +3451,7 @@ class GenericIE(InfoExtractor):
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage) m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player: # We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None: if m_video_type is not None:
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)) found = filter_video(re.findall(r'<meta.*?property="og:(?:video|audio)".*?content="(.*?)"', webpage))
if not found: if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)' REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search( found = re.search(

View File

@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re import re
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
determine_ext, determine_ext,
parse_age_limit, parse_age_limit,
try_get,
urlencode_postdata, urlencode_postdata,
ExtractorError, ExtractorError,
) )
@ -116,6 +118,18 @@ class GoIE(AdobePassIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot',
'info_dict': {
'id': 'VDKA22600213',
'ext': 'mp4',
'title': 'Pilot',
'description': 'md5:74306df917cfc199d76d061d66bebdb4',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding', 'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True, 'only_matching': True,
@ -149,14 +163,30 @@ class GoIE(AdobePassIE):
brand = site_info.get('brand') brand = site_info.get('brand')
if not video_id or not site_info: if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id) webpage = self._download_webpage(url, display_id or video_id)
video_id = self._search_regex( data = self._parse_json(
( self._search_regex(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'" r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage,
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood 'data', default='{}'),
r'data-video-id=["\']*(VDKA\w+)', display_id or video_id, fatal=False)
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet # https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)' layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict)
), webpage, 'video id', default=video_id) video_id = None
if layout:
video_id = try_get(
layout,
(lambda x: x['videoid'], lambda x: x['video']['id']),
compat_str)
if not video_id:
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# page.analytics.videoIdCode
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
if not site_info: if not site_info:
brand = self._search_regex( brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)', (r'data-brand=\s*["\']\s*(\d+)',

View File

@ -12,6 +12,7 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
float_or_none,
get_element_by_attribute, get_element_by_attribute,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
@ -32,6 +33,7 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq', 'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8', 'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1371748545, 'timestamp': 1371748545,
'upload_date': '20130620', 'upload_date': '20130620',
'uploader_id': 'naomipq', 'uploader_id': 'naomipq',
@ -48,6 +50,7 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Video by britneyspears', 'title': 'Video by britneyspears',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1453760977, 'timestamp': 1453760977,
'upload_date': '20160125', 'upload_date': '20160125',
'uploader_id': 'britneyspears', 'uploader_id': 'britneyspears',
@ -86,6 +89,24 @@ class InstagramIE(InfoExtractor):
'title': 'Post by instagram', 'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957', 'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
}, },
}, {
# IGTV
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
'info_dict': {
'id': 'BkfuX9UB-eK',
'ext': 'mp4',
'title': 'Fingerboarding Tricks with @cass.fb',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 53.83,
'timestamp': 1530032919,
'upload_date': '20180626',
'uploader_id': 'instagram',
'uploader': 'Instagram',
'like_count': int,
'comment_count': int,
'comments': list,
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
}
}, { }, {
'url': 'https://instagram.com/p/-Cmh1cukG2/', 'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True, 'only_matching': True,
@ -159,7 +180,9 @@ class InstagramIE(InfoExtractor):
description = try_get( description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'], media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption') compat_str) or media.get('caption')
title = media.get('title')
thumbnail = media.get('display_src') or media.get('display_url') thumbnail = media.get('display_src') or media.get('display_url')
duration = float_or_none(media.get('video_duration'))
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date')) timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name') uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username') uploader_id = media.get('owner', {}).get('username')
@ -200,9 +223,10 @@ class InstagramIE(InfoExtractor):
continue continue
entries.append({ entries.append({
'id': node.get('shortcode') or node['id'], 'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num, 'title': node.get('title') or 'Video %d' % edge_num,
'url': node_video_url, 'url': node_video_url,
'thumbnail': node.get('display_url'), 'thumbnail': node.get('display_url'),
'duration': float_or_none(node.get('video_duration')),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])), 'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])), 'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')), 'view_count': int_or_none(node.get('video_view_count')),
@ -239,8 +263,9 @@ class InstagramIE(InfoExtractor):
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'ext': 'mp4', 'ext': 'mp4',
'title': 'Video by %s' % uploader_id, 'title': title or 'Video by %s' % uploader_id,
'description': description, 'description': description,
'duration': duration,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'timestamp': timestamp, 'timestamp': timestamp,
'uploader_id': uploader_id, 'uploader_id': uploader_id,

View File

@ -29,34 +29,51 @@ class JamendoIE(InfoExtractor):
'id': '196219', 'id': '196219',
'display_id': 'stories-from-emona-i', 'display_id': 'stories-from-emona-i',
'ext': 'flac', 'ext': 'flac',
'title': 'Maya Filipič - Stories from Emona I', # 'title': 'Maya Filipič - Stories from Emona I',
'artist': 'Maya Filipič', 'title': 'Stories from Emona I',
# 'artist': 'Maya Filipič',
'track': 'Stories from Emona I', 'track': 'Stories from Emona I',
'duration': 210, 'duration': 210,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1217438117, 'timestamp': 1217438117,
'upload_date': '20080730', 'upload_date': '20080730',
'license': 'by-nc-nd',
'view_count': int,
'like_count': int,
'average_rating': int,
'tags': ['piano', 'peaceful', 'newage', 'strings', 'upbeat'],
} }
}, { }, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock', 'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True, 'only_matching': True,
}] }]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
def _real_extract(self, url): def _real_extract(self, url):
track_id, display_id = self._VALID_URL_RE.match(url).groups() track_id, display_id = self._VALID_URL_RE.match(url).groups()
webpage = self._download_webpage( # webpage = self._download_webpage(
'https://www.jamendo.com/track/' + track_id, track_id) # 'https://www.jamendo.com/track/' + track_id, track_id)
models = self._parse_json(self._html_search_regex( # models = self._parse_json(self._html_search_regex(
r"data-bundled-models='([^']+)", # r"data-bundled-models='([^']+)",
webpage, 'bundled models'), track_id) # webpage, 'bundled models'), track_id)
track = models['track']['models'][0] # track = models['track']['models'][0]
track = self._call_api('track', track_id)
title = track_name = track['name'] title = track_name = track['name']
get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {} # get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
artist = get_model('artist') # artist = get_model('artist')
artist_name = artist.get('name') # artist_name = artist.get('name')
if artist_name: # if artist_name:
title = '%s - %s' % (artist_name, title) # title = '%s - %s' % (artist_name, title)
album = get_model('album') # album = get_model('album')
formats = [{ formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294' 'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@ -74,7 +91,7 @@ class JamendoIE(InfoExtractor):
urls = [] urls = []
thumbnails = [] thumbnails = []
for _, covers in track.get('cover', {}).items(): for covers in (track.get('cover') or {}).values():
for cover_id, cover_url in covers.items(): for cover_id, cover_url in covers.items():
if not cover_url or cover_url in urls: if not cover_url or cover_url in urls:
continue continue
@ -88,13 +105,14 @@ class JamendoIE(InfoExtractor):
}) })
tags = [] tags = []
for tag in track.get('tags', []): for tag in (track.get('tags') or []):
tag_name = tag.get('name') tag_name = tag.get('name')
if not tag_name: if not tag_name:
continue continue
tags.append(tag_name) tags.append(tag_name)
stats = track.get('stats') or {} stats = track.get('stats') or {}
license = track.get('licenseCC') or []
return { return {
'id': track_id, 'id': track_id,
@ -103,11 +121,11 @@ class JamendoIE(InfoExtractor):
'title': title, 'title': title,
'description': track.get('description'), 'description': track.get('description'),
'duration': int_or_none(track.get('duration')), 'duration': int_or_none(track.get('duration')),
'artist': artist_name, # 'artist': artist_name,
'track': track_name, 'track': track_name,
'album': album.get('name'), # 'album': album.get('name'),
'formats': formats, 'formats': formats,
'license': '-'.join(track.get('licenseCC', [])) or None, 'license': '-'.join(license) if license else None,
'timestamp': int_or_none(track.get('dateCreated')), 'timestamp': int_or_none(track.get('dateCreated')),
'view_count': int_or_none(stats.get('listenedAll')), 'view_count': int_or_none(stats.get('listenedAll')),
'like_count': int_or_none(stats.get('favorited')), 'like_count': int_or_none(stats.get('favorited')),
@ -116,9 +134,9 @@ class JamendoIE(InfoExtractor):
} }
class JamendoAlbumIE(InfoExtractor): class JamendoAlbumIE(JamendoIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
_TEST = { _TESTS = [{
'url': 'https://www.jamendo.com/album/121486/duck-on-cover', 'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': { 'info_dict': {
'id': '121486', 'id': '121486',
@ -151,17 +169,7 @@ class JamendoAlbumIE(InfoExtractor):
'params': { 'params': {
'playlistend': 2 'playlistend': 2
} }
} }]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) album_id = self._match_id(url)
@ -169,7 +177,7 @@ class JamendoAlbumIE(InfoExtractor):
album_name = album.get('name') album_name = album.get('name')
entries = [] entries = []
for track in album.get('tracks', []): for track in (album.get('tracks') or []):
track_id = track.get('id') track_id = track.get('id')
if not track_id: if not track_id:
continue continue

View File

@ -120,7 +120,7 @@ class KalturaIE(InfoExtractor):
def _extract_urls(webpage): def _extract_urls(webpage):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
finditer = ( finditer = (
re.finditer( list(re.finditer(
r"""(?xs) r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\( kWidget\.(?:thumb)?[Ee]mbed\(
\{.*? \{.*?
@ -128,8 +128,8 @@ class KalturaIE(InfoExtractor):
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*? (?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s* (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\}) (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage) """, webpage))
or re.finditer( or list(re.finditer(
r'''(?xs) r'''(?xs)
(?P<q1>["']) (?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)* (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@ -142,16 +142,16 @@ class KalturaIE(InfoExtractor):
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s* \[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
) )
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3) (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage) ''', webpage))
or re.finditer( or list(re.finditer(
r'''(?xs) r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["']) <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])\s*
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+) (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)* (?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+) [?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)* (?:(?!(?P=q1)).)*
(?P=q1) (?P=q1)
''', webpage) ''', webpage))
) )
urls = [] urls = []
for mobj in finditer: for mobj in finditer:

View File

@ -120,6 +120,26 @@ class LBRYIE(LBRYBaseIE):
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212', 'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
'vcodec': 'none', 'vcodec': 'none',
} }
}, {
# HLS
'url': 'https://odysee.com/@gardeningincanada:b/plants-i-will-never-grow-again.-the:e',
'md5': 'fc82f45ea54915b1495dd7cb5cc1289f',
'info_dict': {
'id': 'e51671357333fe22ae88aad320bde2f6f96b1410',
'ext': 'mp4',
'title': 'PLANTS I WILL NEVER GROW AGAIN. THE BLACK LIST PLANTS FOR A CANADIAN GARDEN | Gardening in Canada 🍁',
'description': 'md5:9c539c6a03fb843956de61a4d5288d5e',
'timestamp': 1618254123,
'upload_date': '20210412',
'release_timestamp': 1618254002,
'release_date': '20210412',
'tags': list,
'duration': 554,
'channel': 'Gardening In Canada',
'channel_id': 'b8be0e93b423dad221abe29545fbe8ec36e806bc',
'channel_url': 'https://odysee.com/@gardeningincanada:b8be0e93b423dad221abe29545fbe8ec36e806bc',
'formats': 'mincount:3',
}
}, { }, {
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e', 'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
'only_matching': True, 'only_matching': True,
@ -163,10 +183,18 @@ class LBRYIE(LBRYBaseIE):
streaming_url = self._call_api_proxy( streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url'] 'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
info = self._parse_stream(result, url) info = self._parse_stream(result, url)
urlh = self._request_webpage(
streaming_url, display_id, note='Downloading streaming redirect url info')
if determine_ext(urlh.geturl()) == 'm3u8':
info['formats'] = self._extract_m3u8_formats(
urlh.geturl(), display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(info['formats'])
else:
info['url'] = streaming_url
info.update({ info.update({
'id': claim_id, 'id': claim_id,
'title': title, 'title': title,
'url': streaming_url,
}) })
return info return info

View File

@ -4,7 +4,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import js_to_json from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
str_or_none,
)
class LineTVIE(InfoExtractor): class LineTVIE(InfoExtractor):
@ -88,3 +94,137 @@ class LineTVIE(InfoExtractor):
for thumbnail in video_info.get('thumbnails', {}).get('list', [])], for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
'view_count': video_info.get('meta', {}).get('count'), 'view_count': video_info.get('meta', {}).get('count'),
} }
class LineLiveBaseIE(InfoExtractor):
_API_BASE_URL = 'https://live-api.line-apps.com/web/v4.0/channel/'
def _parse_broadcast_item(self, item):
broadcast_id = compat_str(item['id'])
title = item['title']
is_live = item.get('isBroadcastingNow')
thumbnails = []
for thumbnail_id, thumbnail_url in (item.get('thumbnailURLs') or {}).items():
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
})
channel = item.get('channel') or {}
channel_id = str_or_none(channel.get('id'))
return {
'id': broadcast_id,
'title': self._live_title(title) if is_live else title,
'thumbnails': thumbnails,
'timestamp': int_or_none(item.get('createdAt')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': 'https://live.line.me/channels/' + channel_id if channel_id else None,
'duration': int_or_none(item.get('archiveDuration')),
'view_count': int_or_none(item.get('viewerCount')),
'comment_count': int_or_none(item.get('chatCount')),
'is_live': is_live,
}
class LineLiveIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<channel_id>\d+)/broadcast/(?P<id>\d+)'
_TESTS = [{
'url': 'https://live.line.me/channels/4867368/broadcast/16331360',
'md5': 'bc931f26bf1d4f971e3b0982b3fab4a3',
'info_dict': {
'id': '16331360',
'title': '振りコピ講座😙😙😙',
'ext': 'mp4',
'timestamp': 1617095132,
'upload_date': '20210330',
'channel': '白川ゆめか',
'channel_id': '4867368',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}, {
# archiveStatus == 'DELETED'
'url': 'https://live.line.me/channels/4778159/broadcast/16378488',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id, broadcast_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
self._API_BASE_URL + '%s/broadcast/%s' % (channel_id, broadcast_id),
broadcast_id)
item = broadcast['item']
info = self._parse_broadcast_item(item)
protocol = 'm3u8' if info['is_live'] else 'm3u8_native'
formats = []
for k, v in (broadcast.get(('live' if info['is_live'] else 'archived') + 'HLSURLs') or {}).items():
if not v:
continue
if k == 'abr':
formats.extend(self._extract_m3u8_formats(
v, broadcast_id, 'mp4', protocol,
m3u8_id='hls', fatal=False))
continue
f = {
'ext': 'mp4',
'format_id': 'hls-' + k,
'protocol': protocol,
'url': v,
}
if not k.isdigit():
f['vcodec'] = 'none'
formats.append(f)
if not formats:
archive_status = item.get('archiveStatus')
if archive_status != 'ARCHIVED':
raise ExtractorError('this video has been ' + archive_status.lower(), expected=True)
self._sort_formats(formats)
info['formats'] = formats
return info
class LineLiveChannelIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<id>\d+)(?!/broadcast/\d+)(?:[/?&#]|$)'
_TEST = {
'url': 'https://live.line.me/channels/5893542',
'info_dict': {
'id': '5893542',
'title': 'いくらちゃん',
'description': 'md5:c3a4af801f43b2fac0b02294976580be',
},
'playlist_mincount': 29
}
def _archived_broadcasts_entries(self, archived_broadcasts, channel_id):
while True:
for row in (archived_broadcasts.get('rows') or []):
share_url = str_or_none(row.get('shareURL'))
if not share_url:
continue
info = self._parse_broadcast_item(row)
info.update({
'_type': 'url',
'url': share_url,
'ie_key': LineLiveIE.ie_key(),
})
yield info
if not archived_broadcasts.get('hasNextPage'):
return
archived_broadcasts = self._download_json(
self._API_BASE_URL + channel_id + '/archived_broadcasts',
channel_id, query={
'lastId': info['id'],
})
def _real_extract(self, url):
channel_id = self._match_id(url)
channel = self._download_json(self._API_BASE_URL + channel_id, channel_id)
return self.playlist_result(
self._archived_broadcasts_entries(channel.get('archivedBroadcasts') or {}, channel_id),
channel_id, channel.get('title'), channel.get('information'))

View File

@ -1,191 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?liveleak\.com/view\?.*?\b[it]=(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': {
'id': '757_1364311680',
'ext': 'mp4',
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
'url': 'http://www.liveleak.com/view?i=f93_1390833151',
'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf',
'info_dict': {
'id': 'f93_1390833151',
'ext': 'mp4',
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Prochan embed
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
'md5': '42c6d97d54f1db107958760788c5f48f',
'info_dict': {
'id': '4f7_1392687779',
'ext': 'mp4',
'description': "The guy with the cigarette seems amazingly nonchalant about the whole thing... I really hope my friends' reactions would be a bit stronger.\r\n\r\nAction-go to 0:55.",
'uploader': 'CapObveus',
'title': 'Man is Fatally Struck by Reckless Car While Packing up a Moving Truck',
'age_limit': 18,
},
'skip': 'Video is dead',
}, {
# Covers https://github.com/ytdl-org/youtube-dl/pull/5983
# Multiple resolutions
'url': 'http://www.liveleak.com/view?i=801_1409392012',
'md5': 'c3a449dbaca5c0d1825caecd52a57d7b',
'info_dict': {
'id': '801_1409392012',
'ext': 'mp4',
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Covers https://github.com/ytdl-org/youtube-dl/pull/10664#issuecomment-247439521
'url': 'http://m.liveleak.com/view?i=763_1473349649',
'add_ie': ['Youtube'],
'info_dict': {
'id': '763_1473349649',
'ext': 'mp4',
'title': 'Reporters and public officials ignore epidemic of black on asian violence in Sacramento | Colin Flaherty',
'description': 'Colin being the warrior he is and showing the injustice Asians in Sacramento are being subjected to.',
'uploader': 'Ziz',
'upload_date': '20160908',
'uploader_id': 'UCEbta5E_jqlZmEJsriTEtnw'
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.liveleak.com/view?i=677_1439397581',
'info_dict': {
'id': '677_1439397581',
'title': 'Fuel Depot in China Explosion caught on video',
},
'playlist_count': 3,
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}, {
# No original video
'url': 'https://www.liveleak.com/view?t=C26ZZ_1558612804',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[ift]=[\w_]+[^"]+)"',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
video_thumbnail = self._og_search_thumbnail(webpage)
entries = self._parse_html5_media_entries(url, webpage, video_id)
if not entries:
# Maybe an embed?
embed_url = self._search_regex(
r'<iframe[^>]+src="((?:https?:)?//(?:www\.)?(?:prochan|youtube)\.com/embed[^"]+)"',
webpage, 'embed URL')
return {
'_type': 'url_transparent',
'url': embed_url,
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
}
for idx, info_dict in enumerate(entries):
formats = []
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
formats.append(a_format)
# Removing '.*.mp4' gives the raw video, which is essentially
# the same video without the LiveLeak logo at the top (see
# https://github.com/ytdl-org/youtube-dl/pull/4768)
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
format_id = 'original' + ('-' + format_id if format_id else '')
if self._is_valid_url(orig_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats
# Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1:
info_dict['id'] = '%s_%s' % (video_id, idx + 1)
else:
info_dict['id'] = video_id
info_dict.update({
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
})
return self.playlist_result(entries, video_id, video_title)
class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[ift])=(?P<id>[\w_]+)'
# See generic.py for actual test cases
_TESTS = [{
'url': 'https://www.liveleak.com/ll_embed?i=874_1459135191',
'only_matching': True,
}, {
'url': 'https://www.liveleak.com/ll_embed?f=ab065df993c1',
'only_matching': True,
}]
def _real_extract(self, url):
kind, video_id = re.match(self._VALID_URL, url).groups()
if kind == 'f':
webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex(
r'(?:logourl\s*:\s*|window\.open\()(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url')
else:
liveleak_url = 'http://www.liveleak.com/view?%s=%s' % (kind, video_id)
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())

View File

@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MaoriTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.maoritelevision.com/shows/korero-mai/S01E054/korero-mai-series-1-episode-54',
'md5': '5ade8ef53851b6a132c051b1cd858899',
'info_dict': {
'id': '4774724855001',
'ext': 'mp4',
'title': 'Kōrero Mai, Series 1 Episode 54',
'upload_date': '20160226',
'timestamp': 1456455018,
'description': 'md5:59bde32fd066d637a1a55794c56d8dcb',
'uploader_id': '1614493167001',
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1614493167001/HJlhIQhQf_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(
r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'BrightcoveNew', brightcove_id)

View File

@ -15,33 +15,39 @@ from ..utils import (
class MedalTVIE(InfoExtractor): class MedalTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://medal.tv/clips/34934644/3Is9zyGMoBMr', 'url': 'https://medal.tv/clips/2mA60jWAGQCBH',
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa', 'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
'info_dict': { 'info_dict': {
'id': '34934644', 'id': '2mA60jWAGQCBH',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Quad Cold', 'title': 'Quad Cold',
'description': 'Medal,https://medal.tv/desktop/', 'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'MowgliSB', 'uploader': 'MowgliSB',
'timestamp': 1603165266, 'timestamp': 1603165266,
'upload_date': '20201020', 'upload_date': '20201020',
'uploader_id': 10619174, 'uploader_id': '10619174',
} }
}, { }, {
'url': 'https://medal.tv/clips/36787208', 'url': 'https://medal.tv/clips/2um24TWdty0NA',
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148', 'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
'info_dict': { 'info_dict': {
'id': '36787208', 'id': '2um24TWdty0NA',
'ext': 'mp4', 'ext': 'mp4',
'title': 'u tk me i tk u bigger', 'title': 'u tk me i tk u bigger',
'description': 'Medal,https://medal.tv/desktop/', 'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'Mimicc', 'uploader': 'Mimicc',
'timestamp': 1605580939, 'timestamp': 1605580939,
'upload_date': '20201117', 'upload_date': '20201117',
'uploader_id': 5156321, 'uploader_id': '5156321',
} }
}, {
'url': 'https://medal.tv/clips/37rMeFpryCC-9',
'only_matching': True,
}, {
'url': 'https://medal.tv/clips/2WRj40tpY_EU9',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -255,7 +255,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
@staticmethod @staticmethod
def _extract_child_with_type(parent, t): def _extract_child_with_type(parent, t):
return next(c for c in parent['children'] if c.get('type') == t) for c in parent['children']:
if c.get('type') == t:
return c
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
try: try:
@ -286,7 +288,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
data = self._parse_json(self._search_regex( data = self._parse_json(self._search_regex(
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None) r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
main_container = self._extract_child_with_type(data, 'MainContainer') main_container = self._extract_child_with_type(data, 'MainContainer')
video_player = self._extract_child_with_type(main_container, 'VideoPlayer') ab_testing = self._extract_child_with_type(main_container, 'ABTesting')
video_player = self._extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
mgid = video_player['props']['media']['video']['config']['uri'] mgid = video_player['props']['media']['video']['config']['uri']
return mgid return mgid

View File

@ -58,7 +58,7 @@ class NRKBaseIE(InfoExtractor):
def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None): def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None):
return self._download_json( return self._download_json(
urljoin('http://psapi.nrk.no/', path), urljoin('https://psapi.nrk.no/', path),
video_id, note or 'Downloading %s JSON' % item, video_id, note or 'Downloading %s JSON' % item,
fatal=fatal, query=query, fatal=fatal, query=query,
headers={'Accept-Encoding': 'gzip, deflate, br'}) headers={'Accept-Encoding': 'gzip, deflate, br'})

View File

@ -98,6 +98,9 @@ class ORFTVthekIE(InfoExtractor):
elif ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id=format_id, fatal=False)) src, video_id, f4m_id=format_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id=format_id, fatal=False))
else: else:
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
@ -140,6 +143,25 @@ class ORFTVthekIE(InfoExtractor):
}) })
upload_date = unified_strdate(sd.get('created_date')) upload_date = unified_strdate(sd.get('created_date'))
thumbnails = []
preview = sd.get('preview_image_url')
if preview:
thumbnails.append({
'id': 'preview',
'url': preview,
'preference': 0,
})
image = sd.get('image_full_url')
if not image and len(data_jsb) == 1:
image = self._og_search_thumbnail(webpage)
if image:
thumbnails.append({
'id': 'full',
'url': image,
'preference': 1,
})
entries.append({ entries.append({
'_type': 'video', '_type': 'video',
'id': video_id, 'id': video_id,
@ -149,7 +171,7 @@ class ORFTVthekIE(InfoExtractor):
'description': sd.get('description'), 'description': sd.get('description'),
'duration': int_or_none(sd.get('duration_in_seconds')), 'duration': int_or_none(sd.get('duration_in_seconds')),
'upload_date': upload_date, 'upload_date': upload_date,
'thumbnail': sd.get('image_full_url'), 'thumbnails': thumbnails,
}) })
return { return {
@ -182,7 +204,7 @@ class ORFRadioIE(InfoExtractor):
duration = end - start if end and start else None duration = end - start if end and start else None
entries.append({ entries.append({
'id': loop_stream_id.replace('.mp3', ''), 'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id), 'url': 'https://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'title': title, 'title': title,
'description': clean_html(data.get('subtitle')), 'description': clean_html(data.get('subtitle')),
'duration': duration, 'duration': duration,

View File

@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
)
class PalcoMP3BaseIE(InfoExtractor):
_GQL_QUERY_TMPL = '''{
artist(slug: "%s") {
%s
}
}'''
_ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
%s
}'''
_MUSIC_FIELDS = '''duration
hls
mp3File
musicID
plays
title'''
def _call_api(self, artist_slug, artist_fields):
return self._download_json(
'https://www.palcomp3.com.br/graphql/', artist_slug, query={
'query': self._GQL_QUERY_TMPL % (artist_slug, artist_fields),
})['data']
def _parse_music(self, music):
music_id = compat_str(music['musicID'])
title = music['title']
formats = []
hls_url = music.get('hls')
if hls_url:
formats.append({
'url': hls_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
mp3_file = music.get('mp3File')
if mp3_file:
formats.append({
'url': mp3_file,
})
return {
'id': music_id,
'title': title,
'formats': formats,
'duration': int_or_none(music.get('duration')),
'view_count': int_or_none(music.get('plays')),
}
def _real_initialize(self):
self._ARTIST_FIELDS_TMPL = self._ARTIST_FIELDS_TMPL % self._MUSIC_FIELDS
def _real_extract(self, url):
artist_slug, music_slug = re.match(self._VALID_URL, url).groups()
artist_fields = self._ARTIST_FIELDS_TMPL % music_slug
music = self._call_api(artist_slug, artist_fields)['artist']['music']
return self._parse_music(music)
class PalcoMP3IE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:song'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/nossas-composicoes-cuida-bem-dela/',
'md5': '99fd6405b2d8fd589670f6db1ba3b358',
'info_dict': {
'id': '3162927',
'ext': 'mp3',
'title': 'Nossas Composições - CUIDA BEM DELA',
'duration': 210,
'view_count': int,
}
}]
@classmethod
def suitable(cls, url):
return False if PalcoMP3VideoIE.suitable(url) else super(PalcoMP3IE, cls).suitable(url)
class PalcoMP3ArtistIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:artist'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com.br/condedoforro/',
'info_dict': {
'id': '358396',
'title': 'Conde do Forró',
},
'playlist_mincount': 188,
}]
_ARTIST_FIELDS_TMPL = '''artistID
musics {
nodes {
%s
}
}
name'''
@ classmethod
def suitable(cls, url):
return False if re.match(PalcoMP3IE._VALID_URL, url) else super(PalcoMP3ArtistIE, cls).suitable(url)
def _real_extract(self, url):
artist_slug = self._match_id(url)
artist = self._call_api(artist_slug, self._ARTIST_FIELDS_TMPL)['artist']
def entries():
for music in (try_get(artist, lambda x: x['musics']['nodes'], list) or []):
yield self._parse_music(music)
return self.playlist_result(
entries(), str_or_none(artist.get('artistID')), artist.get('name'))
class PalcoMP3VideoIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:video'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)/?#clipe'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/maiara-e-maraisa-voce-faz-falta-aqui-ao-vivo-em-vicosa-mg/#clipe',
'add_ie': ['Youtube'],
'info_dict': {
'id': '_pD1nR2qqPg',
'ext': 'mp4',
'title': 'Maiara e Maraisa - Você Faz Falta Aqui - DVD Ao Vivo Em Campo Grande',
'description': 'md5:7043342c09a224598e93546e98e49282',
'upload_date': '20161107',
'uploader_id': 'maiaramaraisaoficial',
'uploader': 'Maiara e Maraisa',
}
}]
_MUSIC_FIELDS = 'youtubeID'
def _parse_music(self, music):
youtube_id = music['youtubeID']
return self.url_result(youtube_id, 'Youtube', youtube_id)

View File

@ -9,8 +9,9 @@ from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
merge_dicts, merge_dicts,
try_get,
unified_timestamp, unified_timestamp,
xpath_text, urljoin,
) )
@ -27,10 +28,11 @@ class PhoenixIE(ZDFBaseIE):
'title': 'Wohin führt der Protest in der Pandemie?', 'title': 'Wohin führt der Protest in der Pandemie?',
'description': 'md5:7d643fe7f565e53a24aac036b2122fbd', 'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
'duration': 1691, 'duration': 1691,
'timestamp': 1613906100, 'timestamp': 1613902500,
'upload_date': '20210221', 'upload_date': '20210221',
'uploader': 'Phoenix', 'uploader': 'Phoenix',
'channel': 'corona nachgehakt', 'series': 'corona nachgehakt',
'episode': 'Wohin führt der Protest in der Pandemie?',
}, },
}, { }, {
# Youtube embed # Youtube embed
@ -79,50 +81,53 @@ class PhoenixIE(ZDFBaseIE):
video_id = compat_str(video.get('basename') or video.get('content')) video_id = compat_str(video.get('basename') or video.get('content'))
details = self._download_xml( details = self._download_json(
'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php', 'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php',
video_id, 'Downloading details XML', query={ video_id, 'Downloading details JSON', query={
'ak': 'web', 'ak': 'web',
'ptmd': 'true', 'ptmd': 'true',
'id': video_id, 'id': video_id,
'profile': 'player2', 'profile': 'player2',
}) })
title = title or xpath_text( title = title or details['title']
details, './/information/title', 'title', fatal=True) content_id = details['tracking']['nielsen']['content']['assetid']
content_id = xpath_text(
details, './/video/details/basename', 'content id', fatal=True)
info = self._extract_ptmd( info = self._extract_ptmd(
'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id, 'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id,
content_id, None, url) content_id, None, url)
timestamp = unified_timestamp(xpath_text(details, './/details/airtime')) duration = int_or_none(try_get(
details, lambda x: x['tracking']['nielsen']['content']['length']))
timestamp = unified_timestamp(details.get('editorialDate'))
series = try_get(
details, lambda x: x['tracking']['nielsen']['content']['program'],
compat_str)
episode = title if details.get('contentType') == 'episode' else None
thumbnails = [] thumbnails = []
for node in details.findall('.//teaserimages/teaserimage'): teaser_images = try_get(details, lambda x: x['teaserImageRef']['layouts'], dict) or {}
thumbnail_url = node.text for thumbnail_key, thumbnail_url in teaser_images.items():
thumbnail_url = urljoin(url, thumbnail_url)
if not thumbnail_url: if not thumbnail_url:
continue continue
thumbnail = { thumbnail = {
'url': thumbnail_url, 'url': thumbnail_url,
} }
thumbnail_key = node.get('key') m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if thumbnail_key: if m:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key) thumbnail['width'] = int(m.group(1))
if m: thumbnail['height'] = int(m.group(2))
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail) thumbnails.append(thumbnail)
return merge_dicts(info, { return merge_dicts(info, {
'id': content_id, 'id': content_id,
'title': title, 'title': title,
'description': xpath_text(details, './/information/detail'), 'description': details.get('leadParagraph'),
'duration': int_or_none(xpath_text(details, './/details/lengthSec')), 'duration': duration,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'timestamp': timestamp, 'timestamp': timestamp,
'uploader': xpath_text(details, './/details/channel'), 'uploader': details.get('tvService'),
'uploader_id': xpath_text(details, './/details/originChannelId'), 'series': series,
'channel': xpath_text(details, './/details/originChannelTitle'), 'episode': episode,
}) })

View File

@ -1,22 +1,15 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
try_get,
update_url_query,
urlencode_postdata,
) )
class PicartoIE(InfoExtractor): class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?' _VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
_TEST = { _TEST = {
'url': 'https://picarto.tv/Setz', 'url': 'https://picarto.tv/Setz',
'info_dict': { 'info_dict': {
@ -34,65 +27,46 @@ class PicartoIE(InfoExtractor):
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url) return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) channel_id = self._match_id(url)
channel_id = mobj.group('id')
metadata = self._download_json( data = self._download_json(
'https://api.picarto.tv/v1/channel/name/' + channel_id, 'https://ptvintern.picarto.tv/ptvapi', channel_id, query={
channel_id) 'query': '''{
channel(name: "%s") {
adult
id
online
stream_name
title
}
getLoadBalancerUrl(channel_name: "%s") {
url
}
}''' % (channel_id, channel_id),
})['data']
metadata = data['channel']
if metadata.get('online') is False: if metadata.get('online') == 0:
raise ExtractorError('Stream is offline', expected=True) raise ExtractorError('Stream is offline', expected=True)
title = metadata['title']
cdn_data = self._download_json( cdn_data = self._download_json(
'https://picarto.tv/process/channel', channel_id, data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
data=urlencode_postdata({'loadbalancinginfo': channel_id}), channel_id, 'Downloading load balancing info')
note='Downloading load balancing info')
token = mobj.group('token') or 'public'
params = {
'con': int(time.time() * 1000),
'token': token,
}
prefered_edge = cdn_data.get('preferedEdge')
formats = [] formats = []
for source in (cdn_data.get('source') or []):
for edge in cdn_data['edges']: source_url = source.get('url')
edge_ep = edge.get('ep') if not source_url:
if not edge_ep or not isinstance(edge_ep, compat_str):
continue continue
edge_id = edge.get('id') source_type = source.get('type')
for tech in cdn_data['techs']: if source_type == 'html5/application/vnd.apple.mpegurl':
tech_label = tech.get('label') formats.extend(self._extract_m3u8_formats(
tech_type = tech.get('type') source_url, channel_id, 'mp4', m3u8_id='hls', fatal=False))
preference = 0 elif source_type == 'html5/video/mp4':
if edge_id == prefered_edge: formats.append({
preference += 1 'url': source_url,
format_id = [] })
if edge_id:
format_id.append(edge_id)
if tech_type == 'application/x-mpegurl' or tech_label == 'HLS':
format_id.append('hls')
formats.extend(self._extract_m3u8_formats(
update_url_query(
'https://%s/hls/%s/index.m3u8'
% (edge_ep, channel_id), params),
channel_id, 'mp4', preference=preference,
m3u8_id='-'.join(format_id), fatal=False))
continue
elif tech_type == 'video/mp4' or tech_label == 'MP4':
format_id.append('mp4')
formats.append({
'url': update_url_query(
'https://%s/mp4/%s.mp4' % (edge_ep, channel_id),
params),
'format_id': '-'.join(format_id),
'preference': preference,
})
else:
# rtmp format does not seem to work
continue
self._sort_formats(formats) self._sort_formats(formats)
mature = metadata.get('adult') mature = metadata.get('adult')
@ -103,10 +77,10 @@ class PicartoIE(InfoExtractor):
return { return {
'id': channel_id, 'id': channel_id,
'title': self._live_title(metadata.get('title') or channel_id), 'title': self._live_title(title.strip()),
'is_live': True, 'is_live': True,
'thumbnail': try_get(metadata, lambda x: x['thumbnails']['web']),
'channel': channel_id, 'channel': channel_id,
'channel_id': metadata.get('id'),
'channel_url': 'https://picarto.tv/%s' % channel_id, 'channel_url': 'https://picarto.tv/%s' % channel_id,
'age_limit': age_limit, 'age_limit': age_limit,
'formats': formats, 'formats': formats,

View File

@ -0,0 +1,65 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
smuggle_url,
try_get,
)
class PlayStuffIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?play\.stuff\.co\.nz/details/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://play.stuff.co.nz/details/608778ac1de1c4001a3fa09a',
'md5': 'c82d3669e5247c64bc382577843e5bd0',
'info_dict': {
'id': '6250584958001',
'ext': 'mp4',
'title': 'Episode 1: Rotorua/Mt Maunganui/Tauranga',
'description': 'md5:c154bafb9f0dd02d01fd4100fb1c1913',
'uploader_id': '6005208634001',
'timestamp': 1619491027,
'upload_date': '20210427',
},
'add_ie': ['BrightcoveNew'],
}, {
# geo restricted, bypassable
'url': 'https://play.stuff.co.nz/details/_6155660351001',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
state = self._parse_json(
self._search_regex(
r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage, 'state'),
video_id)
account_id = try_get(
state, lambda x: x['configurations']['accountId'],
compat_str) or '6005208634001'
player_id = try_get(
state, lambda x: x['configurations']['playerId'],
compat_str) or 'default'
entries = []
for item_id, video in state['items'].items():
if not isinstance(video, dict):
continue
asset_id = try_get(
video, lambda x: x['content']['attributes']['assetId'],
compat_str)
if not asset_id:
continue
entries.append(self.url_result(
smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, asset_id),
{'geo_countries': ['NZ']}),
'BrightcoveNew', video_id))
return self.playlist_result(entries, video_id)

View File

@ -393,7 +393,7 @@ query viewClip {
# To somewhat reduce the probability of these consequences # To somewhat reduce the probability of these consequences
# we will sleep random amount of time before each call to ViewClip. # we will sleep random amount of time before each call to ViewClip.
self._sleep( self._sleep(
random.randint(2, 5), display_id, random.randint(5, 10), display_id,
'%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling') '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
if not viewclip: if not viewclip:

View File

@ -30,6 +30,7 @@ from ..utils import (
class PornHubBaseIE(InfoExtractor): class PornHubBaseIE(InfoExtractor):
_NETRC_MACHINE = 'pornhub' _NETRC_MACHINE = 'pornhub'
_PORNHUB_HOST_RE = r'(?:(?P<host>pornhub(?:premium)?\.(?:com|net|org))|pornhubthbh7ap3u\.onion)'
def _download_webpage_handle(self, *args, **kwargs): def _download_webpage_handle(self, *args, **kwargs):
def dl(*args, **kwargs): def dl(*args, **kwargs):
@ -122,11 +123,13 @@ class PornHubIE(PornHubBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)| (?:[^/]+\.)?
%s
/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/ (?:www\.)?thumbzilla\.com/video/
) )
(?P<id>[\da-z]+) (?P<id>[\da-z]+)
''' ''' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{ _TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015', 'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': 'a6391306d050e4547f62b3f485dd9ba9', 'md5': 'a6391306d050e4547f62b3f485dd9ba9',
@ -236,6 +239,13 @@ class PornHubIE(PornHubBaseIE):
}, { }, {
'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5f75b0f4b18e3', 'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5f75b0f4b18e3',
'only_matching': True, 'only_matching': True,
}, {
# geo restricted
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5a9813bfa7156',
'only_matching': True,
}, {
'url': 'http://pornhubthbh7ap3u.onion/view_video.php?viewkey=ph5a9813bfa7156',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -275,6 +285,11 @@ class PornHubIE(PornHubBaseIE):
'PornHub said: %s' % error_msg, 'PornHub said: %s' % error_msg,
expected=True, video_id=video_id) expected=True, video_id=video_id)
if any(re.search(p, webpage) for p in (
r'class=["\']geoBlocked["\']',
r'>\s*This content is unavailable in your country')):
self.raise_geo_restricted()
# video_title from flashvars contains whitespace instead of non-ASCII (see # video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore. # on that anymore.
@ -398,17 +413,24 @@ class PornHubIE(PornHubBaseIE):
formats = [] formats = []
def add_format(format_url, height=None): def add_format(format_url, height=None):
tbr = None ext = determine_ext(format_url)
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url) if ext == 'mpd':
if mobj: formats.extend(self._extract_mpd_formats(
if not height: format_url, video_id, mpd_id='dash', fatal=False))
height = int(mobj.group('height')) return
tbr = int(mobj.group('tbr')) if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
return
if not height:
height = int_or_none(self._search_regex(
r'(?P<height>\d+)[pP]?_\d+[kK]', format_url, 'height',
default=None))
formats.append({ formats.append({
'url': format_url, 'url': format_url,
'format_id': '%dp' % height if height else None, 'format_id': '%dp' % height if height else None,
'height': height, 'height': height,
'tbr': tbr,
}) })
for video_url, height in video_urls: for video_url, height in video_urls:
@ -417,16 +439,6 @@ class PornHubIE(PornHubBaseIE):
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None) r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
if upload_date: if upload_date:
upload_date = upload_date.replace('/', '') upload_date = upload_date.replace('/', '')
ext = determine_ext(video_url)
if ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
continue
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
if '/video/get_media' in video_url: if '/video/get_media' in video_url:
medias = self._download_json(video_url, video_id, fatal=False) medias = self._download_json(video_url, video_id, fatal=False)
if isinstance(medias, list): if isinstance(medias, list):
@ -440,7 +452,8 @@ class PornHubIE(PornHubBaseIE):
add_format(video_url, height) add_format(video_url, height)
continue continue
add_format(video_url) add_format(video_url)
self._sort_formats(formats) self._sort_formats(
formats, field_preference=('height', 'width', 'fps', 'format_id'))
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<', r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
@ -513,7 +526,7 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
class PornHubUserIE(PornHubPlaylistBaseIE): class PornHubUserIE(PornHubPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)' _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?%s/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{ _TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph', 'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118, 'playlist_mincount': 118,
@ -542,6 +555,9 @@ class PornHubUserIE(PornHubPlaylistBaseIE):
# Same as before, multi page # Same as before, multi page
'url': 'https://www.pornhubpremium.com/pornstar/lily-labeau', 'url': 'https://www.pornhubpremium.com/pornstar/lily-labeau',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://pornhubthbh7ap3u.onion/model/zoe_ph',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -617,7 +633,7 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE): class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?P<id>(?:[^/]+/)*[^/?#&]+)' _VALID_URL = r'https?://(?:[^/]+\.)?%s/(?P<id>(?:[^/]+/)*[^/?#&]+)' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{ _TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph/videos', 'url': 'https://www.pornhub.com/model/zoe_ph/videos',
'only_matching': True, 'only_matching': True,
@ -722,6 +738,9 @@ class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
}, { }, {
'url': 'https://de.pornhub.com/playlist/4667351', 'url': 'https://de.pornhub.com/playlist/4667351',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://pornhubthbh7ap3u.onion/model/zoe_ph/videos',
'only_matching': True,
}] }]
@classmethod @classmethod
@ -732,7 +751,7 @@ class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE): class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)' _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?%s/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)' % PornHubBaseIE._PORNHUB_HOST_RE
_TESTS = [{ _TESTS = [{
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload', 'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'info_dict': { 'info_dict': {
@ -742,4 +761,7 @@ class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
}, { }, {
'url': 'https://www.pornhub.com/model/zoe_ph/videos/upload', 'url': 'https://www.pornhub.com/model/zoe_ph/videos/upload',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://pornhubthbh7ap3u.onion/pornstar/jenny-blighe/videos/upload',
'only_matching': True,
}] }]

View File

@ -133,8 +133,10 @@ class RedBullEmbedIE(RedBullTVIE):
rrn_id = self._match_id(url) rrn_id = self._match_id(url)
asset_id = self._download_json( asset_id = self._download_json(
'https://edge-graphql.crepo-production.redbullaws.com/v1/graphql', 'https://edge-graphql.crepo-production.redbullaws.com/v1/graphql',
rrn_id, headers={'API-KEY': 'e90a1ff11335423998b100c929ecc866'}, rrn_id, headers={
query={ 'Accept': 'application/json',
'API-KEY': 'e90a1ff11335423998b100c929ecc866',
}, query={
'query': '''{ 'query': '''{
resource(id: "%s", enforceGeoBlocking: false) { resource(id: "%s", enforceGeoBlocking: false) {
%s %s

View File

@ -10,7 +10,7 @@ from ..utils import (
class SBSIE(InfoExtractor): class SBSIE(InfoExtractor):
IE_DESC = 'sbs.com.au' IE_DESC = 'sbs.com.au'
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=)|news/(?:embeds/)?video/)(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=|/watch/)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
# Original URL is handled by the generic IE which finds the iframe: # Original URL is handled by the generic IE which finds the iframe:
@ -43,6 +43,9 @@ class SBSIE(InfoExtractor):
}, { }, {
'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866', 'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.sbs.com.au/ondemand/watch/1698704451971',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -2,12 +2,18 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import js_to_json from ..utils import (
get_element_by_class,
int_or_none,
remove_start,
strip_or_none,
unified_strdate,
)
class ScreencastOMaticIE(InfoExtractor): class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://screencast-o-matic\.com/(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TESTS = [{
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl', 'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
'md5': '483583cb80d92588f15ccbedd90f0c18', 'md5': '483583cb80d92588f15ccbedd90f0c18',
'info_dict': { 'info_dict': {
@ -16,22 +22,30 @@ class ScreencastOMaticIE(InfoExtractor):
'title': 'Welcome to 3-4 Philosophy @ DECV!', 'title': 'Welcome to 3-4 Philosophy @ DECV!',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.', 'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
'duration': 369.163, 'duration': 369,
'upload_date': '20141216',
} }
} }, {
'url': 'http://screencast-o-matic.com/player/c2lD3BeOPl',
'only_matching': True,
}, {
'url': 'http://screencast-o-matic.com/embed?ff=true&sc=cbV2r4Q5TL&fromPH=true&a=1',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(
'https://screencast-o-matic.com/player/' + video_id, video_id)
jwplayer_data = self._parse_json( info = self._parse_html5_media_entries(url, webpage, video_id)[0]
self._search_regex( info.update({
r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'), 'id': video_id,
video_id, transform_source=js_to_json) 'title': get_element_by_class('overlayTitle', webpage),
'description': strip_or_none(get_element_by_class('overlayDescription', webpage)) or None,
info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False) 'duration': int_or_none(self._search_regex(
info_dict.update({ r'player\.duration\s*=\s*function\(\)\s*{\s*return\s+(\d+);\s*};',
'title': self._og_search_title(webpage), webpage, 'duration', default=None)),
'description': self._og_search_description(webpage), 'upload_date': unified_strdate(remove_start(
get_element_by_class('overlayPublished', webpage), 'Published: ')),
}) })
return info_dict return info

View File

@ -21,6 +21,7 @@ from ..utils import (
class ShahidBaseIE(AWSIE): class ShahidBaseIE(AWSIE):
_AWS_PROXY_HOST = 'api2.shahid.net' _AWS_PROXY_HOST = 'api2.shahid.net'
_AWS_API_KEY = '2RRtuMHx95aNI1Kvtn2rChEuwsCogUd4samGPjLh' _AWS_API_KEY = '2RRtuMHx95aNI1Kvtn2rChEuwsCogUd4samGPjLh'
_VALID_URL_BASE = r'https?://shahid\.mbc\.net/[a-z]{2}/'
def _handle_error(self, e): def _handle_error(self, e):
fail_data = self._parse_json( fail_data = self._parse_json(
@ -49,7 +50,7 @@ class ShahidBaseIE(AWSIE):
class ShahidIE(ShahidBaseIE): class ShahidIE(ShahidBaseIE):
_NETRC_MACHINE = 'shahid' _NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)' _VALID_URL = ShahidBaseIE._VALID_URL_BASE + r'(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924', 'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924',
'info_dict': { 'info_dict': {
@ -73,6 +74,9 @@ class ShahidIE(ShahidBaseIE):
# shahid plus subscriber only # shahid plus subscriber only
'url': 'https://shahid.mbc.net/ar/series/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/episode-90511', 'url': 'https://shahid.mbc.net/ar/series/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/episode-90511',
'only_matching': True 'only_matching': True
}, {
'url': 'https://shahid.mbc.net/en/shows/Ramez-Fi-Al-Shallal-season-1-episode-1/episode-359319',
'only_matching': True
}] }]
def _real_initialize(self): def _real_initialize(self):
@ -168,7 +172,7 @@ class ShahidIE(ShahidBaseIE):
class ShahidShowIE(ShahidBaseIE): class ShahidShowIE(ShahidBaseIE):
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:show|serie)s/[^/]+/(?:show|series)-(?P<id>\d+)' _VALID_URL = ShahidBaseIE._VALID_URL_BASE + r'(?:show|serie)s/[^/]+/(?:show|series)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://shahid.mbc.net/ar/shows/%D8%B1%D8%A7%D9%85%D8%B2-%D9%82%D8%B1%D8%B4-%D8%A7%D9%84%D8%A8%D8%AD%D8%B1/show-79187', 'url': 'https://shahid.mbc.net/ar/shows/%D8%B1%D8%A7%D9%85%D8%B2-%D9%82%D8%B1%D8%B4-%D8%A7%D9%84%D8%A8%D8%AD%D8%B1/show-79187',
'info_dict': { 'info_dict': {

View File

@ -86,10 +86,10 @@ class SharedIE(SharedBaseIE):
class VivoIE(SharedBaseIE): class VivoIE(SharedBaseIE):
IE_DESC = 'vivo.sx' IE_DESC = 'vivo.sx'
_VALID_URL = r'https?://vivo\.sx/(?P<id>[\da-z]{10})' _VALID_URL = r'https?://vivo\.s[xt]/(?P<id>[\da-z]{10})'
_FILE_NOT_FOUND = '>The file you have requested does not exists or has been removed' _FILE_NOT_FOUND = '>The file you have requested does not exists or has been removed'
_TEST = { _TESTS = [{
'url': 'http://vivo.sx/d7ddda0e78', 'url': 'http://vivo.sx/d7ddda0e78',
'md5': '15b3af41be0b4fe01f4df075c2678b2c', 'md5': '15b3af41be0b4fe01f4df075c2678b2c',
'info_dict': { 'info_dict': {
@ -98,7 +98,10 @@ class VivoIE(SharedBaseIE):
'title': 'Chicken', 'title': 'Chicken',
'filesize': 515659, 'filesize': 515659,
}, },
} }, {
'url': 'http://vivo.st/d7ddda0e78',
'only_matching': True,
}]
def _extract_title(self, webpage): def _extract_title(self, webpage):
title = self._html_search_regex( title = self._html_search_regex(

View File

@ -146,7 +146,7 @@ class SVTPlayIE(SVTPlayBaseIE):
) )
(?P<svt_id>[^/?#&]+)| (?P<svt_id>[^/?#&]+)|
https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+) https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)
(?:.*?modalId=(?P<modal_id>[\da-zA-Z-]+))? (?:.*?(?:modalId|id)=(?P<modal_id>[\da-zA-Z-]+))?
) )
''' '''
_TESTS = [{ _TESTS = [{
@ -177,6 +177,9 @@ class SVTPlayIE(SVTPlayBaseIE):
}, { }, {
'url': 'https://www.svtplay.se/video/30479064/husdrommar/husdrommar-sasong-8-designdrommar-i-stenungsund?modalId=8zVbDPA', 'url': 'https://www.svtplay.se/video/30479064/husdrommar/husdrommar-sasong-8-designdrommar-i-stenungsund?modalId=8zVbDPA',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.svtplay.se/video/30684086/rapport/rapport-24-apr-18-00-7?id=e72gVpa',
'only_matching': True,
}, { }, {
# geo restricted to Sweden # geo restricted to Sweden
'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten', 'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
@ -259,7 +262,7 @@ class SVTPlayIE(SVTPlayBaseIE):
if not svt_id: if not svt_id:
svt_id = self._search_regex( svt_id = self._search_regex(
(r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)', (r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/%s/[^"\']*\bmodalId=([\da-zA-Z-]+)' % re.escape(video_id), r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/%s/[^"\']*\b(?:modalId|id)=([\da-zA-Z-]+)' % re.escape(video_id),
r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)', r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)', r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)',
r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"', r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',

View File

@ -123,6 +123,10 @@ class TEDIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# with own formats and private Youtube external
'url': 'https://www.ted.com/talks/spencer_wells_a_family_tree_for_humanity',
'only_matching': True,
}] }]
_NATIVE_FORMATS = { _NATIVE_FORMATS = {
@ -210,16 +214,6 @@ class TEDIE(InfoExtractor):
player_talk = talk_info['player_talks'][0] player_talk = talk_info['player_talks'][0]
external = player_talk.get('external')
if isinstance(external, dict):
service = external.get('service')
if isinstance(service, compat_str):
ext_url = None
if service.lower() == 'youtube':
ext_url = external.get('code')
return self.url_result(ext_url or external['uri'])
resources_ = player_talk.get('resources') or talk_info.get('resources') resources_ = player_talk.get('resources') or talk_info.get('resources')
http_url = None http_url = None
@ -294,6 +288,16 @@ class TEDIE(InfoExtractor):
'vcodec': 'none', 'vcodec': 'none',
}) })
if not formats:
external = player_talk.get('external')
if isinstance(external, dict):
service = external.get('service')
if isinstance(service, compat_str):
ext_url = None
if service.lower() == 'youtube':
ext_url = external.get('code')
return self.url_result(ext_url or external['uri'])
self._sort_formats(formats) self._sort_formats(formats)
video_id = compat_str(talk_info['id']) video_id = compat_str(talk_info['id'])

View File

@ -74,6 +74,12 @@ class TV2DKIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
entries = [] entries = []
def add_entry(partner_id, kaltura_id):
entries.append(self.url_result(
'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
video_id=kaltura_id))
for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage): for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage):
video = extract_attributes(video_el) video = extract_attributes(video_el)
kaltura_id = video.get('data-entryid') kaltura_id = video.get('data-entryid')
@ -82,9 +88,14 @@ class TV2DKIE(InfoExtractor):
partner_id = video.get('data-partnerid') partner_id = video.get('data-partnerid')
if not partner_id: if not partner_id:
continue continue
entries.append(self.url_result( add_entry(partner_id, kaltura_id)
'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura', if not entries:
video_id=kaltura_id)) kaltura_id = self._search_regex(
r'entry_id\s*:\s*["\']([0-9a-z_]+)', webpage, 'kaltura id')
partner_id = self._search_regex(
(r'\\u002Fp\\u002F(\d+)\\u002F', r'/p/(\d+)/'), webpage,
'partner id')
add_entry(partner_id, kaltura_id)
return self.playlist_result(entries) return self.playlist_result(entries)

View File

@ -9,7 +9,6 @@ from ..utils import (
int_or_none, int_or_none,
remove_start, remove_start,
smuggle_url, smuggle_url,
strip_or_none,
try_get, try_get,
) )
@ -45,32 +44,18 @@ class TVerIE(InfoExtractor):
query={'token': self._TOKEN})['main'] query={'token': self._TOKEN})['main']
p_id = main['publisher_id'] p_id = main['publisher_id']
service = remove_start(main['service'], 'ts_') service = remove_start(main['service'], 'ts_')
info = {
r_id = main['reference_id']
if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
r_id = 'ref:' + r_id
bc_url = smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
{'geo_countries': ['JP']})
return {
'_type': 'url_transparent', '_type': 'url_transparent',
'description': try_get(main, lambda x: x['note'][0]['text'], compat_str), 'description': try_get(main, lambda x: x['note'][0]['text'], compat_str),
'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])), 'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])),
'url': bc_url,
'ie_key': 'BrightcoveNew',
} }
if service == 'cx':
title = main['title']
subtitle = strip_or_none(main.get('subtitle'))
if subtitle:
title += ' - ' + subtitle
info.update({
'title': title,
'url': 'https://i.fod.fujitv.co.jp/plus7/web/%s/%s.html' % (p_id[:4], p_id),
'ie_key': 'FujiTVFODPlus7',
})
else:
r_id = main['reference_id']
if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
r_id = 'ref:' + r_id
bc_url = smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
{'geo_countries': ['JP']})
info.update({
'url': bc_url,
'ie_key': 'BrightcoveNew',
})
return info

View File

@ -49,6 +49,7 @@ class TwitchBaseIE(InfoExtractor):
'ChannelCollectionsContent': '07e3691a1bad77a36aba590c351180439a40baefc1c275356f40fc7082419a84', 'ChannelCollectionsContent': '07e3691a1bad77a36aba590c351180439a40baefc1c275356f40fc7082419a84',
'StreamMetadata': '1c719a40e481453e5c48d9bb585d971b8b372f8ebb105b17076722264dfa5b3e', 'StreamMetadata': '1c719a40e481453e5c48d9bb585d971b8b372f8ebb105b17076722264dfa5b3e',
'ComscoreStreamingQuery': 'e1edae8122517d013405f237ffcc124515dc6ded82480a88daef69c83b53ac01', 'ComscoreStreamingQuery': 'e1edae8122517d013405f237ffcc124515dc6ded82480a88daef69c83b53ac01',
'VideoAccessToken_Clip': '36b89d2507fce29e5ca551df756d27c1cfe079e2609642b4390aa4c35796eb11',
'VideoPreviewOverlay': '3006e77e51b128d838fa4e835723ca4dc9a05c5efd4466c1085215c6e437e65c', 'VideoPreviewOverlay': '3006e77e51b128d838fa4e835723ca4dc9a05c5efd4466c1085215c6e437e65c',
'VideoMetadata': '226edb3e692509f727fd56821f5653c05740242c82b0388883e0c0e75dcbf687', 'VideoMetadata': '226edb3e692509f727fd56821f5653c05740242c82b0388883e0c0e75dcbf687',
} }
@ -893,7 +894,25 @@ class TwitchClipsIE(TwitchBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
clip = self._download_base_gql( clip = self._download_gql(
video_id, [{
'operationName': 'VideoAccessToken_Clip',
'variables': {
'slug': video_id,
},
}],
'Downloading clip access token GraphQL')[0]['data']['clip']
if not clip:
raise ExtractorError(
'This clip is no longer available', expected=True)
access_query = {
'sig': clip['playbackAccessToken']['signature'],
'token': clip['playbackAccessToken']['value'],
}
data = self._download_base_gql(
video_id, { video_id, {
'query': '''{ 'query': '''{
clip(slug: "%s") { clip(slug: "%s") {
@ -918,11 +937,10 @@ class TwitchClipsIE(TwitchBaseIE):
} }
viewCount viewCount
} }
}''' % video_id}, 'Downloading clip GraphQL')['data']['clip'] }''' % video_id}, 'Downloading clip GraphQL', fatal=False)
if not clip: if data:
raise ExtractorError( clip = try_get(data, lambda x: x['data']['clip'], dict) or clip
'This clip is no longer available', expected=True)
formats = [] formats = []
for option in clip.get('videoQualities', []): for option in clip.get('videoQualities', []):
@ -932,7 +950,7 @@ class TwitchClipsIE(TwitchBaseIE):
if not source: if not source:
continue continue
formats.append({ formats.append({
'url': source, 'url': update_url_query(source, access_query),
'format_id': option.get('quality'), 'format_id': option.get('quality'),
'height': int_or_none(option.get('quality')), 'height': int_or_none(option.get('quality')),
'fps': int_or_none(option.get('frameRate')), 'fps': int_or_none(option.get('frameRate')),

View File

@ -19,6 +19,7 @@ from ..utils import (
strip_or_none, strip_or_none,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_or_none,
xpath_text, xpath_text,
) )
@ -52,6 +53,9 @@ class TwitterBaseIE(InfoExtractor):
return [f] return [f]
def _extract_formats_from_vmap_url(self, vmap_url, video_id): def _extract_formats_from_vmap_url(self, vmap_url, video_id):
vmap_url = url_or_none(vmap_url)
if not vmap_url:
return []
vmap_data = self._download_xml(vmap_url, video_id) vmap_data = self._download_xml(vmap_url, video_id)
formats = [] formats = []
urls = [] urls = []

View File

@ -28,7 +28,7 @@ class UMGDeIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
'https://api.universal-music.de/graphql', 'https://graphql.universal-music.de/',
video_id, query={ video_id, query={
'query': '''{ 'query': '''{
universalMusic(channel:16) { universalMusic(channel:16) {
@ -56,11 +56,9 @@ class UMGDeIE(InfoExtractor):
formats = [] formats = []
def add_m3u8_format(format_id): def add_m3u8_format(format_id):
m3u8_formats = self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
hls_url_template % format_id, video_id, 'mp4', hls_url_template % format_id, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal='False') 'm3u8_native', m3u8_id='hls', fatal=False))
if m3u8_formats and m3u8_formats[0].get('height'):
formats.extend(m3u8_formats)
for f in video_data.get('formats', []): for f in video_data.get('formats', []):
f_url = f.get('url') f_url = f.get('url')

View File

@ -75,7 +75,7 @@ class UstreamIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:www\.)?(?:ustream\.tv|video\.ibm\.com)/embed/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?(?:ustream\.tv|video\.ibm\.com)/embed/.+?)\1', webpage)
if mobj is not None: if mobj is not None:
return mobj.group('url') return mobj.group('url')

View File

@ -3,7 +3,6 @@ from __future__ import unicode_literals
import base64 import base64
import functools import functools
import json
import re import re
import itertools import itertools
@ -17,14 +16,14 @@ from ..compat import (
from ..utils import ( from ..utils import (
clean_html, clean_html,
determine_ext, determine_ext,
dict_get,
ExtractorError, ExtractorError,
get_element_by_class,
js_to_json, js_to_json,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
OnDemandPagedList, OnDemandPagedList,
parse_filesize, parse_filesize,
RegexNotFoundError, parse_iso8601,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
std_headers, std_headers,
@ -74,25 +73,28 @@ class VimeoBaseInfoExtractor(InfoExtractor):
expected=True) expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _verify_video_password(self, url, video_id, webpage): def _get_video_password(self):
password = self._downloader.params.get('videopassword') password = self._downloader.params.get('videopassword')
if password is None: if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True) raise ExtractorError(
token, vuid = self._extract_xsrft_and_vuid(webpage) 'This video is protected by a password, use the --video-password option',
data = urlencode_postdata({ expected=True)
'password': password, return password
'token': token,
}) def _verify_video_password(self, url, video_id, password, token, vuid):
if url.startswith('http://'): if url.startswith('http://'):
# vimeo only supports https now, but the user can give an http url # vimeo only supports https now, but the user can give an http url
url = url.replace('http://', 'https://') url = url.replace('http://', 'https://')
password_request = sanitized_Request(url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
password_request.add_header('Referer', url)
self._set_vimeo_cookie('vuid', vuid) self._set_vimeo_cookie('vuid', vuid)
return self._download_webpage( return self._download_webpage(
password_request, video_id, url + '/password', video_id, 'Verifying the password',
'Verifying the password', 'Wrong password') 'Wrong password', data=urlencode_postdata({
'password': password,
'token': token,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': url,
})
def _extract_xsrft_and_vuid(self, webpage): def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex( xsrft = self._search_regex(
@ -123,10 +125,11 @@ class VimeoBaseInfoExtractor(InfoExtractor):
video_title = video_data['title'] video_title = video_data['title']
live_event = video_data.get('live_event') or {} live_event = video_data.get('live_event') or {}
is_live = live_event.get('status') == 'started' is_live = live_event.get('status') == 'started'
request = config.get('request') or {}
formats = [] formats = []
config_files = video_data.get('files') or config['request'].get('files', {}) config_files = video_data.get('files') or request.get('files') or {}
for f in config_files.get('progressive', []): for f in (config_files.get('progressive') or []):
video_url = f.get('url') video_url = f.get('url')
if not video_url: if not video_url:
continue continue
@ -142,7 +145,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
# TODO: fix handling of 308 status code returned for live archive manifest requests # TODO: fix handling of 308 status code returned for live archive manifest requests
sep_pattern = r'/sep/video/' sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'): for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items(): for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items():
manifest_url = cdn_data.get('url') manifest_url = cdn_data.get('url')
if not manifest_url: if not manifest_url:
continue continue
@ -188,17 +191,15 @@ class VimeoBaseInfoExtractor(InfoExtractor):
f['preference'] = -40 f['preference'] = -40
subtitles = {} subtitles = {}
text_tracks = config['request'].get('text_tracks') for tt in (request.get('text_tracks') or []):
if text_tracks: subtitles[tt['lang']] = [{
for tt in text_tracks: 'ext': 'vtt',
subtitles[tt['lang']] = [{ 'url': urljoin('https://vimeo.com', tt['url']),
'ext': 'vtt', }]
'url': urljoin('https://vimeo.com', tt['url']),
}]
thumbnails = [] thumbnails = []
if not is_live: if not is_live:
for key, thumb in video_data.get('thumbs', {}).items(): for key, thumb in (video_data.get('thumbs') or {}).items():
thumbnails.append({ thumbnails.append({
'id': key, 'id': key,
'width': int_or_none(key), 'width': int_or_none(key),
@ -278,7 +279,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
)? )?
(?:videos?/)? (?:videos?/)?
(?P<id>[0-9]+) (?P<id>[0-9]+)
(?:/[\da-f]+)? (?:/(?P<unlisted_hash>[\da-f]{10}))?
/?(?:[?&].*)?(?:[#].*)?$ /?(?:[?&].*)?(?:[#].*)?$
''' '''
IE_NAME = 'vimeo' IE_NAME = 'vimeo'
@ -318,6 +319,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 1595, 'duration': 1595,
'upload_date': '20130610', 'upload_date': '20130610',
'timestamp': 1370893156, 'timestamp': 1370893156,
'license': 'by',
}, },
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
@ -331,9 +333,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
'id': '54469442', 'id': '54469442',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012', 'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
'uploader': 'The BLN & Business of Software', 'uploader': 'Business of Software',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
'uploader_id': 'theblnbusinessofsoftware', 'uploader_id': 'businessofsoftware',
'duration': 3610, 'duration': 3610,
'description': None, 'description': None,
}, },
@ -396,6 +398,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'staff', 'uploader_id': 'staff',
'uploader': 'Vimeo Staff', 'uploader': 'Vimeo Staff',
'duration': 62, 'duration': 62,
'subtitles': {
'de': [{'ext': 'vtt'}],
'en': [{'ext': 'vtt'}],
'es': [{'ext': 'vtt'}],
'fr': [{'ext': 'vtt'}],
},
} }
}, },
{ {
@ -468,6 +476,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Unable to download JSON metadata'], 'expected_warnings': ['Unable to download JSON metadata'],
'skip': 'this page is no longer available.',
}, },
{ {
'url': 'http://player.vimeo.com/video/68375962', 'url': 'http://player.vimeo.com/video/68375962',
@ -550,9 +559,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
return urls[0] if urls else None return urls[0] if urls else None
def _verify_player_video_password(self, url, video_id, headers): def _verify_player_video_password(self, url, video_id, headers):
password = self._downloader.params.get('videopassword') password = self._get_video_password()
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
data = urlencode_postdata({ data = urlencode_postdata({
'password': base64.b64encode(password.encode()), 'password': base64.b64encode(password.encode()),
}) })
@ -569,6 +576,37 @@ class VimeoIE(VimeoBaseInfoExtractor):
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
def _extract_from_api(self, video_id, unlisted_hash=None):
token = self._download_json(
'https://vimeo.com/_rv/jwt', video_id, headers={
'X-Requested-With': 'XMLHttpRequest'
})['token']
api_url = 'https://api.vimeo.com/videos/' + video_id
if unlisted_hash:
api_url += ':' + unlisted_hash
video = self._download_json(
api_url, video_id, headers={
'Authorization': 'jwt ' + token,
}, query={
'fields': 'config_url,created_time,description,license,metadata.connections.comments.total,metadata.connections.likes.total,release_time,stats.plays',
})
info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id)
self._vimeo_sort_formats(info['formats'])
get_timestamp = lambda x: parse_iso8601(video.get(x + '_time'))
info.update({
'description': video.get('description'),
'license': video.get('license'),
'release_timestamp': get_timestamp('release'),
'timestamp': get_timestamp('created'),
'view_count': int_or_none(try_get(video, lambda x: x['stats']['plays'])),
})
connections = try_get(
video, lambda x: x['metadata']['connections'], dict) or {}
for k in ('comment', 'like'):
info[k + '_count'] = int_or_none(try_get(connections, lambda x: x[k + 's']['total']))
return info
def _real_extract(self, url): def _real_extract(self, url):
url, data = unsmuggle_url(url, {}) url, data = unsmuggle_url(url, {})
headers = std_headers.copy() headers = std_headers.copy()
@ -577,22 +615,19 @@ class VimeoIE(VimeoBaseInfoExtractor):
if 'Referer' not in headers: if 'Referer' not in headers:
headers['Referer'] = url headers['Referer'] = url
channel_id = self._search_regex( mobj = re.match(self._VALID_URL, url).groupdict()
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None) video_id, unlisted_hash = mobj['id'], mobj.get('unlisted_hash')
if unlisted_hash:
return self._extract_from_api(video_id, unlisted_hash)
# Extract ID from URL
video_id = self._match_id(url)
orig_url = url orig_url = url
is_pro = 'vimeopro.com/' in url is_pro = 'vimeopro.com/' in url
is_player = '://player.vimeo.com/video/' in url
if is_pro: if is_pro:
# some videos require portfolio_id to be present in player url # some videos require portfolio_id to be present in player url
# https://github.com/ytdl-org/youtube-dl/issues/20070 # https://github.com/ytdl-org/youtube-dl/issues/20070
url = self._extract_url(url, self._download_webpage(url, video_id)) url = self._extract_url(url, self._download_webpage(url, video_id))
if not url: if not url:
url = 'https://vimeo.com/' + video_id url = 'https://vimeo.com/' + video_id
elif is_player:
url = 'https://player.vimeo.com/video/' + video_id
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')): elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
url = 'https://vimeo.com/' + video_id url = 'https://vimeo.com/' + video_id
@ -612,14 +647,25 @@ class VimeoIE(VimeoBaseInfoExtractor):
expected=True) expected=True)
raise raise
# Now we begin extracting as much information as we can from what we if '//player.vimeo.com/video/' in url:
# retrieved. First we extract the information common to all extractors, config = self._parse_json(self._search_regex(
# and latter we extract those that are Vimeo specific. r'\bconfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
self.report_extraction(video_id) if config.get('view') == 4:
config = self._verify_player_video_password(
redirect_url, video_id, headers)
info = self._parse_config(config, video_id)
self._vimeo_sort_formats(info['formats'])
return info
if re.search(r'<form[^>]+?id="pw_form"', webpage):
video_password = self._get_video_password()
token, vuid = self._extract_xsrft_and_vuid(webpage)
webpage = self._verify_video_password(
redirect_url, video_id, video_password, token, vuid)
vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None) vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
if vimeo_config: if vimeo_config:
seed_status = vimeo_config.get('seed_status', {}) seed_status = vimeo_config.get('seed_status') or {}
if seed_status.get('state') == 'failed': if seed_status.get('state') == 'failed':
raise ExtractorError( raise ExtractorError(
'%s said: %s' % (self.IE_NAME, seed_status['title']), '%s said: %s' % (self.IE_NAME, seed_status['title']),
@ -628,67 +674,40 @@ class VimeoIE(VimeoBaseInfoExtractor):
cc_license = None cc_license = None
timestamp = None timestamp = None
video_description = None video_description = None
info_dict = {}
# Extract the config JSON channel_id = self._search_regex(
try: r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
try: if channel_id:
config_url = self._html_search_regex( config_url = self._html_search_regex(
r' data-config-url="(.+?)"', webpage, r'\bdata-config-url="([^"]+)"', webpage, 'config URL')
'config URL', default=None) video_description = clean_html(get_element_by_class('description', webpage))
if not config_url: info_dict.update({
# Sometimes new react-based page is served instead of old one that require 'channel_id': channel_id,
# different config URL extraction approach (see 'channel_url': 'https://vimeo.com/channels/' + channel_id,
# https://github.com/ytdl-org/youtube-dl/pull/7209) })
page_config = self._parse_json(self._search_regex(
r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
webpage, 'page config'), video_id)
config_url = page_config['player']['config_url']
cc_license = page_config.get('cc_license')
timestamp = try_get(
page_config, lambda x: x['clip']['uploaded_on'],
compat_str)
video_description = clean_html(dict_get(
page_config, ('description', 'description_html_escaped')))
config = self._download_json(config_url, video_id)
except RegexNotFoundError:
# For pro videos or player.vimeo.com urls
# We try to find out to which variable is assigned the config dic
m_variable_name = re.search(r'(\w)\.video\.id', webpage)
if m_variable_name is not None:
config_re = [r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1))]
else:
config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
config = self._search_regex(config_re, webpage, 'info section',
flags=re.DOTALL)
config = json.loads(config)
except Exception as e:
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')
if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
if '_video_password_verified' in data:
raise ExtractorError('video password verification failed!')
self._verify_video_password(redirect_url, video_id, webpage)
return self._real_extract(
smuggle_url(redirect_url, {'_video_password_verified': 'verified'}))
else:
raise ExtractorError('Unable to extract info section',
cause=e)
else: else:
if config.get('view') == 4: page_config = self._parse_json(self._search_regex(
config = self._verify_player_video_password(redirect_url, video_id, headers) r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
webpage, 'page config', default='{}'), video_id, fatal=False)
if not page_config:
return self._extract_from_api(video_id)
config_url = page_config['player']['config_url']
cc_license = page_config.get('cc_license')
clip = page_config.get('clip') or {}
timestamp = clip.get('uploaded_on')
video_description = clean_html(
clip.get('description') or page_config.get('description_html_escaped'))
config = self._download_json(config_url, video_id)
video = config.get('video') or {} video = config.get('video') or {}
vod = video.get('vod') or {} vod = video.get('vod') or {}
def is_rented(): def is_rented():
if '>You rented this title.<' in webpage: if '>You rented this title.<' in webpage:
return True return True
if config.get('user', {}).get('purchased'): if try_get(config, lambda x: x['user']['purchased']):
return True return True
for purchase_option in vod.get('purchase_options', []): for purchase_option in (vod.get('purchase_options') or []):
if purchase_option.get('purchased'): if purchase_option.get('purchased'):
return True return True
label = purchase_option.get('label_string') label = purchase_option.get('label_string')
@ -703,14 +722,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
'https://player.vimeo.com/player/%s' % feature_id, 'https://player.vimeo.com/player/%s' % feature_id,
{'force_feature_id': True}), 'Vimeo') {'force_feature_id': True}), 'Vimeo')
# Extract video description
if not video_description:
video_description = self._html_search_regex(
r'(?s)<div\s+class="[^"]*description[^"]*"[^>]*>(.*?)</div>',
webpage, 'description', default=None)
if not video_description: if not video_description:
video_description = self._html_search_meta( video_description = self._html_search_meta(
'description', webpage, default=None) ['description', 'og:description', 'twitter:description'],
webpage, default=None)
if not video_description and is_pro: if not video_description and is_pro:
orig_webpage = self._download_webpage( orig_webpage = self._download_webpage(
orig_url, video_id, orig_url, video_id,
@ -719,25 +734,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
if orig_webpage: if orig_webpage:
video_description = self._html_search_meta( video_description = self._html_search_meta(
'description', orig_webpage, default=None) 'description', orig_webpage, default=None)
if not video_description and not is_player: if not video_description:
self._downloader.report_warning('Cannot find video description') self._downloader.report_warning('Cannot find video description')
# Extract upload date
if not timestamp: if not timestamp:
timestamp = self._search_regex( timestamp = self._search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, r'<time[^>]+datetime="([^"]+)"', webpage,
'timestamp', default=None) 'timestamp', default=None)
try:
view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, 'view count'))
like_count = int(self._search_regex(r'UserLikes:(\d+)', webpage, 'like count'))
comment_count = int(self._search_regex(r'UserComments:(\d+)', webpage, 'comment count'))
except RegexNotFoundError:
# This info is only available in vimeo.com/{id} urls
view_count = None
like_count = None
comment_count = None
formats = [] formats = []
source_format = self._extract_original_format( source_format = self._extract_original_format(
@ -756,29 +760,20 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1', r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license') webpage, 'license', default=None, group='license')
channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None info_dict.update({
info_dict = {
'formats': formats, 'formats': formats,
'timestamp': unified_timestamp(timestamp), 'timestamp': unified_timestamp(timestamp),
'description': video_description, 'description': video_description,
'webpage_url': url, 'webpage_url': url,
'view_count': view_count,
'like_count': like_count,
'comment_count': comment_count,
'license': cc_license, 'license': cc_license,
'channel_id': channel_id, })
'channel_url': channel_url,
}
info_dict = merge_dicts(info_dict, info_dict_config, json_ld) return merge_dicts(info_dict, info_dict_config, json_ld)
return info_dict
class VimeoOndemandIE(VimeoIE): class VimeoOndemandIE(VimeoIE):
IE_NAME = 'vimeo:ondemand' IE_NAME = 'vimeo:ondemand'
_VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/([^/]+/)?(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?:[^/]+/)?(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# ondemand video not available via https://vimeo.com/id # ondemand video not available via https://vimeo.com/id
'url': 'https://vimeo.com/ondemand/20704', 'url': 'https://vimeo.com/ondemand/20704',
@ -1062,10 +1057,23 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
page_url, video_id = re.match(self._VALID_URL, url).groups() page_url, video_id = re.match(self._VALID_URL, url).groups()
clip_data = self._download_json( data = self._download_json(
page_url.replace('/review/', '/review/data/'), page_url.replace('/review/', '/review/data/'), video_id)
video_id)['clipData'] if data.get('isLocked') is True:
config_url = clip_data['configUrl'] video_password = self._get_video_password()
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', video_id)
webpage = self._verify_video_password(
'https://vimeo.com/' + video_id, video_id,
video_password, viewer['xsrft'], viewer['vuid'])
clip_page_config = self._parse_json(self._search_regex(
r'window\.vimeo\.clip_page_config\s*=\s*({.+?});',
webpage, 'clip page config'), video_id)
config_url = clip_page_config['player']['config_url']
clip_data = clip_page_config.get('clip') or {}
else:
clip_data = data['clipData']
config_url = clip_data['configUrl']
config = self._download_json(config_url, video_id) config = self._download_json(config_url, video_id)
info_dict = self._parse_config(config, video_id) info_dict = self._parse_config(config, video_id)
source_format = self._extract_original_format( source_format = self._extract_original_format(

View File

@ -300,6 +300,13 @@ class VKIE(VKBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
@staticmethod
def _extract_sibnet_urls(webpage):
# https://help.sibnet.ru/?sibnet_video_embed
return [unescapeHTML(mobj.group('url')) for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//video\.sibnet\.ru/shell\.php\?.*?\bvideoid=\d+.*?)\1',
webpage)]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid') video_id = mobj.group('videoid')
@ -408,6 +415,10 @@ class VKIE(VKBaseIE):
if odnoklassniki_url: if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key()) return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
sibnet_urls = self._extract_sibnet_urls(info_page)
if sibnet_urls:
return self.url_result(sibnet_urls[0])
m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page) m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page)
if m_opts: if m_opts:
m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1)) m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1))

View File

@ -106,7 +106,7 @@ class VLiveIE(VLiveBaseIE):
raise ExtractorError('Unable to log in', expected=True) raise ExtractorError('Unable to log in', expected=True)
def _call_api(self, path_template, video_id, fields=None): def _call_api(self, path_template, video_id, fields=None):
query = {'appId': self._APP_ID, 'gcc': 'KR'} query = {'appId': self._APP_ID, 'gcc': 'KR', 'platformType': 'PC'}
if fields: if fields:
query['fields'] = fields query['fields'] = fields
try: try:

View File

@ -58,6 +58,7 @@ class XFileShareIE(InfoExtractor):
(r'vidlocker\.xyz', 'VidLocker'), (r'vidlocker\.xyz', 'VidLocker'),
(r'vidshare\.tv', 'VidShare'), (r'vidshare\.tv', 'VidShare'),
(r'vup\.to', 'VUp'), (r'vup\.to', 'VUp'),
(r'wolfstream\.tv', 'WolfStream'),
(r'xvideosharing\.com', 'XVideoSharing'), (r'xvideosharing\.com', 'XVideoSharing'),
) )
@ -82,6 +83,9 @@ class XFileShareIE(InfoExtractor):
}, { }, {
'url': 'https://aparat.cam/n4d6dh0wvlpr', 'url': 'https://aparat.cam/n4d6dh0wvlpr',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://wolfstream.tv/nthme29v9u2x',
'only_matching': True,
}] }]
@staticmethod @staticmethod

View File

@ -11,6 +11,7 @@ from ..utils import (
parse_duration, parse_duration,
sanitized_Request, sanitized_Request,
str_to_int, str_to_int,
url_or_none,
) )
@ -87,10 +88,10 @@ class XTubeIE(InfoExtractor):
'Cookie': 'age_verified=1; cookiesAccepted=1', 'Cookie': 'age_verified=1; cookiesAccepted=1',
}) })
title, thumbnail, duration = [None] * 3 title, thumbnail, duration, sources, media_definition = [None] * 5
config = self._parse_json(self._search_regex( config = self._parse_json(self._search_regex(
r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf)', webpage, 'config', r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf|playerWrapper)', webpage, 'config',
default='{}'), video_id, transform_source=js_to_json, fatal=False) default='{}'), video_id, transform_source=js_to_json, fatal=False)
if config: if config:
config = config.get('mainRoll') config = config.get('mainRoll')
@ -99,20 +100,52 @@ class XTubeIE(InfoExtractor):
thumbnail = config.get('poster') thumbnail = config.get('poster')
duration = int_or_none(config.get('duration')) duration = int_or_none(config.get('duration'))
sources = config.get('sources') or config.get('format') sources = config.get('sources') or config.get('format')
media_definition = config.get('mediaDefinition')
if not isinstance(sources, dict): if not isinstance(sources, dict) and not media_definition:
sources = self._parse_json(self._search_regex( sources = self._parse_json(self._search_regex(
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),', r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id, webpage, 'sources', group='sources'), video_id,
transform_source=js_to_json) transform_source=js_to_json)
formats = [] formats = []
for format_id, format_url in sources.items(): format_urls = set()
formats.append({
'url': format_url, if isinstance(sources, dict):
'format_id': format_id, for format_id, format_url in sources.items():
'height': int_or_none(format_id), format_url = url_or_none(format_url)
}) if not format_url:
continue
if format_url in format_urls:
continue
format_urls.add(format_url)
formats.append({
'url': format_url,
'format_id': format_id,
'height': int_or_none(format_id),
})
if isinstance(media_definition, list):
for media in media_definition:
video_url = url_or_none(media.get('videoUrl'))
if not video_url:
continue
if video_url in format_urls:
continue
format_urls.add(video_url)
format_id = media.get('format')
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif format_id == 'mp4':
height = int_or_none(media.get('quality'))
formats.append({
'url': video_url,
'format_id': '%s-%d' % (format_id, height) if height else format_id,
'height': height,
})
self._remove_duplicate_formats(formats) self._remove_duplicate_formats(formats)
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -154,7 +154,7 @@ class YoukuIE(InfoExtractor):
# request basic data # request basic data
basic_data_params = { basic_data_params = {
'vid': video_id, 'vid': video_id,
'ccode': '0590', 'ccode': '0532',
'client_ip': '192.168.1.1', 'client_ip': '192.168.1.1',
'utid': cna, 'utid': cna,
'client_ts': time.time() / 1000, 'client_ts': time.time() / 1000,

View File

@ -4,13 +4,12 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
extract_attributes,
int_or_none, int_or_none,
str_to_int, str_to_int,
unescapeHTML,
unified_strdate, unified_strdate,
url_or_none, url_or_none,
) )
from ..aes import aes_decrypt_text
class YouPornIE(InfoExtractor): class YouPornIE(InfoExtractor):
@ -34,6 +33,7 @@ class YouPornIE(InfoExtractor):
'tags': list, 'tags': list,
'age_limit': 18, 'age_limit': 18,
}, },
'skip': 'This video has been disabled',
}, { }, {
# Unknown uploader # Unknown uploader
'url': 'http://www.youporn.com/watch/561726/big-tits-awesome-brunette-on-amazing-webcam-show/?from=related3&al=2&from_id=561726&pos=4', 'url': 'http://www.youporn.com/watch/561726/big-tits-awesome-brunette-on-amazing-webcam-show/?from=related3&al=2&from_id=561726&pos=4',
@ -78,6 +78,40 @@ class YouPornIE(InfoExtractor):
video_id = mobj.group('id') video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id display_id = mobj.group('display_id') or video_id
definitions = self._download_json(
'https://www.youporn.com/api/video/media_definitions/%s/' % video_id,
display_id)
formats = []
for definition in definitions:
if not isinstance(definition, dict):
continue
video_url = url_or_none(definition.get('videoUrl'))
if not video_url:
continue
f = {
'url': video_url,
'filesize': int_or_none(definition.get('videoSize')),
}
height = int_or_none(definition.get('quality'))
# Video URL's path looks like this:
# /201012/17/505835/720p_1500k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
# /201012/17/505835/vl_240p_240k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
# /videos/201703/11/109285532/1080P_4000K_109285532.mp4
# We will benefit from it by extracting some metadata
mobj = re.search(r'(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
if mobj:
if not height:
height = int(mobj.group('height'))
bitrate = int(mobj.group('bitrate'))
f.update({
'format_id': '%dp-%dk' % (height, bitrate),
'tbr': bitrate,
})
f['height'] = height
formats.append(f)
self._sort_formats(formats)
webpage = self._download_webpage( webpage = self._download_webpage(
'http://www.youporn.com/watch/%s' % video_id, display_id, 'http://www.youporn.com/watch/%s' % video_id, display_id,
headers={'Cookie': 'age_verified=1'}) headers={'Cookie': 'age_verified=1'})
@ -88,65 +122,6 @@ class YouPornIE(InfoExtractor):
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True) 'title', webpage, fatal=True)
links = []
# Main source
definitions = self._parse_json(
self._search_regex(
r'mediaDefinition\s*[=:]\s*(\[.+?\])\s*[;,]', webpage,
'media definitions', default='[]'),
video_id, fatal=False)
if definitions:
for definition in definitions:
if not isinstance(definition, dict):
continue
video_url = url_or_none(definition.get('videoUrl'))
if video_url:
links.append(video_url)
# Fallback #1, this also contains extra low quality 180p format
for _, link in re.findall(r'<a[^>]+href=(["\'])(http(?:(?!\1).)+\.mp4(?:(?!\1).)*)\1[^>]+title=["\']Download [Vv]ideo', webpage):
links.append(link)
# Fallback #2 (unavailable as at 22.06.2017)
sources = self._search_regex(
r'(?s)sources\s*:\s*({.+?})', webpage, 'sources', default=None)
if sources:
for _, link in re.findall(r'[^:]+\s*:\s*(["\'])(http.+?)\1', sources):
links.append(link)
# Fallback #3 (unavailable as at 22.06.2017)
for _, link in re.findall(
r'(?:videoSrc|videoIpadUrl|html5PlayerSrc)\s*[:=]\s*(["\'])(http.+?)\1', webpage):
links.append(link)
# Fallback #4, encrypted links (unavailable as at 22.06.2017)
for _, encrypted_link in re.findall(
r'encryptedQuality\d{3,4}URL\s*=\s*(["\'])([\da-zA-Z+/=]+)\1', webpage):
links.append(aes_decrypt_text(encrypted_link, title, 32).decode('utf-8'))
formats = []
for video_url in set(unescapeHTML(link) for link in links):
f = {
'url': video_url,
}
# Video URL's path looks like this:
# /201012/17/505835/720p_1500k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
# /201012/17/505835/vl_240p_240k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
# /videos/201703/11/109285532/1080P_4000K_109285532.mp4
# We will benefit from it by extracting some metadata
mobj = re.search(r'(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
if mobj:
height = int(mobj.group('height'))
bitrate = int(mobj.group('bitrate'))
f.update({
'format_id': '%dp-%dk' % (height, bitrate),
'height': height,
'tbr': bitrate,
})
formats.append(f)
self._sort_formats(formats)
description = self._html_search_regex( description = self._html_search_regex(
r'(?s)<div[^>]+\bid=["\']description["\'][^>]*>(.+?)</div>', r'(?s)<div[^>]+\bid=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', webpage, 'description',
@ -169,13 +144,12 @@ class YouPornIE(InfoExtractor):
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
average_rating = int_or_none(self._search_regex( view_count = None
r'<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>', views = self._search_regex(
webpage, 'average rating', fatal=False)) r'(<div[^>]+\bclass=["\']js_videoInfoViews["\']>)', webpage,
'views', default=None)
view_count = str_to_int(self._search_regex( if views:
r'(?s)<div[^>]+class=(["\']).*?\bvideoInfoViews\b.*?\1[^>]*>.*?(?P<count>[\d,.]+)<', view_count = str_to_int(extract_attributes(views).get('data-value'))
webpage, 'view count', fatal=False, group='count'))
comment_count = str_to_int(self._search_regex( comment_count = str_to_int(self._search_regex(
r'>All [Cc]omments? \(([\d,.]+)\)', r'>All [Cc]omments? \(([\d,.]+)\)',
webpage, 'comment count', default=None)) webpage, 'comment count', default=None))
@ -201,7 +175,6 @@ class YouPornIE(InfoExtractor):
'duration': duration, 'duration': duration,
'uploader': uploader, 'uploader': uploader,
'upload_date': upload_date, 'upload_date': upload_date,
'average_rating': average_rating,
'view_count': view_count, 'view_count': view_count,
'comment_count': comment_count, 'comment_count': comment_count,
'categories': categories, 'categories': categories,

View File

@ -24,6 +24,7 @@ from ..jsinterp import JSInterpreter
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
clean_html, clean_html,
dict_get,
float_or_none, float_or_none,
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
@ -45,6 +46,10 @@ from ..utils import (
) )
def parse_qs(url):
return compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
class YoutubeBaseInfoExtractor(InfoExtractor): class YoutubeBaseInfoExtractor(InfoExtractor):
"""Provide base functions for Youtube extractors""" """Provide base functions for Youtube extractors"""
_LOGIN_URL = 'https://accounts.google.com/ServiceLogin' _LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
@ -60,11 +65,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)' _PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)'
def _ids_to_results(self, ids):
return [
self.url_result(vid_id, 'Youtube', video_id=vid_id)
for vid_id in ids]
def _login(self): def _login(self):
""" """
Attempt to log in to YouTube. Attempt to log in to YouTube.
@ -248,7 +248,23 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return True return True
def _initialize_consent(self):
cookies = self._get_cookies('https://www.youtube.com/')
if cookies.get('__Secure-3PSID'):
return
consent_id = None
consent = cookies.get('CONSENT')
if consent:
if 'YES' in consent.value:
return
consent_id = self._search_regex(
r'PENDING\+(\d+)', consent.value, 'consent', default=None)
if not consent_id:
consent_id = random.randint(100, 999)
self._set_cookie('.youtube.com', 'CONSENT', 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id)
def _real_initialize(self): def _real_initialize(self):
self._initialize_consent()
if self._downloader is None: if self._downloader is None:
return return
if not self._login(): if not self._login():
@ -289,7 +305,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return self._parse_json( return self._parse_json(
self._search_regex( self._search_regex(
r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg', r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
default='{}'), video_id, fatal=False) default='{}'), video_id, fatal=False) or {}
def _extract_video(self, renderer): def _extract_video(self, renderer):
video_id = renderer['videoId'] video_id = renderer['videoId']
@ -312,7 +328,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
(lambda x: x['ownerText']['runs'][0]['text'], (lambda x: x['ownerText']['runs'][0]['text'],
lambda x: x['shortBylineText']['runs'][0]['text']), compat_str) lambda x: x['shortBylineText']['runs'][0]['text']), compat_str)
return { return {
'_type': 'url_transparent', '_type': 'url',
'ie_key': YoutubeIE.ie_key(), 'ie_key': YoutubeIE.ie_key(),
'id': video_id, 'id': video_id,
'url': video_id, 'url': video_id,
@ -337,26 +353,41 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'(?:www\.)?invidious\.13ad\.de', r'(?:www\.)?invidious\.13ad\.de',
r'(?:www\.)?invidious\.mastodon\.host', r'(?:www\.)?invidious\.mastodon\.host',
r'(?:www\.)?invidious\.zapashcanon\.fr', r'(?:www\.)?invidious\.zapashcanon\.fr',
r'(?:www\.)?invidious\.kavin\.rocks', r'(?:www\.)?(?:invidious(?:-us)?|piped)\.kavin\.rocks',
r'(?:www\.)?invidious\.tinfoil-hat\.net',
r'(?:www\.)?invidious\.himiko\.cloud',
r'(?:www\.)?invidious\.reallyancient\.tech',
r'(?:www\.)?invidious\.tube', r'(?:www\.)?invidious\.tube',
r'(?:www\.)?invidiou\.site', r'(?:www\.)?invidiou\.site',
r'(?:www\.)?invidious\.site', r'(?:www\.)?invidious\.site',
r'(?:www\.)?invidious\.xyz', r'(?:www\.)?invidious\.xyz',
r'(?:www\.)?invidious\.nixnet\.xyz', r'(?:www\.)?invidious\.nixnet\.xyz',
r'(?:www\.)?invidious\.048596\.xyz',
r'(?:www\.)?invidious\.drycat\.fr', r'(?:www\.)?invidious\.drycat\.fr',
r'(?:www\.)?inv\.skyn3t\.in',
r'(?:www\.)?tube\.poal\.co', r'(?:www\.)?tube\.poal\.co',
r'(?:www\.)?tube\.connect\.cafe', r'(?:www\.)?tube\.connect\.cafe',
r'(?:www\.)?vid\.wxzm\.sx', r'(?:www\.)?vid\.wxzm\.sx',
r'(?:www\.)?vid\.mint\.lgbt', r'(?:www\.)?vid\.mint\.lgbt',
r'(?:www\.)?vid\.puffyan\.us',
r'(?:www\.)?yewtu\.be', r'(?:www\.)?yewtu\.be',
r'(?:www\.)?yt\.elukerio\.org', r'(?:www\.)?yt\.elukerio\.org',
r'(?:www\.)?yt\.lelux\.fi', r'(?:www\.)?yt\.lelux\.fi',
r'(?:www\.)?invidious\.ggc-project\.de', r'(?:www\.)?invidious\.ggc-project\.de',
r'(?:www\.)?yt\.maisputain\.ovh', r'(?:www\.)?yt\.maisputain\.ovh',
r'(?:www\.)?ytprivate\.com',
r'(?:www\.)?invidious\.13ad\.de', r'(?:www\.)?invidious\.13ad\.de',
r'(?:www\.)?invidious\.toot\.koeln', r'(?:www\.)?invidious\.toot\.koeln',
r'(?:www\.)?invidious\.fdn\.fr', r'(?:www\.)?invidious\.fdn\.fr',
r'(?:www\.)?watch\.nettohikari\.com', r'(?:www\.)?watch\.nettohikari\.com',
r'(?:www\.)?invidious\.namazso\.eu',
r'(?:www\.)?invidious\.silkky\.cloud',
r'(?:www\.)?invidious\.exonip\.de',
r'(?:www\.)?invidious\.riverside\.rocks',
r'(?:www\.)?invidious\.blamefran\.net',
r'(?:www\.)?invidious\.moomoo\.de',
r'(?:www\.)?ytb\.trom\.tf',
r'(?:www\.)?yt\.cyberhost\.uk',
r'(?:www\.)?kgg2m7yk5aybusll\.onion', r'(?:www\.)?kgg2m7yk5aybusll\.onion',
r'(?:www\.)?qklhadlycap4cnod\.onion', r'(?:www\.)?qklhadlycap4cnod\.onion',
r'(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion', r'(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion',
@ -365,6 +396,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion', r'(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion',
r'(?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p', r'(?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p',
r'(?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion', r'(?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion',
r'(?:www\.)?w6ijuptxiku4xpnnaetxvnkc5vqcdu7mgns2u77qefoixi63vbvnpnqd\.onion',
r'(?:www\.)?kbjggqkzv65ivcqj6bumvp337z6264huv5kpkwuv6gu5yjiskvan7fad\.onion',
r'(?:www\.)?grwp24hodrefzvjjuccrkw3mjq4tzhaaq32amf33dzpmuxe7ilepcmad\.onion',
r'(?:www\.)?hpniueoejy4opn7bc4ftgazyqjoeqwlvh2uiku2xqku6zpoa4bf5ruid\.onion',
) )
_VALID_URL = r"""(?x)^ _VALID_URL = r"""(?x)^
( (
@ -396,16 +431,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId= |(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=
) )
)? # all until now is optional -> you can pass the naked ID )? # all until now is optional -> you can pass the naked ID
(?P<id>[0-9A-Za-z_-]{11}) # here is it! the YouTube video ID (?P<id>[0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
(?!.*?\blist=
(?:
%(playlist_id)s| # combined list/video URLs are handled by the playlist IE
WL # WL are handled by the watch later IE
)
)
(?(1).+)? # if we found the ID, everything can follow (?(1).+)? # if we found the ID, everything can follow
$""" % { $""" % {
'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE,
'invidious': '|'.join(_INVIDIOUS_SITES), 'invidious': '|'.join(_INVIDIOUS_SITES),
} }
_PLAYER_INFO_RE = ( _PLAYER_INFO_RE = (
@ -792,6 +820,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}, },
'skip': 'This video does not exist.', 'skip': 'This video does not exist.',
}, },
{
# Video with incomplete 'yt:stretch=16:'
'url': 'https://www.youtube.com/watch?v=FRhJzUSJbGI',
'only_matching': True,
},
{ {
# Video licensed under Creative Commons # Video licensed under Creative Commons
'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA', 'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
@ -1082,6 +1115,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel': 'inabakumori', 'channel': 'inabakumori',
'uploader_id': 'UCNElM45JypxqAR73RoUQ10g', 'uploader_id': 'UCNElM45JypxqAR73RoUQ10g',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCNElM45JypxqAR73RoUQ10g', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCNElM45JypxqAR73RoUQ10g',
},
{
# restricted location, https://github.com/ytdl-org/youtube-dl/issues/28685
'url': 'cBvYw8_A0vQ',
'info_dict': {
'id': 'cBvYw8_A0vQ',
'ext': 'mp4',
'title': '4K Ueno Okachimachi Street Scenes 上野御徒町歩き',
'description': 'md5:ea770e474b7cd6722b4c95b833c03630',
'upload_date': '20201120',
'uploader': 'Walk around Japan',
'uploader_id': 'UC3o_t8PzBmXf5S9b7GLx1Mw',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UC3o_t8PzBmXf5S9b7GLx1Mw',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1194,6 +1240,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'}, '397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
} }
@classmethod
def suitable(cls, url):
# Hack for lazy extractors until more generic solution is implemented
# (see #28780)
from .youtube import parse_qs
qs = parse_qs(url)
if qs.get('list', [None])[0]:
return False
return super(YoutubeIE, cls).suitable(url)
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
super(YoutubeIE, self).__init__(*args, **kwargs) super(YoutubeIE, self).__init__(*args, **kwargs)
self._code_cache = {} self._code_cache = {}
@ -1451,7 +1507,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
base_url = self.http_scheme() + '//www.youtube.com/' base_url = self.http_scheme() + '//www.youtube.com/'
webpage_url = base_url + 'watch?v=' + video_id webpage_url = base_url + 'watch?v=' + video_id
webpage = self._download_webpage( webpage = self._download_webpage(
webpage_url + '&bpctr=9999999999', video_id, fatal=False) webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False)
player_response = None player_response = None
if webpage: if webpage:
@ -1464,18 +1520,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
playability_status = player_response.get('playabilityStatus') or {} playability_status = player_response.get('playabilityStatus') or {}
if playability_status.get('reason') == 'Sign in to confirm your age': if playability_status.get('reason') == 'Sign in to confirm your age':
pr = self._parse_json(try_get(compat_parse_qs( video_info = self._download_webpage(
self._download_webpage( base_url + 'get_video_info', video_id,
base_url + 'get_video_info', video_id, 'Refetching age-gated info webpage',
'Refetching age-gated info webpage', 'unable to download video info webpage', query={
'unable to download video info webpage', query={ 'video_id': video_id,
'video_id': video_id, 'eurl': 'https://youtube.googleapis.com/v/' + video_id,
'eurl': 'https://youtube.googleapis.com/v/' + video_id, 'html5': 1,
}, fatal=False)), # See https://github.com/ytdl-org/youtube-dl/issues/29333#issuecomment-864049544
lambda x: x['player_response'][0], 'c': 'TVHTML5',
compat_str) or '{}', video_id) 'cver': '6.20180913',
if pr: }, fatal=False)
player_response = pr if video_info:
pr = self._parse_json(
try_get(
compat_parse_qs(video_info),
lambda x: x['player_response'][0], compat_str) or '{}',
video_id, fatal=False)
if pr and isinstance(pr, dict):
player_response = pr
trailer_video_id = try_get( trailer_video_id = try_get(
playability_status, playability_status,
@ -1488,7 +1551,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def get_text(x): def get_text(x):
if not x: if not x:
return return
return x.get('simpleText') or ''.join([r['text'] for r in x['runs']]) text = x.get('simpleText')
if text and isinstance(text, compat_str):
return text
runs = x.get('runs')
if not isinstance(runs, list):
return
return ''.join([r['text'] for r in runs if isinstance(r.get('text'), compat_str)])
search_meta = ( search_meta = (
lambda x: self._html_search_meta(x, webpage, default=None)) \ lambda x: self._html_search_meta(x, webpage, default=None)) \
@ -1687,13 +1756,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
for m in re.finditer(self._meta_regex('og:video:tag'), webpage)] for m in re.finditer(self._meta_regex('og:video:tag'), webpage)]
for keyword in keywords: for keyword in keywords:
if keyword.startswith('yt:stretch='): if keyword.startswith('yt:stretch='):
w, h = keyword.split('=')[1].split(':') mobj = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
w, h = int(w), int(h) if mobj:
if w > 0 and h > 0: # NB: float is intentional for forcing float division
ratio = w / h w, h = (float(v) for v in mobj.groups())
for f in formats: if w > 0 and h > 0:
if f.get('vcodec') != 'none': ratio = w / h
f['stretched_ratio'] = ratio for f in formats:
if f.get('vcodec') != 'none':
f['stretched_ratio'] = ratio
break
thumbnails = [] thumbnails = []
for container in (video_details, microformat): for container in (video_details, microformat):
@ -1963,7 +2035,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
invidio\.us invidio\.us
)/ )/
(?: (?:
(?:channel|c|user|feed)/| (?:channel|c|user|feed|hashtag)/|
(?:playlist|watch)\?.*?\blist=| (?:playlist|watch)\?.*?\blist=|
(?!(?:watch|embed|v|e)\b) (?!(?:watch|embed|v|e)\b)
) )
@ -1989,6 +2061,15 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'title': 'Игорь Клейнер - Playlists', 'title': 'Игорь Клейнер - Playlists',
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2', 'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
}, },
}, {
# playlists, series
'url': 'https://www.youtube.com/c/3blue1brown/playlists?view=50&sort=dd&shelf_id=3',
'playlist_mincount': 5,
'info_dict': {
'id': 'UCYO_jab_esuFRV4b17AJtAw',
'title': '3Blue1Brown - Playlists',
'description': 'md5:e1384e8a133307dd10edee76e875d62f',
},
}, { }, {
# playlists, singlepage # playlists, singlepage
'url': 'https://www.youtube.com/user/ThirstForScience/playlists', 'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
@ -2249,6 +2330,16 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
}, { }, {
'url': 'https://www.youtube.com/TheYoungTurks/live', 'url': 'https://www.youtube.com/TheYoungTurks/live',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.youtube.com/hashtag/cctv9',
'info_dict': {
'id': 'cctv9',
'title': '#cctv9',
},
'playlist_mincount': 350,
}, {
'url': 'https://www.youtube.com/watch?list=PLW4dVinRY435CBE_JD3t-0SRXKfnZHS1P&feature=youtu.be&v=M9cJMXmQ_ZU',
'only_matching': True,
}] }]
@classmethod @classmethod
@ -2271,10 +2362,13 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
@staticmethod @staticmethod
def _extract_grid_item_renderer(item): def _extract_grid_item_renderer(item):
for item_kind in ('Playlist', 'Video', 'Channel'): assert isinstance(item, dict)
renderer = item.get('grid%sRenderer' % item_kind) for key, renderer in item.items():
if renderer: if not key.startswith('grid') or not key.endswith('Renderer'):
return renderer continue
if not isinstance(renderer, dict):
continue
return renderer
def _grid_entries(self, grid_renderer): def _grid_entries(self, grid_renderer):
for item in grid_renderer['items']: for item in grid_renderer['items']:
@ -2284,7 +2378,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
if not isinstance(renderer, dict): if not isinstance(renderer, dict):
continue continue
title = try_get( title = try_get(
renderer, lambda x: x['title']['runs'][0]['text'], compat_str) renderer, (lambda x: x['title']['runs'][0]['text'],
lambda x: x['title']['simpleText']), compat_str)
# playlist # playlist
playlist_id = renderer.get('playlistId') playlist_id = renderer.get('playlistId')
if playlist_id: if playlist_id:
@ -2292,10 +2387,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'https://www.youtube.com/playlist?list=%s' % playlist_id, 'https://www.youtube.com/playlist?list=%s' % playlist_id,
ie=YoutubeTabIE.ie_key(), video_id=playlist_id, ie=YoutubeTabIE.ie_key(), video_id=playlist_id,
video_title=title) video_title=title)
continue
# video # video
video_id = renderer.get('videoId') video_id = renderer.get('videoId')
if video_id: if video_id:
yield self._extract_video(renderer) yield self._extract_video(renderer)
continue
# channel # channel
channel_id = renderer.get('channelId') channel_id = renderer.get('channelId')
if channel_id: if channel_id:
@ -2304,6 +2401,17 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
yield self.url_result( yield self.url_result(
'https://www.youtube.com/channel/%s' % channel_id, 'https://www.youtube.com/channel/%s' % channel_id,
ie=YoutubeTabIE.ie_key(), video_title=title) ie=YoutubeTabIE.ie_key(), video_title=title)
continue
# generic endpoint URL support
ep_url = urljoin('https://www.youtube.com/', try_get(
renderer, lambda x: x['navigationEndpoint']['commandMetadata']['webCommandMetadata']['url'],
compat_str))
if ep_url:
for ie in (YoutubeTabIE, YoutubePlaylistIE, YoutubeIE):
if ie.suitable(ep_url):
yield self.url_result(
ep_url, ie=ie.ie_key(), video_id=ie._match_id(ep_url), video_title=title)
break
def _shelf_entries_from_content(self, shelf_renderer): def _shelf_entries_from_content(self, shelf_renderer):
content = shelf_renderer.get('content') content = shelf_renderer.get('content')
@ -2396,6 +2504,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
for entry in self._post_thread_entries(renderer): for entry in self._post_thread_entries(renderer):
yield entry yield entry
def _rich_grid_entries(self, contents):
for content in contents:
video_renderer = try_get(content, lambda x: x['richItemRenderer']['content']['videoRenderer'], dict)
if video_renderer:
entry = self._video_entry(video_renderer)
if entry:
yield entry
@staticmethod @staticmethod
def _build_continuation_query(continuation, ctp=None): def _build_continuation_query(continuation, ctp=None):
query = { query = {
@ -2441,81 +2557,97 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
ctp = continuation_ep.get('clickTrackingParams') ctp = continuation_ep.get('clickTrackingParams')
return YoutubeTabIE._build_continuation_query(continuation, ctp) return YoutubeTabIE._build_continuation_query(continuation, ctp)
def _entries(self, tab, identity_token): def _entries(self, tab, item_id, webpage):
tab_content = try_get(tab, lambda x: x['content'], dict) tab_content = try_get(tab, lambda x: x['content'], dict)
if not tab_content: if not tab_content:
return return
slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict) slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict)
if not slr_renderer: if slr_renderer:
return is_channels_tab = tab.get('title') == 'Channels'
is_channels_tab = tab.get('title') == 'Channels' continuation = None
continuation = None slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or []
slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or [] for slr_content in slr_contents:
for slr_content in slr_contents: if not isinstance(slr_content, dict):
if not isinstance(slr_content, dict):
continue
is_renderer = try_get(slr_content, lambda x: x['itemSectionRenderer'], dict)
if not is_renderer:
continue
isr_contents = try_get(is_renderer, lambda x: x['contents'], list) or []
for isr_content in isr_contents:
if not isinstance(isr_content, dict):
continue continue
renderer = isr_content.get('playlistVideoListRenderer') is_renderer = try_get(slr_content, lambda x: x['itemSectionRenderer'], dict)
if renderer: if not is_renderer:
for entry in self._playlist_entries(renderer):
yield entry
continuation = self._extract_continuation(renderer)
continue continue
renderer = isr_content.get('gridRenderer') isr_contents = try_get(is_renderer, lambda x: x['contents'], list) or []
if renderer: for isr_content in isr_contents:
for entry in self._grid_entries(renderer): if not isinstance(isr_content, dict):
yield entry continue
continuation = self._extract_continuation(renderer) renderer = isr_content.get('playlistVideoListRenderer')
continue if renderer:
renderer = isr_content.get('shelfRenderer') for entry in self._playlist_entries(renderer):
if renderer: yield entry
for entry in self._shelf_entries(renderer, not is_channels_tab): continuation = self._extract_continuation(renderer)
yield entry continue
continue renderer = isr_content.get('gridRenderer')
renderer = isr_content.get('backstagePostThreadRenderer') if renderer:
if renderer: for entry in self._grid_entries(renderer):
for entry in self._post_thread_entries(renderer): yield entry
yield entry continuation = self._extract_continuation(renderer)
continuation = self._extract_continuation(renderer) continue
continue renderer = isr_content.get('shelfRenderer')
renderer = isr_content.get('videoRenderer') if renderer:
if renderer: for entry in self._shelf_entries(renderer, not is_channels_tab):
entry = self._video_entry(renderer) yield entry
if entry: continue
yield entry renderer = isr_content.get('backstagePostThreadRenderer')
if renderer:
for entry in self._post_thread_entries(renderer):
yield entry
continuation = self._extract_continuation(renderer)
continue
renderer = isr_content.get('videoRenderer')
if renderer:
entry = self._video_entry(renderer)
if entry:
yield entry
if not continuation:
continuation = self._extract_continuation(is_renderer)
if not continuation: if not continuation:
continuation = self._extract_continuation(is_renderer) continuation = self._extract_continuation(slr_renderer)
else:
rich_grid_renderer = tab_content.get('richGridRenderer')
if not rich_grid_renderer:
return
for entry in self._rich_grid_entries(rich_grid_renderer.get('contents') or []):
yield entry
continuation = self._extract_continuation(rich_grid_renderer)
if not continuation: ytcfg = self._extract_ytcfg(item_id, webpage)
continuation = self._extract_continuation(slr_renderer) client_version = try_get(
ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str) or '2.20210407.08.00'
headers = { headers = {
'x-youtube-client-name': '1', 'x-youtube-client-name': '1',
'x-youtube-client-version': '2.20201112.04.01', 'x-youtube-client-version': client_version,
'content-type': 'application/json', 'content-type': 'application/json',
} }
context = try_get(ytcfg, lambda x: x['INNERTUBE_CONTEXT'], dict) or {
'client': {
'clientName': 'WEB',
'clientVersion': client_version,
}
}
visitor_data = try_get(context, lambda x: x['client']['visitorData'], compat_str)
identity_token = self._extract_identity_token(ytcfg, webpage)
if identity_token: if identity_token:
headers['x-youtube-identity-token'] = identity_token headers['x-youtube-identity-token'] = identity_token
data = { data = {
'context': { 'context': context,
'client': {
'clientName': 'WEB',
'clientVersion': '2.20201021.03.00',
}
},
} }
for page_num in itertools.count(1): for page_num in itertools.count(1):
if not continuation: if not continuation:
break break
if visitor_data:
headers['x-goog-visitor-id'] = visitor_data
data['continuation'] = continuation['continuation'] data['continuation'] = continuation['continuation']
data['clickTracking'] = { data['clickTracking'] = {
'clickTrackingParams': continuation['itct'] 'clickTrackingParams': continuation['itct']
@ -2540,6 +2672,9 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
if not response: if not response:
break break
visitor_data = try_get(
response, lambda x: x['responseContext']['visitorData'], compat_str) or visitor_data
continuation_contents = try_get( continuation_contents = try_get(
response, lambda x: x['continuationContents'], dict) response, lambda x: x['continuationContents'], dict)
if continuation_contents: if continuation_contents:
@ -2562,13 +2697,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
continuation = self._extract_continuation(continuation_renderer) continuation = self._extract_continuation(continuation_renderer)
continue continue
on_response_received = dict_get(response, ('onResponseReceivedActions', 'onResponseReceivedEndpoints'))
continuation_items = try_get( continuation_items = try_get(
response, lambda x: x['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'], list) on_response_received, lambda x: x[0]['appendContinuationItemsAction']['continuationItems'], list)
if continuation_items: if continuation_items:
continuation_item = continuation_items[0] continuation_item = continuation_items[0]
if not isinstance(continuation_item, dict): if not isinstance(continuation_item, dict):
continue continue
renderer = continuation_item.get('gridVideoRenderer') renderer = self._extract_grid_item_renderer(continuation_item)
if renderer: if renderer:
grid_renderer = {'items': continuation_items} grid_renderer = {'items': continuation_items}
for entry in self._grid_entries(grid_renderer): for entry in self._grid_entries(grid_renderer):
@ -2582,6 +2718,19 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
yield entry yield entry
continuation = self._extract_continuation(video_list_renderer) continuation = self._extract_continuation(video_list_renderer)
continue continue
renderer = continuation_item.get('backstagePostThreadRenderer')
if renderer:
continuation_renderer = {'contents': continuation_items}
for entry in self._post_thread_continuation_entries(continuation_renderer):
yield entry
continuation = self._extract_continuation(continuation_renderer)
continue
renderer = continuation_item.get('richItemRenderer')
if renderer:
for entry in self._rich_grid_entries(continuation_items):
yield entry
continuation = self._extract_continuation({'contents': continuation_items})
continue
break break
@ -2634,11 +2783,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
alerts.append(text) alerts.append(text)
return '\n'.join(alerts) return '\n'.join(alerts)
def _extract_from_tabs(self, item_id, webpage, data, tabs, identity_token): def _extract_from_tabs(self, item_id, webpage, data, tabs):
selected_tab = self._extract_selected_tab(tabs) selected_tab = self._extract_selected_tab(tabs)
renderer = try_get( renderer = try_get(
data, lambda x: x['metadata']['channelMetadataRenderer'], dict) data, lambda x: x['metadata']['channelMetadataRenderer'], dict)
playlist_id = title = description = None playlist_id = item_id
title = description = None
if renderer: if renderer:
channel_title = renderer.get('title') or item_id channel_title = renderer.get('title') or item_id
tab_title = selected_tab.get('title') tab_title = selected_tab.get('title')
@ -2647,14 +2797,18 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
title += ' - %s' % tab_title title += ' - %s' % tab_title
description = renderer.get('description') description = renderer.get('description')
playlist_id = renderer.get('externalId') playlist_id = renderer.get('externalId')
renderer = try_get( else:
data, lambda x: x['metadata']['playlistMetadataRenderer'], dict) renderer = try_get(
if renderer: data, lambda x: x['metadata']['playlistMetadataRenderer'], dict)
title = renderer.get('title') if renderer:
description = None title = renderer.get('title')
playlist_id = item_id else:
renderer = try_get(
data, lambda x: x['header']['hashtagHeaderRenderer'], dict)
if renderer:
title = try_get(renderer, lambda x: x['hashtag']['simpleText'])
playlist = self.playlist_result( playlist = self.playlist_result(
self._entries(selected_tab, identity_token), self._entries(selected_tab, item_id, webpage),
playlist_id=playlist_id, playlist_title=title, playlist_id=playlist_id, playlist_title=title,
playlist_description=description) playlist_description=description)
playlist.update(self._extract_uploader(data)) playlist.update(self._extract_uploader(data))
@ -2678,8 +2832,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
self._playlist_entries(playlist), playlist_id=playlist_id, self._playlist_entries(playlist), playlist_id=playlist_id,
playlist_title=title) playlist_title=title)
def _extract_identity_token(self, webpage, item_id): def _extract_identity_token(self, ytcfg, webpage):
ytcfg = self._extract_ytcfg(item_id, webpage)
if ytcfg: if ytcfg:
token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str) token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
if token: if token:
@ -2693,7 +2846,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
url = compat_urlparse.urlunparse( url = compat_urlparse.urlunparse(
compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com')) compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
# Handle both video/playlist URLs # Handle both video/playlist URLs
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = parse_qs(url)
video_id = qs.get('v', [None])[0] video_id = qs.get('v', [None])[0]
playlist_id = qs.get('list', [None])[0] playlist_id = qs.get('list', [None])[0]
if video_id and playlist_id: if video_id and playlist_id:
@ -2702,12 +2855,11 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id) return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id)) self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
webpage = self._download_webpage(url, item_id) webpage = self._download_webpage(url, item_id)
identity_token = self._extract_identity_token(webpage, item_id)
data = self._extract_yt_initial_data(item_id, webpage) data = self._extract_yt_initial_data(item_id, webpage)
tabs = try_get( tabs = try_get(
data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list) data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list)
if tabs: if tabs:
return self._extract_from_tabs(item_id, webpage, data, tabs, identity_token) return self._extract_from_tabs(item_id, webpage, data, tabs)
playlist = try_get( playlist = try_get(
data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict) data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
if playlist: if playlist:
@ -2790,12 +2942,19 @@ class YoutubePlaylistIE(InfoExtractor):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if YoutubeTabIE.suitable(url) else super( if YoutubeTabIE.suitable(url):
YoutubePlaylistIE, cls).suitable(url) return False
# Hack for lazy extractors until more generic solution is implemented
# (see #28780)
from .youtube import parse_qs
qs = parse_qs(url)
if qs.get('v', [None])[0]:
return False
return super(YoutubePlaylistIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = parse_qs(url)
if not qs: if not qs:
qs = {'list': playlist_id} qs = {'list': playlist_id}
return self.url_result( return self.url_result(

View File

@ -768,7 +768,7 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='rm_cachedir', action='store_true', dest='rm_cachedir',
help='Delete all filesystem cache files') help='Delete all filesystem cache files')
thumbnail = optparse.OptionGroup(parser, 'Thumbnail images') thumbnail = optparse.OptionGroup(parser, 'Thumbnail Options')
thumbnail.add_option( thumbnail.add_option(
'--write-thumbnail', '--write-thumbnail',
action='store_true', dest='writethumbnail', default=False, action='store_true', dest='writethumbnail', default=False,

View File

@ -231,7 +231,10 @@ class FFmpegPostProcessor(PostProcessor):
stdout, stderr = p.communicate() stdout, stderr = p.communicate()
if p.returncode != 0: if p.returncode != 0:
stderr = stderr.decode('utf-8', 'replace') stderr = stderr.decode('utf-8', 'replace')
msg = stderr.strip().split('\n')[-1] msgs = stderr.strip().split('\n')
msg = msgs[-1]
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] ' + '\n'.join(msgs[:-1]))
raise FFmpegPostProcessorError(msg) raise FFmpegPostProcessorError(msg)
self.try_utime(out_path, oldest_mtime, oldest_mtime) self.try_utime(out_path, oldest_mtime, oldest_mtime)

View File

@ -39,6 +39,7 @@ import zlib
from .compat import ( from .compat import (
compat_HTMLParseError, compat_HTMLParseError,
compat_HTMLParser, compat_HTMLParser,
compat_HTTPError,
compat_basestring, compat_basestring,
compat_chr, compat_chr,
compat_cookiejar, compat_cookiejar,
@ -2879,12 +2880,60 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler): class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
if sys.version_info[0] < 3: """YoutubeDL redirect handler
def redirect_request(self, req, fp, code, msg, headers, newurl):
# On python 2 urlh.geturl() may sometimes return redirect URL The code is based on HTTPRedirectHandler implementation from CPython [1].
# as byte string instead of unicode. This workaround allows
# to force it always return unicode. This redirect handler solves two issues:
return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl)) - ensures redirect URL is always unicode under python 2
- introduces support for experimental HTTP response status code
308 Permanent Redirect [2] used by some sites [3]
1. https://github.com/python/cpython/blob/master/Lib/urllib/request.py
2. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/308
3. https://github.com/ytdl-org/youtube-dl/issues/28768
"""
http_error_301 = http_error_303 = http_error_307 = http_error_308 = compat_urllib_request.HTTPRedirectHandler.http_error_302
def redirect_request(self, req, fp, code, msg, headers, newurl):
"""Return a Request or None in response to a redirect.
This is called by the http_error_30x methods when a
redirection response is received. If a redirection should
take place, return a new Request to allow http_error_30x to
perform the redirect. Otherwise, raise HTTPError if no-one
else should try to handle this url. Return None if you can't
but another Handler might.
"""
m = req.get_method()
if (not (code in (301, 302, 303, 307, 308) and m in ("GET", "HEAD")
or code in (301, 302, 303) and m == "POST")):
raise compat_HTTPError(req.full_url, code, msg, headers, fp)
# Strictly (according to RFC 2616), 301 or 302 in response to
# a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib.request, in this case). In practice,
# essentially all clients do redirect in this case, so we do
# the same.
# On python 2 urlh.geturl() may sometimes return redirect URL
# as byte string instead of unicode. This workaround allows
# to force it always return unicode.
if sys.version_info[0] < 3:
newurl = compat_str(newurl)
# Be conciliant with URIs containing a space. This is mainly
# redundant with the more complete encoding done in http_error_302(),
# but it is kept for compatibility with other callers.
newurl = newurl.replace(' ', '%20')
CONTENT_HEADERS = ("content-length", "content-type")
# NB: don't use dict comprehension for python 2.6 compatibility
newheaders = dict((k, v) for k, v in req.headers.items()
if k.lower() not in CONTENT_HEADERS)
return compat_urllib_request.Request(
newurl, headers=newheaders, origin_req_host=req.origin_req_host,
unverifiable=True)
def extract_timezone(date_str): def extract_timezone(date_str):

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2021.03.25' __version__ = '2021.06.06'