Merge branch 'master' into cspanlive

This commit is contained in:
Clay Freeman 2019-12-04 07:10:45 -06:00 committed by GitHub
commit 91f0b0477c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
159 changed files with 6504 additions and 6999 deletions

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2019.07.16**
- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.07.16
[debug] youtube-dl version 2019.11.28
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2019.07.16**
- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2019.07.16**
- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2019.07.16**
- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.07.16
[debug] youtube-dl version 2019.11.28
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2019.07.16**
- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@ -21,6 +21,12 @@ matrix:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.8
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core

View File

@ -339,6 +339,72 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

342
ChangeLog
View File

@ -1,3 +1,345 @@
version 2019.11.28
Core
+ [utils] Add generic caesar cipher and rot47
* [utils] Handle rd-suffixed day parts in unified_strdate (#23199)
Extractors
* [vimeo] Improve extraction
* Fix review extraction
* Fix ondemand extraction
* Make password protected player case as an expected error (#22896)
* Simplify channel based extractors code
- [openload] Remove extractor (#11999)
- [verystream] Remove extractor
- [streamango] Remove extractor (#15406)
* [dailymotion] Improve extraction
* Extract http formats included in m3u8 manifest
* Fix user extraction (#3553, #21415)
+ Add suport for User Authentication (#11491)
* Fix password protected videos extraction (#23176)
* Respect age limit option and family filter cookie value (#18437)
* Handle video url playlist query param
* Report allowed countries for geo-restricted videos
* [corus] Improve extraction
+ Add support for Series Plus, W Network, YTV, ABC Spark, disneychannel.com
and disneylachaine.ca (#20861)
+ Add support for self hosted videos (#22075)
* Detect DRM protection (#14910, #9164)
* [vivo] Fix extraction (#22328, #22279)
+ [bitchute] Extract upload date (#22990, #23193)
* [soundcloud] Update client id (#23214)
version 2019.11.22
Core
+ [extractor/common] Clean jwplayer description HTML tags
+ [extractor/common] Add data, headers and query to all major extract formats
methods
Extractors
* [chaturbate] Fix extraction (#23010, #23012)
+ [ntvru] Add support for non relative file URLs (#23140)
* [vk] Fix wall audio thumbnails extraction (#23135)
* [ivi] Fix format extraction (#21991)
- [comcarcoff] Remove extractor
+ [drtv] Add support for new URL schema (#23059)
+ [nexx] Add support for Multi Player JS Setup (#23052)
+ [teamcoco] Add support for new videos (#23054)
* [soundcloud] Check if the soundtrack has downloads left (#23045)
* [facebook] Fix posts video data extraction (#22473)
- [addanime] Remove extractor
- [minhateca] Remove extractor
- [daisuki] Remove extractor
* [seeker] Fix extraction
- [revision3] Remove extractors
* [twitch] Fix video comments URL (#18593, #15828)
* [twitter] Improve extraction
+ Add support for generic embeds (#22168)
* Always extract http formats for native videos (#14934)
+ Add support for Twitter Broadcasts (#21369)
+ Extract more metadata
* Improve VMap format extraction
* Unify extraction code for both twitter statuses and cards
+ [twitch] Add support for Clip embed URLs
* [lnkgo] Fix extraction (#16834)
* [mixcloud] Improve extraction
* Improve metadata extraction (#11721)
* Fix playlist extraction (#22378)
* Fix user mixes extraction (#15197, #17865)
+ [kinja] Add support for Kinja embeds (#5756, #11282, #22237, #22384)
* [onionstudios] Fix extraction
+ [hotstar] Pass Referer header to format requests (#22836)
* [dplay] Minimize response size
+ [patreon] Extract uploader_id and filesize
* [patreon] Minimize response size
* [roosterteeth] Fix login request (#16094, #22689)
version 2019.11.05
Extractors
+ [scte] Add support for learning.scte.org (#22975)
+ [msn] Add support for Vidible and AOL embeds (#22195, #22227)
* [myspass] Fix video URL extraction and improve metadata extraction (#22448)
* [jamendo] Improve extraction
* Fix album extraction (#18564)
* Improve metadata extraction (#18565, #21379)
* [mediaset] Relax URL guid matching (#18352)
+ [mediaset] Extract unprotected M3U and MPD manifests (#17204)
* [telegraaf] Fix extraction
+ [bellmedia] Add support for marilyn.ca videos (#22193)
* [stv] Fix extraction (#22928)
- [iconosquare] Remove extractor
- [keek] Remove extractor
- [gameone] Remove extractor (#21778)
- [flipagram] Remove extractor
- [bambuser] Remove extractor
* [wistia] Reduce embed extraction false positives
+ [wistia] Add support for inline embeds (#22931)
- [go90] Remove extractor
* [kakao] Remove raw request
+ [kakao] Extract format total bitrate
* [daum] Fix VOD and Clip extracton (#15015)
* [kakao] Improve extraction
+ Add support for embed URLs
+ Add support for Kakao Legacy vid based embed URLs
* Only extract fields used for extraction
* Strip description and extract tags
* [mixcloud] Fix cloudcast data extraction (#22821)
* [yahoo] Improve extraction
+ Add support for live streams (#3597, #3779, #22178)
* Bypass cookie consent page for european domains (#16948, #22576)
+ Add generic support for embeds (#20332)
* [tv2] Fix and improve extraction (#22787)
+ [tv2dk] Add support for TV2 DK sites
* [onet] Improve extraction …
+ Add support for onet100.vod.pl
+ Extract m3u8 formats
* Correct audio only format info
* [fox9] Fix extraction
version 2019.10.29
Core
* [utils] Actualize major IPv4 address blocks per country
Extractors
+ [go] Add support for abc.com and freeform.com (#22823, #22864)
+ [mtv] Add support for mtvjapan.com
* [mtv] Fix extraction for mtv.de (#22113)
* [videodetective] Fix extraction
* [internetvideoarchive] Fix extraction
* [nbcnews] Fix extraction (#12569, #12576, #21703, #21923)
- [hark] Remove extractor
- [tutv] Remove extractor
- [learnr] Remove extractor
- [macgamestore] Remove extractor
* [la7] Update Kaltura service URL (#22358)
* [thesun] Fix extraction (#16966)
- [makertv] Remove extractor
+ [tenplay] Add support for 10play.com.au (#21446)
* [soundcloud] Improve extraction
* Improve format extraction (#22123)
+ Extract uploader_id and uploader_url (#21916)
+ Extract all known thumbnails (#19071, #20659)
* Fix extration for private playlists (#20976)
+ Add support for playlist embeds (#20976)
* Skip preview formats (#22806)
* [dplay] Improve extraction
+ Add support for dplay.fi, dplay.jp and es.dplay.com (#16969)
* Fix it.dplay.com extraction (#22826)
+ Extract creator, tags and thumbnails
* Handle playback API call errors
+ [discoverynetworks] Add support for dplay.co.uk
* [vk] Improve extraction
+ Add support for Odnoklassniki embeds
+ Extract more videos from user lists (#4470)
+ Fix wall post audio extraction (#18332)
* Improve error detection (#22568)
+ [odnoklassniki] Add support for embeds
* [puhutv] Improve extraction
* Fix subtitles extraction
* Transform HLS URLs to HTTP URLs
* Improve metadata extraction
* [ceskatelevize] Skip DRM media
+ [facebook] Extract subtitles (#22777)
* [globo] Handle alternative hash signing method
version 2019.10.22
Core
* [utils] Improve subtitles_filename (#22753)
Extractors
* [facebook] Bypass download rate limits (#21018)
+ [contv] Add support for contv.com
- [viewster] Remove extractor
* [xfileshare] Improve extractor (#17032, #17906, #18237, #18239)
* Update the list of domains
+ Add support for aa-encoded video data
* Improve jwplayer format extraction
+ Add support for Clappr sources
* [mangomolo] Fix video format extraction and add support for player URLs
* [audioboom] Improve metadata extraction
* [twitch] Update VOD URL matching (#22395, #22727)
- [mit] Remove support for video.mit.edu (#22403)
- [servingsys] Remove extractor (#22639)
* [dumpert] Fix extraction (#22428, #22564)
* [atresplayer] Fix extraction (#16277, #16716)
version 2019.10.16
Core
* [extractor/common] Make _is_valid_url more relaxed
Extractors
* [vimeo] Improve album videos id extraction (#22599)
+ [globo] Extract subtitles (#22713)
* [bokecc] Improve player params extraction (#22638)
* [nexx] Handle result list (#22666)
* [vimeo] Fix VHX embed extraction
* [nbc] Switch to graphql API (#18581, #22693, #22701)
- [vessel] Remove extractor
- [promptfile] Remove extractor (#6239)
* [kaltura] Fix service URL extraction (#22658)
* [kaltura] Fix embed info strip (#22658)
* [globo] Fix format extraction (#20319)
* [redtube] Improve metadata extraction (#22492, #22615)
* [pornhub:uservideos:upload] Fix extraction (#22619)
+ [telequebec:squat] Add support for squat.telequebec.tv (#18503)
- [wimp] Remove extractor (#22088, #22091)
+ [gfycat] Extend URL regular expression (#22225)
+ [chaturbate] Extend URL regular expression (#22309)
* [peertube] Update instances (#22414)
+ [telequebec] Add support for coucou.telequebec.tv (#22482)
+ [xvideos] Extend URL regular expression (#22471)
- [youtube] Remove support for invidious.enkirton.net (#22543)
+ [openload] Add support for oload.monster (#22592)
* [nrktv:seriebase] Fix extraction (#22596)
+ [youtube] Add support for yt.lelux.fi (#22597)
* [orf:tvthek] Make manifest requests non fatal (#22578)
* [teachable] Skip login when already logged in (#22572)
* [viewlift] Improve extraction (#22545)
* [nonktube] Fix extraction (#22544)
version 2019.09.28
Core
* [YoutubeDL] Honour all --get-* options with --flat-playlist (#22493)
Extractors
* [vk] Fix extraction (#22522)
* [heise] Fix kaltura embeds extraction (#22514)
* [ted] Check for resources validity and extract subtitled downloads (#22513)
+ [youtube] Add support for
owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya.b32.i2p (#22292)
+ [nhk] Add support for clips
* [nhk] Fix video extraction (#22249, #22353)
* [byutv] Fix extraction (#22070)
+ [openload] Add support for oload.online (#22304)
+ [youtube] Add support for invidious.drycat.fr (#22451)
* [jwplatfom] Do not match video URLs (#20596, #22148)
* [youtube:playlist] Unescape playlist uploader (#22483)
+ [bilibili] Add support audio albums and songs (#21094)
+ [instagram] Add support for tv URLs
+ [mixcloud] Allow uppercase letters in format URLs (#19280)
* [brightcove] Delegate all supported legacy URLs to new extractor (#11523,
#12842, #13912, #15669, #16303)
* [hotstar] Use native HLS downloader by default
+ [hotstar] Extract more formats (#22323)
* [9now] Fix extraction (#22361)
* [zdf] Bypass geo restriction
+ [tv4] Extract series metadata
* [tv4] Fix extraction (#22443)
version 2019.09.12.1
Extractors
* [youtube] Remove quality and tbr for itag 43 (#22372)
version 2019.09.12
Extractors
* [youtube] Quick extraction tempfix (#22367, #22163)
version 2019.09.01
Core
+ [extractor/generic] Add support for squarespace embeds (#21294, #21802,
#21859)
+ [downloader/external] Respect mtime option for aria2c (#22242)
Extractors
+ [xhamster:user] Add support for user pages (#16330, #18454)
+ [xhamster] Add support for more domains
+ [verystream] Add support for woof.tube (#22217)
+ [dailymotion] Add support for lequipe.fr (#21328, #22152)
+ [openload] Add support for oload.vip (#22205)
+ [bbccouk] Extend URL regular expression (#19200)
+ [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223)
* [safari] Fix authentication (#22161, #22184)
* [usanetwork] Fix extraction (#22105)
+ [einthusan] Add support for einthusan.ca (#22171)
* [youtube] Improve unavailable message extraction (#22117)
+ [piksel] Extract subtitles (#20506)
version 2019.08.13
Core
* [downloader/fragment] Fix ETA calculation of resumed download (#21992)
* [YoutubeDL] Check annotations availability (#18582)
Extractors
* [youtube:playlist] Improve flat extraction (#21927)
* [youtube] Fix annotations extraction (#22045)
+ [discovery] Extract series meta field (#21808)
* [youtube] Improve error detection (#16445)
* [vimeo] Fix album extraction (#1933, #15704, #15855, #18967, #21986)
+ [roosterteeth] Add support for watch URLs
* [discovery] Limit video data by show slug (#21980)
version 2019.08.02
Extractors
+ [tvigle] Add support for HLS and DASH formats (#21967)
* [tvigle] Fix extraction (#21967)
+ [yandexvideo] Add support for DASH formats (#21971)
* [discovery] Use API call for video data extraction (#21808)
+ [mgtv] Extract format_note (#21881)
* [tvn24] Fix metadata extraction (#21833, #21834)
* [dlive] Relax URL regular expression (#21909)
+ [openload] Add support for oload.best (#21913)
* [youtube] Improve metadata extraction for age gate content (#21943)
version 2019.07.30
Extractors
* [youtube] Fix and improve title and description extraction (#21934)
version 2019.07.27
Extractors
+ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
+ [discovery] Add support go.discovery.com URLs
* [youtube:playlist] Relax video regular expression (#21844)
* [generic] Restrict --default-search schemeless URLs detection pattern
(#21842)
* [vrv] Fix CMS signing query extraction (#21809)
version 2019.07.16
Extractors

View File

@ -752,8 +752,8 @@ As a last resort, you can also uninstall the version installed by your package m
Afterwards, simply follow [our manual installation instructions](https://ytdl-org.github.io/youtube-dl/download.html):
```
sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
hash -r
```
@ -1216,6 +1216,72 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

View File

@ -26,7 +26,6 @@
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
- **AddAnime**
- **ADN**: Anime Digital Network
- **AdobeConnect**
- **AdobeTV**
@ -76,8 +75,6 @@
- **awaan:video**
- **AZMedien**: AZ Medien videos
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
- **Bandcamp**
- **Bandcamp:album**
- **Bandcamp:weekly**
@ -98,6 +95,8 @@
- **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BilibiliAudio**
- **BilibiliAudioAlbum**
- **BioBioChileTV**
- **BIQLE**
- **BitChute**
@ -175,12 +174,12 @@
- **CNN**
- **CNNArticle**
- **CNNBlogs**
- **ComCarCoff**
- **ComedyCentral**
- **ComedyCentralFullEpisodes**
- **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **CONtv**
- **Corus**
- **Coub**
- **Cracked**
@ -202,8 +201,6 @@
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
- **DaisukiMotto**
- **DaisukiMottoPlaylist**
- **daum.net**
- **daum.net:clip**
- **daum.net:playlist**
@ -229,7 +226,6 @@
- **DouyuShow**
- **DouyuTV**: 斗鱼
- **DPlay**
- **DPlayIt**
- **DRBonanza**
- **Dropbox**
- **DrTuber**
@ -282,12 +278,12 @@
- **FiveThirtyEight**
- **FiveTV**
- **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
- **FOX**
- **FOX9**
- **FOX9News**
- **Foxgay**
- **foxnews**: Fox News and Fox Business Video
- **foxnews:article**
@ -313,8 +309,6 @@
- **FXNetworks**
- **Gaia**
- **GameInformer**
- **GameOne**
- **gameone:playlist**
- **GameSpot**
- **GameStar**
- **Gaskrank**
@ -329,14 +323,12 @@
- **Globo**
- **GloboArticle**
- **Go**
- **Go90**
- **GodTube**
- **Golem**
- **GoogleDrive**
- **Goshgay**
- **GPUTechConf**
- **Groupon**
- **Hark**
- **hbo**
- **HearThisAt**
- **Heise**
@ -365,7 +357,6 @@
- **Hungama**
- **HungamaSong**
- **Hypem**
- **Iconosquare**
- **ign.com**
- **imdb**: Internet Movie Database trailers
- **imdb:list**: Internet Movie Database lists
@ -405,11 +396,11 @@
- **Kankan**
- **Karaoketv**
- **KarriereVideos**
- **keek**
- **KeezMovies**
- **Ketnet**
- **KhanAcademy**
- **KickStarter**
- **KinjaEmbed**
- **KinoPoisk**
- **KonserthusetPlay**
- **kontrtube**: KontrTube.ru - Труба зовёт
@ -429,7 +420,6 @@
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
- **Lecturio**
- **LecturioCourse**
@ -463,11 +453,9 @@
- **lynda**: lynda.com videos
- **lynda:course**: lynda.com online courses
- **m6**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MakerTV**
- **MallTV**
- **mangomolo:live**
- **mangomolo:video**
@ -494,14 +482,12 @@
- **Mgoon**
- **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca**
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **MiTele**: mitele.es
- **mixcloud**
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **Mixer:live**
- **Mixer:vod**
@ -523,8 +509,8 @@
- **mtg**: MTG services
- **mtv**
- **mtv.de**
- **mtv81**
- **mtv:video**
- **mtvjapan**
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
@ -632,7 +618,6 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **Openload**
- **OraTV**
- **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories
@ -692,7 +677,6 @@
- **PornoXO**
- **PornTube**
- **PressTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
- **puhutv**
- **puhutv:serie**
@ -733,8 +717,6 @@
- **Restudy**
- **Reuters**
- **ReverbNation**
- **revision**
- **revision3:embed**
- **RICE**
- **RMCDecouverte**
- **RockstarGames**
@ -780,10 +762,11 @@
- **Screencast**
- **ScreencastOMatic**
- **scrippsnetworks:watch**
- **SCTE**
- **SCTECourse**
- **Seeker**
- **SenateISVP**
- **SendtoNews**
- **ServingSys**
- **Servus**
- **Sexu**
- **SeznamZpravy**
@ -814,6 +797,7 @@
- **soundcloud:set**
- **soundcloud:trackstation**
- **soundcloud:user**
- **SoundcloudEmbed**
- **soundgasm**
- **soundgasm:profile**
- **southpark.cc.com**
@ -840,7 +824,6 @@
- **Steam**
- **Stitcher**
- **Streamable**
- **Streamango**
- **streamcloud.eu**
- **StreamCZ**
- **StreetVoice**
@ -882,9 +865,11 @@
- **TeleQuebec**
- **TeleQuebecEmission**
- **TeleQuebecLive**
- **TeleQuebecSquat**
- **TeleTask**
- **Telewebion**
- **TennisTV**
- **TenPlay**
- **TF1**
- **TFO**
- **TheIntercept**
@ -923,11 +908,11 @@
- **tunein:topic**
- **TunePk**
- **Turbo**
- **Tutv**
- **tv.dfb.de**
- **TV2**
- **tv2.hu**
- **TV2Article**
- **TV2DK**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
@ -964,6 +949,7 @@
- **twitch:vod**
- **twitter**
- **twitter:amplify**
- **twitter:broadcast**
- **twitter:card**
- **udemy**
- **udemy:course**
@ -988,8 +974,6 @@
- **Vbox7**
- **VeeHD**
- **Veoh**
- **verystream**
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VevoPlaylist**
@ -1004,7 +988,6 @@
- **Viddler**
- **Videa**
- **video.google:search**: Google Video search
- **video.mit.edu**
- **VideoDetective**
- **videofy.me**
- **videomore**
@ -1022,7 +1005,6 @@
- **vier:videos**
- **ViewLift**
- **ViewLiftEmbed**
- **Viewster**
- **Viidea**
- **viki**
- **viki:channel**
@ -1088,7 +1070,6 @@
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **Wimp**
- **Wistia**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop**
@ -1097,9 +1078,10 @@
- **WWE**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
- **XFileShare**: XFileShare based sites: ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
- **XHamster**
- **XHamsterEmbed**
- **XHamsterUser**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
@ -1117,6 +1099,7 @@
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист

View File

@ -123,12 +123,6 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['pbs'])
self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['pbs'])
def test_yahoo_https(self):
# https://github.com/ytdl-org/youtube-dl/issues/2701
self.assertMatch(
'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
['Yahoo'])
def test_no_duplicated_ie_names(self):
name_accu = collections.defaultdict(list)
for ie in self.ies:

View File

@ -19,6 +19,7 @@ from youtube_dl.utils import (
age_restricted,
args_to_str,
encode_base_n,
caesar,
clean_html,
date_from_str,
DateRange,
@ -69,11 +70,13 @@ from youtube_dl.utils import (
remove_start,
remove_end,
remove_quotes,
rot47,
shell_quote,
smuggle_url,
str_to_int,
strip_jsonp,
strip_or_none,
subtitles_filename,
timeconvert,
unescapeHTML,
unified_strdate,
@ -261,6 +264,11 @@ class TestUtil(unittest.TestCase):
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
def test_subtitles_filename(self):
self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt'), 'abc.en.vtt')
self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt', 'ext'), 'abc.en.vtt')
self.assertEqual(subtitles_filename('abc.unexpected_ext', 'en', 'vtt', 'ext'), 'abc.unexpected_ext.en.vtt')
def test_remove_start(self):
self.assertEqual(remove_start(None, 'A - '), None)
self.assertEqual(remove_start('A - B', 'A - '), 'B')
@ -334,6 +342,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('July 15th, 2013'), '20130715')
self.assertEqual(unified_strdate('September 1st, 2013'), '20130901')
self.assertEqual(unified_strdate('Sep 2nd, 2013'), '20130902')
self.assertEqual(unified_strdate('November 3rd, 2019'), '20191103')
self.assertEqual(unified_strdate('October 23rd, 2005'), '20051023')
def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@ -489,6 +499,7 @@ class TestUtil(unittest.TestCase):
def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456)
self.assertEqual(str_to_int('123.456'), 123456)
self.assertEqual(str_to_int(523), 523)
def test_url_basename(self):
self.assertEqual(url_basename('http://foo.de/'), '')
@ -1361,6 +1372,20 @@ Line 1
self.assertRaises(ValueError, encode_base_n, 0, 70)
self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
def test_caesar(self):
self.assertEqual(caesar('ace', 'abcdef', 2), 'cea')
self.assertEqual(caesar('cea', 'abcdef', -2), 'ace')
self.assertEqual(caesar('ace', 'abcdef', -2), 'eac')
self.assertEqual(caesar('eac', 'abcdef', 2), 'ace')
self.assertEqual(caesar('ace', 'abcdef', 0), 'ace')
self.assertEqual(caesar('xyz', 'abcdef', 2), 'xyz')
self.assertEqual(caesar('abc', 'acegik', 2), 'ebg')
self.assertEqual(caesar('ebg', 'acegik', -2), 'abc')
def test_rot47(self):
self.assertEqual(rot47('youtube-dl'), r'J@FEF36\5=')
self.assertEqual(rot47('YOUTUBE-DL'), r'*~&%&qt\s{')
def test_urshift(self):
self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646)

View File

@ -852,8 +852,9 @@ class YoutubeDL(object):
extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info)
or extract_flat is True):
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(ie_result))
self.__forced_printings(
ie_result, self.prepare_filename(ie_result),
incomplete=True)
return ie_result
if result_type == 'video':
@ -1693,6 +1694,36 @@ class YoutubeDL(object):
subs[lang] = f
return subs
def __forced_printings(self, info_dict, filename, incomplete):
def print_mandatory(field):
if (self.params.get('force%s' % field, False)
and (not incomplete or info_dict.get(field) is not None)):
self.to_stdout(info_dict[field])
def print_optional(field):
if (self.params.get('force%s' % field, False)
and info_dict.get(field) is not None):
self.to_stdout(info_dict[field])
print_mandatory('title')
print_mandatory('id')
if self.params.get('forceurl', False) and not incomplete:
if info_dict.get('requested_formats') is not None:
for f in info_dict['requested_formats']:
self.to_stdout(f['url'] + f.get('play_path', ''))
else:
# For RTMP URLs, also include the playpath
self.to_stdout(info_dict['url'] + info_dict.get('play_path', ''))
print_optional('thumbnail')
print_optional('description')
if self.params.get('forcefilename', False) and filename is not None:
self.to_stdout(filename)
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
print_mandatory('format')
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@ -1703,9 +1734,8 @@ class YoutubeDL(object):
if self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
# TODO: backward compatibility, to be removed
info_dict['fulltitle'] = info_dict['title']
if len(info_dict['title']) > 200:
info_dict['title'] = info_dict['title'][:197] + '...'
if 'format' not in info_dict:
info_dict['format'] = info_dict['ext']
@ -1720,29 +1750,7 @@ class YoutubeDL(object):
info_dict['_filename'] = filename = self.prepare_filename(info_dict)
# Forced printings
if self.params.get('forcetitle', False):
self.to_stdout(info_dict['fulltitle'])
if self.params.get('forceid', False):
self.to_stdout(info_dict['id'])
if self.params.get('forceurl', False):
if info_dict.get('requested_formats') is not None:
for f in info_dict['requested_formats']:
self.to_stdout(f['url'] + f.get('play_path', ''))
else:
# For RTMP URLs, also include the playpath
self.to_stdout(info_dict['url'] + info_dict.get('play_path', ''))
if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
self.to_stdout(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
self.to_stdout(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None:
self.to_stdout(filename)
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
if self.params.get('forceformat', False):
self.to_stdout(info_dict['format'])
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
self.__forced_printings(info_dict, filename, incomplete=False)
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
@ -1783,6 +1791,8 @@ class YoutubeDL(object):
annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present')
elif not info_dict.get('annotations'):
self.report_warning('There are no annotations to write.')
else:
try:
self.to_screen('[info] Writing video annotations to: ' + annofn)
@ -1804,7 +1814,7 @@ class YoutubeDL(object):
ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
else:

View File

@ -94,7 +94,7 @@ def _real_main(argv=None):
if opts.verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
except IOError:
sys.exit('ERROR: batch file could not be read')
sys.exit('ERROR: batch file %s could not be read' % opts.batchfile)
all_urls = batch_urls + [url.strip() for url in args] # batch_urls are already striped in read_batch_urls
_enc = preferredencoding()
all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]

View File

@ -53,7 +53,7 @@ class DashSegmentsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.

View File

@ -194,6 +194,7 @@ class Aria2cFD(ExternalFD):
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += ['--', info_dict['url']]
return cmd

View File

@ -190,12 +190,13 @@ class FragmentFD(FileDownloader):
})
def _start_frag_download(self, ctx):
resume_len = ctx['complete_frags_downloaded_bytes']
total_frags = ctx['total_frags']
# This dict stores the download progress, it's updated by the progress
# hook
state = {
'status': 'downloading',
'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
'downloaded_bytes': resume_len,
'fragment_index': ctx['fragment_index'],
'fragment_count': total_frags,
'filename': ctx['filename'],
@ -234,8 +235,8 @@ class FragmentFD(FileDownloader):
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
if not ctx['live']:
state['eta'] = self.calc_eta(
start, time_now, estimated_size,
state['downloaded_bytes'])
start, time_now, estimated_size - resume_len,
state['downloaded_bytes'] - resume_len)
state['speed'] = s.get('speed') or ctx.get('speed')
ctx['speed'] = state['speed']
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes

View File

@ -64,7 +64,7 @@ class HlsFD(FragmentFD):
s = urlh.read().decode('utf-8', 'ignore')
if not self.can_download(s, info_dict):
if info_dict.get('extra_param_to_segment_url'):
if info_dict.get('extra_param_to_segment_url') or info_dict.get('_decryption_key_url'):
self.report_error('pycrypto not found. Please install it.')
return False
self.report_warning(
@ -169,7 +169,7 @@ class HlsFD(FragmentFD):
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, decrypt_info['URI'])).read()
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content)

View File

@ -146,7 +146,7 @@ def write_piff_header(stream, params):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps

View File

@ -4,29 +4,30 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
dict_get,
int_or_none,
parse_iso8601,
try_get,
)
class ABCOTVSIE(InfoExtractor):
IE_NAME = 'abcotvs'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_VALID_URL = r'https?://(?P<site>abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:(?:/[^/]+)*/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
'info_dict': {
'id': '472581',
'id': '472548',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates vintage synthesizers',
'title': 'East Bay museum celebrates synthesized music',
'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1421123075,
'timestamp': 1421118520,
'upload_date': '20150113',
'uploader': 'Jonathan Bloom',
},
'params': {
# m3u8 download
@ -37,39 +38,63 @@ class ABCOTVSIE(InfoExtractor):
'url': 'http://abc7news.com/472581',
'only_matching': True,
},
{
'url': 'https://6abc.com/man-75-killed-after-being-struck-by-vehicle-in-chester/5725182/',
'only_matching': True,
},
]
_SITE_MAP = {
'6abc': 'wpvi',
'abc11': 'wtvd',
'abc13': 'ktrk',
'abc30': 'kfsn',
'abc7': 'kabc',
'abc7chicago': 'wls',
'abc7news': 'kgo',
'abc7ny': 'wabc',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
site, display_id, video_id = re.match(self._VALID_URL, url).groups()
display_id = display_id or video_id
station = self._SITE_MAP[site]
webpage = self._download_webpage(url, display_id)
data = self._download_json(
'https://api.abcotvs.com/v2/content', display_id, query={
'id': video_id,
'key': 'otv.web.%s.story' % station,
'station': station,
})['data']
video = try_get(data, lambda x: x['featuredMedia']['video'], dict) or data
video_id = compat_str(dict_get(video, ('id', 'publishedKey'), video_id))
title = video.get('title') or video['linkText']
m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
formats = []
m3u8_url = video.get('m3u8')
if m3u8_url:
formats = self._extract_m3u8_formats(
video['m3u8'].split('?')[0], display_id, 'mp4', m3u8_id='hls', fatal=False)
mp4_url = video.get('mp4')
if mp4_url:
formats.append({
'abr': 128,
'format_id': 'https',
'height': 360,
'url': mp4_url,
'width': 640,
})
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()
thumbnail = self._og_search_thumbnail(webpage)
timestamp = parse_iso8601(self._search_regex(
r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
webpage, 'upload date', fatal=False))
uploader = self._search_regex(
r'rel="author">([^<]+)</a>',
webpage, 'uploader', default=None)
image = video.get('image') or {}
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader': uploader,
'description': dict_get(video, ('description', 'caption'), try_get(video, lambda x: x['meta']['description'])),
'thumbnail': dict_get(image, ('source', 'dynamicSource')),
'timestamp': int_or_none(video.get('date')),
'duration': int_or_none(video.get('length')),
'formats': formats,
}

View File

@ -1,95 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
qualities,
)
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
'md5': '72954ea10bc979ab5e2eb288b21425a0',
'info_dict': {
'id': '24MR3YO5SAS9',
'ext': 'mp4',
'description': 'One Piece 606',
'title': 'One Piece 606',
},
'skip': 'Video is gone',
}, {
'url': 'http://add-anime.net/video/MDUGWYKNGBD8/One-Piece-687',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
try:
webpage = self._download_webpage(url, video_id)
except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError) or \
ee.cause.code != 503:
raise
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, 'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, 'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
if av is None:
raise ExtractorError('Cannot find redirect math task')
av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + '://' + parsed_url.netloc
+ action + '?'
+ compat_urllib_parse_urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note='Confirming after redirect')
webpage = self._download_webpage(url, video_id)
FORMATS = ('normal', 'hq')
quality = qualities(FORMATS)
formats = []
for format_id in FORMATS:
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, 'video file URLx',
fatal=False)
if not video_url:
continue
formats.append({
'format_id': format_id,
'url': video_url,
'quality': quality(format_id),
})
self._sort_formats(formats)
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'formats': formats,
'title': video_title,
'description': video_description
}

View File

@ -1,25 +1,119 @@
from __future__ import unicode_literals
import functools
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
parse_duration,
unified_strdate,
str_to_int,
int_or_none,
float_or_none,
int_or_none,
ISO639Utils,
determine_ext,
OnDemandPagedList,
parse_duration,
str_or_none,
str_to_int,
unified_strdate,
)
class AdobeTVBaseIE(InfoExtractor):
_API_BASE_URL = 'http://tv.adobe.com/api/v4/'
def _call_api(self, path, video_id, query, note=None):
return self._download_json(
'http://tv.adobe.com/api/v4/' + path,
video_id, note, query=query)['data']
def _parse_subtitles(self, video_data, url_key):
subtitles = {}
for translation in video_data.get('translations', []):
vtt_path = translation.get(url_key)
if not vtt_path:
continue
lang = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
subtitles.setdefault(lang, []).append({
'ext': 'vtt',
'url': vtt_path,
})
return subtitles
def _parse_video_data(self, video_data):
video_id = compat_str(video_data['id'])
title = video_data['title']
s3_extracted = False
formats = []
for source in video_data.get('videos', []):
source_url = source.get('url')
if not source_url:
continue
f = {
'format_id': source.get('quality_level'),
'fps': int_or_none(source.get('frame_rate')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
'width': int_or_none(source.get('width')),
'url': source_url,
}
original_filename = source.get('original_filename')
if original_filename:
if not (f.get('height') and f.get('width')):
mobj = re.search(r'_(\d+)x(\d+)', original_filename)
if mobj:
f.update({
'height': int(mobj.group(2)),
'width': int(mobj.group(1)),
})
if original_filename.startswith('s3://') and not s3_extracted:
formats.append({
'format_id': 'original',
'preference': 1,
'url': original_filename.replace('s3://', 'https://s3.amazonaws.com/'),
})
s3_extracted = True
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
'subtitles': self._parse_subtitles(video_data, 'vtt'),
}
class AdobeTVEmbedIE(AdobeTVBaseIE):
IE_NAME = 'adobetv:embed'
_VALID_URL = r'https?://tv\.adobe\.com/embed/\d+/(?P<id>\d+)'
_TEST = {
'url': 'https://tv.adobe.com/embed/22/4153',
'md5': 'c8c0461bf04d54574fc2b4d07ac6783a',
'info_dict': {
'id': '4153',
'ext': 'flv',
'title': 'Creating Graphics Optimized for BlackBerry',
'description': 'md5:eac6e8dced38bdaae51cd94447927459',
'thumbnail': r're:https?://.*\.jpg$',
'upload_date': '20091109',
'duration': 377,
'view_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._call_api(
'episode/' + video_id, video_id, {'disclosure': 'standard'})[0]
return self._parse_video_data(video_data)
class AdobeTVIE(AdobeTVBaseIE):
IE_NAME = 'adobetv'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)'
_TEST = {
@ -42,45 +136,33 @@ class AdobeTVIE(AdobeTVBaseIE):
if not language:
language = 'en'
video_data = self._download_json(
self._API_BASE_URL + 'episode/get/?language=%s&show_urlname=%s&urlname=%s&disclosure=standard' % (language, show_urlname, urlname),
urlname)['data'][0]
formats = [{
'url': source['url'],
'format_id': source.get('quality_level') or source['url'].split('-')[-1].split('.')[0] or None,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
} for source in video_data['videos']]
self._sort_formats(formats)
return {
'id': compat_str(video_data['id']),
'title': video_data['title'],
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
}
video_data = self._call_api(
'episode/get', urlname, {
'disclosure': 'standard',
'language': language,
'show_urlname': show_urlname,
'urlname': urlname,
})[0]
return self._parse_video_data(video_data)
class AdobeTVPlaylistBaseIE(AdobeTVBaseIE):
def _parse_page_data(self, page_data):
return [self.url_result(self._get_element_url(element_data)) for element_data in page_data]
_PAGE_SIZE = 25
def _extract_playlist_entries(self, url, display_id):
page = self._download_json(url, display_id)
entries = self._parse_page_data(page['data'])
for page_num in range(2, page['paging']['pages'] + 1):
entries.extend(self._parse_page_data(
self._download_json(url + '&page=%d' % page_num, display_id)['data']))
return entries
def _fetch_page(self, display_id, query, page):
page += 1
query['page'] = page
for element_data in self._call_api(
self._RESOURCE, display_id, query, 'Download Page %d' % page):
yield self._process_data(element_data)
def _extract_playlist_entries(self, display_id, query):
return OnDemandPagedList(functools.partial(
self._fetch_page, display_id, query), self._PAGE_SIZE)
class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
IE_NAME = 'adobetv:show'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)'
_TEST = {
@ -92,26 +174,31 @@ class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
},
'playlist_mincount': 136,
}
def _get_element_url(self, element_data):
return element_data['urls'][0]
_RESOURCE = 'episode'
_process_data = AdobeTVBaseIE._parse_video_data
def _real_extract(self, url):
language, show_urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
query = 'language=%s&show_urlname=%s' % (language, show_urlname)
query = {
'disclosure': 'standard',
'language': language,
'show_urlname': show_urlname,
}
show_data = self._download_json(self._API_BASE_URL + 'show/get/?%s' % query, show_urlname)['data'][0]
show_data = self._call_api(
'show/get', show_urlname, query)[0]
return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'episode/?%s' % query, show_urlname),
compat_str(show_data['id']),
show_data['show_name'],
show_data['show_description'])
self._extract_playlist_entries(show_urlname, query),
str_or_none(show_data.get('id')),
show_data.get('show_name'),
show_data.get('show_description'))
class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
IE_NAME = 'adobetv:channel'
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?'
_TEST = {
@ -121,24 +208,30 @@ class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
},
'playlist_mincount': 96,
}
_RESOURCE = 'show'
def _get_element_url(self, element_data):
return element_data['url']
def _process_data(self, show_data):
return self.url_result(
show_data['url'], 'AdobeTVShow', str_or_none(show_data.get('id')))
def _real_extract(self, url):
language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
query = 'language=%s&channel_urlname=%s' % (language, channel_urlname)
query = {
'channel_urlname': channel_urlname,
'language': language,
}
if category_urlname:
query += '&category_urlname=%s' % category_urlname
query['category_urlname'] = category_urlname
return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'show/?%s' % query, channel_urlname),
self._extract_playlist_entries(channel_urlname, query),
channel_urlname)
class AdobeTVVideoIE(InfoExtractor):
class AdobeTVVideoIE(AdobeTVBaseIE):
IE_NAME = 'adobetv:video'
_VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'
_TEST = {
@ -160,38 +253,36 @@ class AdobeTVVideoIE(InfoExtractor):
video_data = self._parse_json(self._search_regex(
r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
title = video_data['title']
formats = [{
'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
'url': source['src'],
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('bitrate')),
} for source in video_data['sources']]
formats = []
sources = video_data.get('sources') or []
for source in sources:
source_src = source.get('src')
if not source_src:
continue
formats.append({
'filesize': int_or_none(source.get('kilobytes') or None, invscale=1000),
'format_id': '-'.join(filter(None, [source.get('format'), source.get('label')])),
'height': int_or_none(source.get('height') or None),
'tbr': int_or_none(source.get('bitrate') or None),
'width': int_or_none(source.get('width') or None),
'url': source_src,
})
self._sort_formats(formats)
# For both metadata and downloaded files the duration varies among
# formats. I just pick the max one
duration = max(filter(None, [
float_or_none(source.get('duration'), scale=1000)
for source in video_data['sources']]))
subtitles = {}
for translation in video_data.get('translations', []):
lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
if lang_id not in subtitles:
subtitles[lang_id] = []
subtitles[lang_id].append({
'url': translation['vttPath'],
'ext': 'vtt',
})
for source in sources]))
return {
'id': video_id,
'formats': formats,
'title': video_data['title'],
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data['video'].get('poster'),
'thumbnail': video_data.get('video', {}).get('poster'),
'duration': duration,
'subtitles': subtitles,
'subtitles': self._parse_subtitles(video_data, 'vttPath'),
}

View File

@ -1,202 +1,118 @@
# coding: utf-8
from __future__ import unicode_literals
import time
import hmac
import hashlib
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
xpath_text,
)
class AtresPlayerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?atresplayer\.com/television/[^/]+/[^/]+/[^/]+/(?P<id>.+?)_\d+\.html'
_VALID_URL = r'https?://(?:www\.)?atresplayer\.com/[^/]+/[^/]+/[^/]+/[^/]+/(?P<display_id>.+?)_(?P<id>[0-9a-f]{24})'
_NETRC_MACHINE = 'atresplayer'
_TESTS = [
{
'url': 'http://www.atresplayer.com/television/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_2014122100174.html',
'md5': 'efd56753cda1bb64df52a3074f62e38a',
'url': 'https://www.atresplayer.com/antena3/series/pequenas-coincidencias/temporada-1/capitulo-7-asuntos-pendientes_5d4aa2c57ed1a88fc715a615/',
'info_dict': {
'id': 'capitulo-10-especial-solidario-nochebuena',
'id': '5d4aa2c57ed1a88fc715a615',
'ext': 'mp4',
'title': 'Especial Solidario de Nochebuena',
'description': 'md5:e2d52ff12214fa937107d21064075bf1',
'duration': 5527.6,
'thumbnail': r're:^https?://.*\.jpg$',
'title': 'Capítulo 7: Asuntos pendientes',
'description': 'md5:7634cdcb4d50d5381bedf93efb537fbc',
'duration': 3413,
},
'params': {
'format': 'bestvideo',
},
'skip': 'This video is only available for registered users'
},
{
'url': 'http://www.atresplayer.com/television/especial/videoencuentros/temporada-1/capitulo-112-david-bustamante_2014121600375.html',
'md5': '6e52cbb513c405e403dbacb7aacf8747',
'info_dict': {
'id': 'capitulo-112-david-bustamante',
'ext': 'flv',
'title': 'David Bustamante',
'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
'duration': 1439.0,
'thumbnail': r're:^https?://.*\.jpg$',
},
'url': 'https://www.atresplayer.com/lasexta/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_5ad08edf986b2855ed47adc4/',
'only_matching': True,
},
{
'url': 'http://www.atresplayer.com/television/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_2014122400174.html',
'url': 'https://www.atresplayer.com/antena3/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_5ad51046986b2886722ccdea/',
'only_matching': True,
},
]
_USER_AGENT = 'Dalvik/1.6.0 (Linux; U; Android 4.3; GT-I9300 Build/JSS15J'
_MAGIC = 'QWtMLXs414Yo+c#_+Q#K@NN)'
_TIMESTAMP_SHIFT = 30000
_TIME_API_URL = 'http://servicios.atresplayer.com/api/admin/time.json'
_URL_VIDEO_TEMPLATE = 'https://servicios.atresplayer.com/api/urlVideo/{1}/{0}/{1}|{2}|{3}.json'
_PLAYER_URL_TEMPLATE = 'https://servicios.atresplayer.com/episode/getplayer.json?episodePk=%s'
_EPISODE_URL_TEMPLATE = 'http://www.atresplayer.com/episodexml/%s'
_LOGIN_URL = 'https://servicios.atresplayer.com/j_spring_security_check'
_ERRORS = {
'UNPUBLISHED': 'We\'re sorry, but this video is not yet available.',
'DELETED': 'This video has expired and is no longer available for online streaming.',
'GEOUNPUBLISHED': 'We\'re sorry, but this video is not available in your region due to right restrictions.',
# 'PREMIUM': 'PREMIUM',
}
_API_BASE = 'https://api.atresplayer.com/'
def _real_initialize(self):
self._login()
def _handle_error(self, e, code):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == code:
error = self._parse_json(e.cause.read(), None)
if error.get('error') == 'required_registered':
self.raise_login_required()
raise ExtractorError(error['error_description'], expected=True)
raise
def _login(self):
username, password = self._get_login_info()
if username is None:
return
login_form = {
'j_username': username,
'j_password': password,
}
self._request_webpage(
self._API_BASE + 'login', None, 'Downloading login page')
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
request, None, 'Logging in')
try:
target_url = self._download_json(
'https://account.atresmedia.com/api/login', None,
'Logging in', headers={
'Content-Type': 'application/x-www-form-urlencoded'
}, data=urlencode_postdata({
'username': username,
'password': password,
}))['targetUrl']
except ExtractorError as e:
self._handle_error(e, 400)
error = self._html_search_regex(
r'(?s)<ul[^>]+class="[^"]*\blist_error\b[^"]*">(.+?)</ul>',
response, 'error', default=None)
if error:
raise ExtractorError(
'Unable to login: %s' % error, expected=True)
self._request_webpage(target_url, None, 'Following Target URL')
def _real_extract(self, url):
video_id = self._match_id(url)
display_id, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
try:
episode = self._download_json(
self._API_BASE + 'client/v1/player/episode/' + video_id, video_id)
except ExtractorError as e:
self._handle_error(e, 403)
episode_id = self._search_regex(
r'episode="([^"]+)"', webpage, 'episode id')
request = sanitized_Request(
self._PLAYER_URL_TEMPLATE % episode_id,
headers={'User-Agent': self._USER_AGENT})
player = self._download_json(request, episode_id, 'Downloading player JSON')
episode_type = player.get('typeOfEpisode')
error_message = self._ERRORS.get(episode_type)
if error_message:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
title = episode['titulo']
formats = []
video_url = player.get('urlVideo')
if video_url:
format_info = {
'url': video_url,
'format_id': 'http',
}
mobj = re.search(r'(?P<bitrate>\d+)K_(?P<width>\d+)x(?P<height>\d+)', video_url)
if mobj:
format_info.update({
'width': int_or_none(mobj.group('width')),
'height': int_or_none(mobj.group('height')),
'tbr': int_or_none(mobj.group('bitrate')),
})
formats.append(format_info)
timestamp = int_or_none(self._download_webpage(
self._TIME_API_URL,
video_id, 'Downloading timestamp', fatal=False), 1000, time.time())
timestamp_shifted = compat_str(timestamp + self._TIMESTAMP_SHIFT)
token = hmac.new(
self._MAGIC.encode('ascii'),
(episode_id + timestamp_shifted).encode('utf-8'), hashlib.md5
).hexdigest()
request = sanitized_Request(
self._URL_VIDEO_TEMPLATE.format('windows', episode_id, timestamp_shifted, token),
headers={'User-Agent': self._USER_AGENT})
fmt_json = self._download_json(
request, video_id, 'Downloading windows video JSON')
result = fmt_json.get('resultDes')
if result.lower() != 'ok':
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, result), expected=True)
for format_id, video_url in fmt_json['resultObject'].items():
if format_id == 'token' or not video_url.startswith('http'):
for source in episode.get('sources', []):
src = source.get('src')
if not src:
continue
if 'geodeswowsmpra3player' in video_url:
# f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
# f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
# this videos are protected by DRM, the f4m downloader doesn't support them
continue
video_url_hd = video_url.replace('free_es', 'es')
formats.extend(self._extract_f4m_formats(
video_url_hd[:-9] + '/manifest.f4m', video_id, f4m_id='hds',
fatal=False))
formats.extend(self._extract_mpd_formats(
video_url_hd[:-9] + '/manifest.mpd', video_id, mpd_id='dash',
fatal=False))
src_type = source.get('type')
if src_type == 'application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif src_type == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
path_data = player.get('pathData')
episode = self._download_xml(
self._EPISODE_URL_TEMPLATE % path_data, video_id,
'Downloading episode XML')
duration = float_or_none(xpath_text(
episode, './media/asset/info/technical/contentDuration', 'duration'))
art = episode.find('./media/asset/info/art')
title = xpath_text(art, './name', 'title')
description = xpath_text(art, './description', 'description')
thumbnail = xpath_text(episode, './media/asset/files/background', 'thumbnail')
subtitles = {}
subtitle_url = xpath_text(episode, './media/asset/files/subtitle', 'subtitle')
if subtitle_url:
subtitles['es'] = [{
'ext': 'srt',
'url': subtitle_url,
}]
heartbeat = episode.get('heartbeat') or {}
omniture = episode.get('omniture') or {}
get_meta = lambda x: heartbeat.get(x) or omniture.get(x)
return {
'display_id': display_id,
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'description': episode.get('descripcion'),
'thumbnail': episode.get('imgPoster'),
'duration': int_or_none(episode.get('duration')),
'formats': formats,
'subtitles': subtitles,
'channel': get_meta('channel'),
'season': get_meta('season'),
'episode_number': int_or_none(get_meta('episodeNumber')),
}

View File

@ -2,22 +2,25 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import float_or_none
from ..utils import (
clean_html,
float_or_none,
)
class AudioBoomIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audioboom\.com/(?:boos|posts)/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
'url': 'https://audioboom.com/posts/7398103-asim-chaudhry',
'md5': '7b00192e593ff227e6a315486979a42d',
'info_dict': {
'id': '4279833',
'id': '7398103',
'ext': 'mp3',
'title': '3/09/2016 Czaban Hour 3',
'description': 'Guest: Nate Davis - NFL free agency, Guest: Stan Gans',
'duration': 2245.72,
'uploader': 'SB Nation A.M.',
'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
'title': 'Asim Chaudhry',
'description': 'md5:2f3fef17dacc2595b5362e1d7d3602fc',
'duration': 4000.99,
'uploader': 'Sue Perkins: An hour or so with...',
'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/perkins',
}
}, {
'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
@ -32,8 +35,8 @@ class AudioBoomIE(InfoExtractor):
clip = None
clip_store = self._parse_json(
self._search_regex(
r'data-new-clip-store=(["\'])(?P<json>{.*?"clipId"\s*:\s*%s.*?})\1' % video_id,
self._html_search_regex(
r'data-new-clip-store=(["\'])(?P<json>{.+?})\1',
webpage, 'clip store', default='{}', group='json'),
video_id, fatal=False)
if clip_store:
@ -47,14 +50,15 @@ class AudioBoomIE(InfoExtractor):
audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
'audio', webpage, 'audio url')
title = from_clip('title') or self._og_search_title(webpage)
description = from_clip('description') or self._og_search_description(webpage)
title = from_clip('title') or self._html_search_meta(
['og:title', 'og:audio:title', 'audio_title'], webpage)
description = from_clip('description') or clean_html(from_clip('formattedDescription')) or self._og_search_description(webpage)
duration = float_or_none(from_clip('duration') or self._html_search_meta(
'weibo:audio:duration', webpage))
uploader = from_clip('author') or self._og_search_property(
'audio:artist', webpage, 'uploader', fatal=False)
uploader = from_clip('author') or self._html_search_meta(
['og:audio:artist', 'twitter:audio:artist_name', 'audio_artist'], webpage, 'uploader')
uploader_url = from_clip('author_url') or self._html_search_meta(
'audioboo:channel', webpage, 'uploader url')

View File

@ -1,142 +0,0 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
)
class BambuserIE(InfoExtractor):
IE_NAME = 'bambuser'
_VALID_URL = r'https?://bambuser\.com/v/(?P<id>\d+)'
_API_KEY = '005f64509e19a868399060af746a00aa'
_LOGIN_URL = 'https://bambuser.com/user'
_NETRC_MACHINE = 'bambuser'
_TEST = {
'url': 'http://bambuser.com/v/4050584',
# MD5 seems to be flaky, see https://travis-ci.org/ytdl-org/youtube-dl/jobs/14051016#L388
# 'md5': 'fba8f7693e48fd4e8641b3fd5539a641',
'info_dict': {
'id': '4050584',
'ext': 'flv',
'title': 'Education engineering days - lightning talks',
'duration': 3741,
'uploader': 'pixelversity',
'uploader_id': '344706',
'timestamp': 1382976692,
'upload_date': '20131028',
'view_count': int,
},
'params': {
# It doesn't respect the 'Range' header, it would download the whole video
# caused the travis builds to fail: https://travis-ci.org/ytdl-org/youtube-dl/jobs/14493845#L59
'skip_download': True,
},
}
def _login(self):
username, password = self._get_login_info()
if username is None:
return
login_form = {
'form_id': 'user_login',
'op': 'Log in',
'name': username,
'pass': password,
}
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
request, None, 'Logging in')
login_error = self._html_search_regex(
r'(?s)<div class="messages error">(.+?)</div>',
response, 'login error', default=None)
if login_error:
raise ExtractorError(
'Unable to login: %s' % login_error, expected=True)
def _real_initialize(self):
self._login()
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._download_json(
'http://player-c.api.bambuser.com/getVideo.json?api_key=%s&vid=%s'
% (self._API_KEY, video_id), video_id)
error = info.get('error')
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error), expected=True)
result = info['result']
return {
'id': video_id,
'title': result['title'],
'url': result['url'],
'thumbnail': result.get('preview'),
'duration': int_or_none(result.get('length')),
'uploader': result.get('username'),
'uploader_id': compat_str(result.get('owner', {}).get('uid')),
'timestamp': int_or_none(result.get('created')),
'fps': float_or_none(result.get('framerate')),
'view_count': int_or_none(result.get('views_total')),
'comment_count': int_or_none(result.get('comment_count')),
}
class BambuserChannelIE(InfoExtractor):
IE_NAME = 'bambuser:channel'
_VALID_URL = r'https?://bambuser\.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
# The maximum number we can get with each request
_STEP = 50
_TEST = {
'url': 'http://bambuser.com/channel/pixelversity',
'info_dict': {
'title': 'pixelversity',
},
'playlist_mincount': 60,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
urls = []
last_id = ''
for i in itertools.count(1):
req_url = (
'http://bambuser.com/xhr-api/index.php?username={user}'
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
req = sanitized_Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
data = self._download_json(
req, user, 'Downloading page %d' % i)
results = data['result']
if not results:
break
last_id = results[-1]['vid']
urls.extend(self.url_result(v['page'], 'Bambuser') for v in results)
return {
'_type': 'playlist',
'title': user,
'entries': urls,
}

View File

@ -40,6 +40,7 @@ class BBCCoUkIE(InfoExtractor):
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/(?:clips|audiovideo/popular)[/#]|
radio/player/|
sounds/play/|
events/[^/]+/play/[^/]+/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
@ -70,7 +71,7 @@ class BBCCoUkIE(InfoExtractor):
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
},
'params': {
@ -220,6 +221,20 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://www.bbc.co.uk/sounds/play/m0007jzb',
'note': 'Audio',
'info_dict': {
'id': 'm0007jz9',
'ext': 'mp4',
'title': 'BBC Proms, 2019, Prom 34: WestEastern Divan Orchestra',
'description': "Live BBC Proms. WestEastern Divan Orchestra with Daniel Barenboim and Martha Argerich.",
'duration': 9840,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
@ -609,7 +624,7 @@ class BBCIE(BBCCoUkIE):
'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': {
'id': 'world-europe-32668511',
'title': 'Russia stages massive WW2 parade despite Western boycott',
'title': 'Russia stages massive WW2 parade',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
},
'playlist_count': 2,

View File

@ -22,7 +22,8 @@ class BellMediaIE(InfoExtractor):
bravo|
mtv|
space|
etalk
etalk|
marilyn
)\.ca|
much\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
@ -70,6 +71,7 @@ class BellMediaIE(InfoExtractor):
'animalplanet': 'aniplan',
'etalk': 'ctv',
'bnnbloomberg': 'bnn',
'marilyn': 'ctv_marilyn',
}
def _real_extract(self, url):

View File

@ -15,6 +15,7 @@ from ..utils import (
float_or_none,
parse_iso8601,
smuggle_url,
str_or_none,
strip_jsonp,
unified_timestamp,
unsmuggle_url,
@ -306,3 +307,115 @@ class BiliBiliBangumiIE(InfoExtractor):
return self.playlist_result(
entries, bangumi_id,
season_info.get('bangumi_title'), season_info.get('evaluate'))
class BilibiliAudioBaseIE(InfoExtractor):
def _call_api(self, path, sid, query=None):
if not query:
query = {'sid': sid}
return self._download_json(
'https://www.bilibili.com/audio/music-service-c/web/' + path,
sid, query=query)['data']
class BilibiliAudioIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/au(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/au1003142',
'md5': 'fec4987014ec94ef9e666d4d158ad03b',
'info_dict': {
'id': '1003142',
'ext': 'm4a',
'title': '【tsukimi】YELLOW / 神山羊',
'artist': 'tsukimi',
'comment_count': int,
'description': 'YELLOW的mp3版',
'duration': 183,
'subtitles': {
'origin': [{
'ext': 'lrc',
}],
},
'thumbnail': r're:^https?://.+\.jpg',
'timestamp': 1564836614,
'upload_date': '20190803',
'uploader': 'tsukimi-つきみぐー',
'view_count': int,
},
}
def _real_extract(self, url):
au_id = self._match_id(url)
play_data = self._call_api('url', au_id)
formats = [{
'url': play_data['cdns'][0],
'filesize': int_or_none(play_data.get('size')),
}]
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}
subtitles = None
lyric = song.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}]
}
return {
'id': au_id,
'title': title,
'formats': formats,
'artist': song.get('author'),
'comment_count': int_or_none(statistic.get('comment')),
'description': song.get('intro'),
'duration': int_or_none(song.get('duration')),
'subtitles': subtitles,
'thumbnail': song.get('cover'),
'timestamp': int_or_none(song.get('passtime')),
'uploader': song.get('uname'),
'view_count': int_or_none(statistic.get('play')),
}
class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/am(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/am10624',
'info_dict': {
'id': '10624',
'title': '每日新曲推荐每日11:00更新',
'description': '每天11:00更新为你推送最新音乐',
},
'playlist_count': 19,
}
def _real_extract(self, url):
am_id = self._match_id(url)
songs = self._call_api(
'song/of-menu', am_id, {'sid': am_id, 'pn': 1, 'ps': 100})['data']
entries = []
for song in songs:
sid = str_or_none(song.get('id'))
if not sid:
continue
entries.append(self.url_result(
'https://www.bilibili.com/audio/au' + sid,
BilibiliAudioIE.ie_key(), sid))
if entries:
album_data = self._call_api('menu/info', am_id) or {}
album_title = album_data.get('title')
if album_title:
for entry in entries:
entry['album'] = album_title
return self.playlist_result(
entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id)

View File

@ -7,6 +7,7 @@ import re
from .common import InfoExtractor
from ..utils import (
orderedSet,
unified_strdate,
urlencode_postdata,
)
@ -23,6 +24,7 @@ class BitChuteIE(InfoExtractor):
'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Victoria X Rave',
'upload_date': '20170813',
},
}, {
'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
@ -74,12 +76,17 @@ class BitChuteIE(InfoExtractor):
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'),
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class=["\']video-publish-date[^>]+>[^<]+ at \d+:\d+ UTC on (.+?)\.',
webpage, 'upload date', fatal=False))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
'formats': formats,
}

View File

@ -11,8 +11,8 @@ from ..utils import ExtractorError
class BokeCCBaseIE(InfoExtractor):
def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
player_params_str = self._html_search_regex(
r'<(?:script|embed)[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
webpage, 'player params')
r'<(?:script|embed)[^>]+src=(?P<q>["\'])(?:https?:)?//p\.bokecc\.com/(?:player|flash/player\.swf)\?(?P<query>.+?)(?P=q)',
webpage, 'player params', group='query')
player_params = compat_parse_qs(player_params_str)
@ -36,9 +36,9 @@ class BokeCCIE(BokeCCBaseIE):
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',
'url': 'http://union.bokecc.com/playvideo.bo?vid=E0ABAE9D4F509B189C33DC5901307461&uid=FE644790DE9D154A',
'info_dict': {
'id': 'CD0C5D3C8614B28B_E44D40C15E65EA30',
'id': 'FE644790DE9D154A_E0ABAE9D4F509B189C33DC5901307461',
'ext': 'flv',
'title': 'BokeCC Video',
},

View File

@ -2,7 +2,6 @@
from __future__ import unicode_literals
import base64
import json
import re
import struct
@ -19,7 +18,6 @@ from ..compat import (
compat_HTTPError,
)
from ..utils import (
determine_ext,
ExtractorError,
extract_attributes,
find_xpath_attr,
@ -28,18 +26,19 @@ from ..utils import (
js_to_json,
int_or_none,
parse_iso8601,
smuggle_url,
unescapeHTML,
unsmuggle_url,
update_url_query,
clean_html,
mimetype2ext,
UnsupportedError,
)
class BrightcoveLegacyIE(InfoExtractor):
IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
_TESTS = [
{
@ -56,7 +55,8 @@ class BrightcoveLegacyIE(InfoExtractor):
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': '1589608506001',
}
},
'skip': 'The player has been deactivated by the content owner',
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
@ -71,6 +71,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'upload_date': '20120814',
'uploader_id': '1460825906',
},
'skip': 'video not playable',
},
{
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
@ -80,7 +81,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'ext': 'mp4',
'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable',
# 'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': '1130468786001',
@ -125,6 +126,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'id': '3550319591001',
},
'playlist_mincount': 7,
'skip': 'Unsupported URL',
},
{
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
@ -134,6 +136,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Lesson 08',
},
'playlist_mincount': 10,
'skip': 'Unsupported URL',
},
{
# playerID inferred from bcpid
@ -142,12 +145,6 @@ class BrightcoveLegacyIE(InfoExtractor):
'only_matching': True, # Tested in GenericIE
}
]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod
def _build_brighcove_url(cls, object_str):
@ -239,7 +236,8 @@ class BrightcoveLegacyIE(InfoExtractor):
@classmethod
def _make_brightcove_url(cls, params):
return update_url_query(cls._FEDERATED_URL, params)
return update_url_query(
'http://c.brightcove.com/services/viewer/htmlFederated', params)
@classmethod
def _extract_brightcove_url(cls, webpage):
@ -298,38 +296,12 @@ class BrightcoveLegacyIE(InfoExtractor):
videoPlayer = query.get('@videoPlayer')
if videoPlayer:
# We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url)
referer = query.get('linkBaseURL', [None])[0] or smuggled_data.get('Referer', url)
video_id = videoPlayer[0]
if 'playerID' not in query:
mobj = re.search(r'/bcpid(\d+)', url)
if mobj is not None:
query['playerID'] = [mobj.group(1)]
return self._get_video_info(
videoPlayer[0], query, referer=referer)
elif 'playerKey' in query:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
else:
raise ExtractorError(
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _brightcove_new_url_result(self, publisher_id, video_id):
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if referer is not None:
headers['Referer'] = referer
webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
publisher_id = query.get('publisherId')
if publisher_id and publisher_id[0].isdigit():
publisher_id = publisher_id[0]
@ -340,6 +312,9 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
player_id = query.get('playerID')
if player_id and player_id[0].isdigit():
headers = {}
if referer:
headers['Referer'] = referer
player_page = self._download_webpage(
'http://link.brightcove.com/services/player/bcpid' + player_id[0],
video_id, headers=headers, fatal=False)
@ -350,136 +325,16 @@ class BrightcoveLegacyIE(InfoExtractor):
if player_key:
enc_pub_id = player_key.split(',')[1].replace('~', '=')
publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
if publisher_id:
return self._brightcove_new_url_result(publisher_id, video_id)
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
video_info['_youtubedl_adServerURL'] = info.get('adServerURL')
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
info_url = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' % player_key
playlist_info = self._download_webpage(
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist')
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId')
info = {
'id': video_id,
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
url = rend['defaultURL']
if not url:
continue
ext = None
if rend['remote']:
url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'):
formats.extend(
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif 'akamaihd.net' in url_comp.netloc:
# This type of renditions are served through
# akamaihd.net, but they don't use f4m manifests
url = url.replace('control/', '') + '?&v=3.3.0&fp=13&r=FEEFJ&g=RTSJIMBMPFPB'
ext = 'flv'
if ext is None:
ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8_native',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
adServerURL = video_info.get('_youtubedl_adServerURL')
if adServerURL:
ad_info = {
'_type': 'url',
'url': adServerURL,
}
if 'url' in info:
return {
'_type': 'playlist',
'title': info['title'],
'entries': [ad_info, info],
}
else:
return ad_info
if not info.get('url') and not info.get('formats'):
uploader_id = info.get('uploader_id')
if uploader_id:
info.update(self._brightcove_new_url_result(uploader_id, video_id))
else:
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
if publisher_id:
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
if referer:
brightcove_new_url = smuggle_url(brightcove_new_url, {'referrer': referer})
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
# TODO: figure out if it's possible to extract playlistId from playerKey
# elif 'playerKey' in query:
# player_key = query['playerKey']
# return self._get_playlist_info(player_key[0])
raise UnsupportedError(url)
class BrightcoveNewIE(AdobePassIE):

View File

@ -3,7 +3,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_duration
from ..utils import (
determine_ext,
merge_dicts,
parse_duration,
url_or_none,
)
class BYUtvIE(InfoExtractor):
@ -51,7 +56,7 @@ class BYUtvIE(InfoExtractor):
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
info = self._download_json(
video = self._download_json(
'https://api.byutv.org/api3/catalog/getvideosforcontent',
display_id, query={
'contentid': video_id,
@ -62,7 +67,7 @@ class BYUtvIE(InfoExtractor):
'x-byutv-platformkey': 'xsaaw9c7y5',
})
ep = info.get('ooyalaVOD')
ep = video.get('ooyalaVOD')
if ep:
return {
'_type': 'url_transparent',
@ -75,18 +80,38 @@ class BYUtvIE(InfoExtractor):
'thumbnail': ep.get('imageThumbnail'),
}
ep = info['dvr']
title = ep['title']
formats = self._extract_m3u8_formats(
ep['videoUrl'], video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
info = {}
formats = []
for format_id, ep in video.items():
if not isinstance(ep, dict):
continue
video_url = url_or_none(ep.get('videoUrl'))
if not video_url:
continue
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': video_url,
'format_id': format_id,
})
merge_dicts(info, {
'title': ep.get('title'),
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
'duration': parse_duration(ep.get('length')),
})
self._sort_formats(formats)
return {
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
'duration': parse_duration(ep.get('length')),
'title': display_id,
'formats': formats,
}
})

View File

@ -147,6 +147,8 @@ class CeskaTelevizeIE(InfoExtractor):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'drmOnly=true' in stream_url:
continue
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', 'm3u8_native',

View File

@ -32,7 +32,7 @@ class Channel9IE(InfoExtractor):
'upload_date': '20130828',
'session_code': 'KOS002',
'session_room': 'Arena 1A',
'session_speakers': ['Andrew Coates', 'Brady Gaster', 'Mads Kristensen', 'Ed Blankenship', 'Patrick Klug'],
'session_speakers': 'count:5',
},
}, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
@ -64,15 +64,15 @@ class Channel9IE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'info_dict': {
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_mincount': 100,
}, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'info_dict': {
'id': 'Events/DEVintersection/DEVintersection-2016',
'title': 'DEVintersection 2016 Orlando Sessions',
},
'playlist_mincount': 14,
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'only_matching': True,
}, {
'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
@ -112,11 +112,11 @@ class Channel9IE(InfoExtractor):
episode_data), content_path)
content_id = episode_data['contentId']
is_session = '/Sessions(' in episode_data['api']
content_url = 'https://channel9.msdn.com/odata' + episode_data['api']
content_url = 'https://channel9.msdn.com/odata' + episode_data['api'] + '?$select=Captions,CommentCount,MediaLengthInSeconds,PublishedDate,Rating,RatingCount,Title,VideoMP4High,VideoMP4Low,VideoMP4Medium,VideoPlayerPreviewImage,VideoWMV,VideoWMVHQ,Views,'
if is_session:
content_url += '?$expand=Speakers'
content_url += 'Code,Description,Room,Slides,Speakers,ZipFile&$expand=Speakers'
else:
content_url += '?$expand=Authors'
content_url += 'Authors,Body&$expand=Authors'
content_data = self._download_json(content_url, content_id)
title = content_data['Title']
@ -210,7 +210,7 @@ class Channel9IE(InfoExtractor):
'id': content_id,
'title': title,
'description': clean_html(content_data.get('Description') or content_data.get('Body')),
'thumbnail': content_data.get('Thumbnail') or content_data.get('VideoPlayerPreviewImage'),
'thumbnail': content_data.get('VideoPlayerPreviewImage'),
'duration': int_or_none(content_data.get('MediaLengthInSeconds')),
'timestamp': parse_iso8601(content_data.get('PublishedDate')),
'avg_rating': int_or_none(content_data.get('Rating')),

View File

@ -3,11 +3,15 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
lowercase_escape,
url_or_none,
)
class ChaturbateIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?:fullvideo/?\?.*?\bb=)?(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.chaturbate.com/siswet19/',
'info_dict': {
@ -21,6 +25,9 @@ class ChaturbateIE(InfoExtractor):
'skip_download': True,
},
'skip': 'Room is offline',
}, {
'url': 'https://chaturbate.com/fullvideo/?b=caylin',
'only_matching': True,
}, {
'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True,
@ -32,14 +39,34 @@ class ChaturbateIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, headers=self.geo_verification_headers())
'https://chaturbate.com/%s/' % video_id, video_id,
headers=self.geo_verification_headers())
found_m3u8_urls = []
data = self._parse_json(
self._search_regex(
r'initialRoomDossier\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'data', default='{}', group='value'),
video_id, transform_source=lowercase_escape, fatal=False)
if data:
m3u8_url = url_or_none(data.get('hls_source'))
if m3u8_url:
found_m3u8_urls.append(m3u8_url)
if not found_m3u8_urls:
for m in re.finditer(
r'(\\u002[27])(?P<url>http.+?\.m3u8.*?)\1', webpage):
found_m3u8_urls.append(lowercase_escape(m.group('url')))
if not found_m3u8_urls:
for m in re.finditer(
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
found_m3u8_urls.append(m.group('url'))
m3u8_urls = []
for m in re.finditer(
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
m3u8_fast_url, m3u8_no_fast_url = m.group('url'), m.group(
'url').replace('_fast', '')
for found_m3u8_url in found_m3u8_urls:
m3u8_fast_url, m3u8_no_fast_url = found_m3u8_url, found_m3u8_url.replace('_fast', '')
for m3u8_url in (m3u8_fast_url, m3u8_no_fast_url):
if m3u8_url not in m3u8_urls:
m3u8_urls.append(m3u8_url)
@ -59,7 +86,12 @@ class ChaturbateIE(InfoExtractor):
formats = []
for m3u8_url in m3u8_urls:
m3u8_id = 'fast' if '_fast' in m3u8_url else 'slow'
for known_id in ('fast', 'slow'):
if '_%s' % known_id in m3u8_url:
m3u8_id = known_id
break
else:
m3u8_id = None
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4',
# ffmpeg skips segments for fast m3u8

View File

@ -1,74 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
)
class ComCarCoffIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
'id': '2494164',
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'duration': 1232,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
},
'params': {
'skip_download': 'requires ffmpeg',
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
if not display_id:
display_id = 'comediansincarsgettingcoffee.com'
webpage = self._download_webpage(url, display_id)
full_data = self._parse_json(
self._search_regex(
r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
display_id)['videoData']
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
'url': video_data['images']['poster'],
}]
timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
video_data.get('pubDate'))
duration = int_or_none(video_data.get('durationSeconds')) or parse_duration(
video_data.get('duration'))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': video_data.get('description'),
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
}

View File

@ -1424,12 +1424,10 @@ class InfoExtractor(object):
try:
self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True
except ExtractorError as e:
if isinstance(e.cause, compat_urllib_error.URLError):
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
raise
except ExtractorError:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
def http_scheme(self):
""" Either "http:" or "https:", depending on the user's preferences """
@ -1457,14 +1455,14 @@ class InfoExtractor(object):
def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True, m3u8_id=None):
fatal=True, m3u8_id=None, data=None, headers={}, query={}):
manifest = self._download_xml(
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source,
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if manifest is False:
return []
@ -1588,12 +1586,13 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False):
fatal=True, live=False, data=None, headers={},
query={}):
res = self._download_webpage_handle(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
return []
@ -1767,6 +1766,19 @@ class InfoExtractor(object):
# the same GROUP-ID
f['acodec'] = 'none'
formats.append(f)
# for DailyMotion
progressive_uri = last_stream_inf.get('PROGRESSIVE-URI')
if progressive_uri:
http_f = f.copy()
del http_f['manifest_url']
http_f.update({
'format_id': f['format_id'].replace('hls-', 'http-'),
'protocol': 'http',
'url': progressive_uri,
})
formats.append(http_f)
last_stream_inf = {}
return formats
@ -2011,12 +2023,12 @@ class InfoExtractor(object):
})
return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}, data=None, headers={}, query={}):
res = self._download_xml_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
errnote=errnote or 'Failed to download MPD manifest',
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
return []
mpd_doc, urlh = res
@ -2319,12 +2331,12 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
res = self._download_xml_handle(
ism_url, video_id,
note=note or 'Downloading ISM manifest',
errnote=errnote or 'Failed to download ISM manifest',
fatal=fatal)
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
return []
ism_doc, urlh = res
@ -2691,7 +2703,7 @@ class InfoExtractor(object):
entry = {
'id': this_video_id,
'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
'description': video_data.get('description'),
'description': clean_html(video_data.get('description')),
'thumbnail': urljoin(base_url, self._proto_relative_url(video_data.get('image'))),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),

View File

@ -0,0 +1,118 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
)
class CONtvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?contv\.com/details-movie/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.contv.com/details-movie/CEG10022949/days-of-thrills-&-laughter',
'info_dict': {
'id': 'CEG10022949',
'ext': 'mp4',
'title': 'Days Of Thrills & Laughter',
'description': 'md5:5d6b3d0b1829bb93eb72898c734802eb',
'upload_date': '20180703',
'timestamp': 1530634789.61,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www.contv.com/details-movie/CLIP-show_fotld_bts/fight-of-the-living-dead:-behind-the-scenes-bites',
'info_dict': {
'id': 'CLIP-show_fotld_bts',
'title': 'Fight of the Living Dead: Behind the Scenes Bites',
},
'playlist_mincount': 7,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
details = self._download_json(
'http://metax.contv.live.junctiontv.net/metax/2.5/details/' + video_id,
video_id, query={'device': 'web'})
if details.get('type') == 'episodic':
seasons = self._download_json(
'http://metax.contv.live.junctiontv.net/metax/2.5/seriesfeed/json/' + video_id,
video_id)
entries = []
for season in seasons:
for episode in season.get('episodes', []):
episode_id = episode.get('id')
if not episode_id:
continue
entries.append(self.url_result(
'https://www.contv.com/details-movie/' + episode_id,
CONtvIE.ie_key(), episode_id))
return self.playlist_result(entries, video_id, details.get('title'))
m_details = details['details']
title = details['title']
formats = []
media_hls_url = m_details.get('media_hls_url')
if media_hls_url:
formats.extend(self._extract_m3u8_formats(
media_hls_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
media_mp4_url = m_details.get('media_mp4_url')
if media_mp4_url:
formats.append({
'format_id': 'http',
'url': media_mp4_url,
})
self._sort_formats(formats)
subtitles = {}
captions = m_details.get('captions') or {}
for caption_url in captions.values():
subtitles.setdefault('en', []).append({
'url': caption_url
})
thumbnails = []
for image in m_details.get('images', []):
image_url = image.get('url')
if not image_url:
continue
thumbnails.append({
'url': image_url,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
})
description = None
for p in ('large_', 'medium_', 'small_', ''):
d = m_details.get(p + 'description')
if d:
description = d
break
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': description,
'timestamp': float_or_none(details.get('metax_added_on'), 1000),
'subtitles': subtitles,
'duration': float_or_none(m_details.get('duration'), 1000),
'view_count': int_or_none(details.get('num_watched')),
'like_count': int_or_none(details.get('num_fav')),
'categories': details.get('category'),
'tags': details.get('tags'),
'season_number': int_or_none(details.get('season')),
'episode_number': int_or_none(details.get('episode')),
'release_year': int_or_none(details.get('pub_year')),
}

View File

@ -4,7 +4,12 @@ from __future__ import unicode_literals
import re
from .theplatform import ThePlatformFeedIE
from ..utils import int_or_none
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
)
class CorusIE(ThePlatformFeedIE):
@ -12,24 +17,49 @@ class CorusIE(ThePlatformFeedIE):
https?://
(?:www\.)?
(?P<domain>
(?:globaltv|etcanada)\.com|
(?:hgtv|foodnetwork|slice|history|showcase|bigbrothercanada)\.ca
(?:
globaltv|
etcanada|
seriesplus|
wnetwork|
ytv
)\.com|
(?:
hgtv|
foodnetwork|
slice|
history|
showcase|
bigbrothercanada|
abcspark|
disney(?:channel|lachaine)
)\.ca
)
/(?:[^/]+/)*
(?:
video\.html\?.*?\bv=|
videos?/(?:[^/]+/)*(?:[a-z0-9-]+-)?
)
(?P<id>
[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}|
(?:[A-Z]{4})?\d{12,20}
)
/(?:video/(?:[^/]+/)?|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
'info_dict': {
'id': '870923331648',
'ext': 'mp4',
'title': 'Movie Night Popcorn with Bryan',
'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.',
'uploader': 'SHWM-NEW',
'upload_date': '20170206',
'timestamp': 1486392197,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
'expected_warnings': ['Failed to parse JSON'],
}, {
'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753',
'only_matching': True,
@ -48,58 +78,83 @@ class CorusIE(ThePlatformFeedIE):
}, {
'url': 'https://www.bigbrothercanada.ca/video/big-brother-canada-704/1457812035894/',
'only_matching': True
}, {
'url': 'https://www.seriesplus.com/emissions/dre-mary-mort-sur-ordonnance/videos/deux-coeurs-battant/SERP0055626330000200/',
'only_matching': True
}, {
'url': 'https://www.disneychannel.ca/shows/gabby-duran-the-unsittables/video/crybaby-duran-clip/2f557eec-0588-11ea-ae2b-e2c6776b770e/',
'only_matching': True
}]
_TP_FEEDS = {
'globaltv': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'etcanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'hgtv': {
'feed_id': 'L0BMHXi2no43',
'account_id': 2414428465,
},
'foodnetwork': {
'feed_id': 'ukK8o58zbRmJ',
'account_id': 2414429569,
},
'slice': {
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
'history': {
'feed_id': 'tQFx_TyyEq4J',
'account_id': 2369613659,
},
'showcase': {
'feed_id': '9H6qyshBZU3E',
'account_id': 2414426607,
},
'bigbrothercanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
_GEO_BYPASS = False
_SITE_MAP = {
'globaltv': 'series',
'etcanada': 'series',
'foodnetwork': 'food',
'bigbrothercanada': 'series',
'disneychannel': 'disneyen',
'disneylachaine': 'disneyfr',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
feed_info = self._TP_FEEDS[domain.split('.')[0]]
return self._extract_feed_info('dtjsEC', feed_info['feed_id'], 'byId=' + video_id, video_id, lambda e: {
'episode_number': int_or_none(e.get('pl1$episode')),
'season_number': int_or_none(e.get('pl1$season')),
'series': e.get('pl1$show'),
}, {
'HLS': {
'manifest': 'm3u',
},
'DesktopHLS Default': {
'manifest': 'm3u',
},
'MP4 MBR': {
'manifest': 'm3u',
},
}, feed_info['account_id'])
site = domain.split('.')[0]
path = self._SITE_MAP.get(site, site)
if path != 'series':
path = 'migration/' + path
video = self._download_json(
'https://globalcontent.corusappservices.com/templates/%s/playlist/' % path,
video_id, query={'byId': video_id},
headers={'Accept': 'application/json'})[0]
title = video['title']
formats = []
for source in video.get('sources', []):
smil_url = source.get('file')
if not smil_url:
continue
source_type = source.get('type')
note = 'Downloading%s smil file' % (' ' + source_type if source_type else '')
resp = self._download_webpage(
smil_url, video_id, note, fatal=False,
headers=self.geo_verification_headers())
if not resp:
continue
error = self._parse_json(resp, video_id, fatal=False)
if error:
if error.get('exception') == 'GeoLocationBlocked':
self.raise_geo_restricted(countries=['CA'])
raise ExtractorError(error['description'])
smil = self._parse_xml(resp, video_id, fatal=False)
if smil is None:
continue
namespace = self._parse_smil_namespace(smil)
formats.extend(self._parse_smil_formats(
smil, smil_url, video_id, namespace))
if not formats and video.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
subtitles = {}
for track in video.get('tracks', []):
track_url = track.get('file')
if not track_url:
continue
lang = 'fr' if site in ('disneylachaine', 'seriesplus') else 'en'
subtitles.setdefault(lang, []).append({'url': track_url})
metadata = video.get('metadata') or {}
get_number = lambda x: int_or_none(video.get('pl1$' + x) or metadata.get(x + 'Number'))
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': dict_get(video, ('defaultThumbnailUrl', 'thumbnail', 'image')),
'description': video.get('description'),
'timestamp': int_or_none(video.get('availableDate'), 1000),
'subtitles': subtitles,
'duration': float_or_none(metadata.get('duration')),
'series': dict_get(video, ('show', 'pl1$show')),
'season_number': get_number('season'),
'episode_number': get_number('episode'),
}

View File

@ -1,64 +1,105 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import functools
import hashlib
import itertools
import json
import random
import re
import string
from .common import InfoExtractor
from ..compat import compat_struct_pack
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
error_to_compat_str,
age_restricted,
clean_html,
ExtractorError,
int_or_none,
mimetype2ext,
OnDemandPagedList,
parse_iso8601,
sanitized_Request,
str_to_int,
try_get,
unescapeHTML,
update_url_query,
url_or_none,
urlencode_postdata,
)
class DailymotionBaseInfoExtractor(InfoExtractor):
_FAMILY_FILTER = None
_HEADERS = {
'Content-Type': 'application/json',
'Origin': 'https://www.dailymotion.com',
}
_NETRC_MACHINE = 'dailymotion'
def _get_dailymotion_cookies(self):
return self._get_cookies('https://www.dailymotion.com/')
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = sanitized_Request(url)
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
def _get_cookie_value(cookies, name):
cookie = cookies.get('name')
if cookie:
return cookie.value
def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage_handle(request, *args, **kwargs)
def _set_dailymotion_cookie(self, name, value):
self._set_cookie('www.dailymotion.com', name, value)
def _download_webpage_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage(request, *args, **kwargs)
def _real_initialize(self):
cookies = self._get_dailymotion_cookies()
ff = self._get_cookie_value(cookies, 'ff')
self._FAMILY_FILTER = ff == 'on' if ff else age_restricted(18, self._downloader.params.get('age_limit'))
self._set_dailymotion_cookie('ff', 'on' if self._FAMILY_FILTER else 'off')
def _call_api(self, object_type, xid, object_fields, note, filter_extra=None):
if not self._HEADERS.get('Authorization'):
cookies = self._get_dailymotion_cookies()
token = self._get_cookie_value(cookies, 'access_token') or self._get_cookie_value(cookies, 'client_token')
if not token:
data = {
'client_id': 'f1a362d288c1b98099c7',
'client_secret': 'eea605b96e01c796ff369935357eca920c5da4c5',
}
username, password = self._get_login_info()
if username:
data.update({
'grant_type': 'password',
'password': password,
'username': username,
})
else:
data['grant_type'] = 'client_credentials'
try:
token = self._download_json(
'https://graphql.api.dailymotion.com/oauth/token',
None, 'Downloading Access Token',
data=urlencode_postdata(data))['access_token']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
raise ExtractorError(self._parse_json(
e.cause.read().decode(), xid)['error_description'], expected=True)
raise
self._set_dailymotion_cookie('access_token' if username else 'client_token', token)
self._HEADERS['Authorization'] = 'Bearer ' + token
resp = self._download_json(
'https://graphql.api.dailymotion.com/', xid, note, data=json.dumps({
'query': '''{
%s(xid: "%s"%s) {
%s
}
}''' % (object_type, xid, ', ' + filter_extra if filter_extra else '', object_fields),
}).encode(), headers=self._HEADERS)
obj = resp['data'][object_type]
if not obj:
raise ExtractorError(resp['errors'][0]['message'], expected=True)
return obj
class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)'
_VALID_URL = r'''(?ix)
https?://
(?:
(?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
(?:www\.)?lequipe\.fr/video
)
/(?P<id>[^/?_]+)(?:.+?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
'''
IE_NAME = 'dailymotion'
_FORMATS = [
('stream_h264_ld_url', 'ld'),
('stream_h264_url', 'standard'),
('stream_h264_hq_url', 'hq'),
('stream_h264_hd_url', 'hd'),
('stream_h264_hd1080_url', 'hd180'),
]
_TESTS = [{
'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
'md5': '074b95bdee76b9e3654137aee9c79dfe',
@ -67,7 +108,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'ext': 'mp4',
'title': 'Office Christmas Party Review Jason Bateman, Olivia Munn, T.J. Miller',
'description': 'Office Christmas Party Review - Jason Bateman, Olivia Munn, T.J. Miller',
'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
'duration': 187,
'timestamp': 1493651285,
'upload_date': '20170501',
@ -133,7 +173,22 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}, {
'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/x791mem',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
'only_matching': True,
}, {
'url': 'https://www.dailymotion.com/video/x3z49k?playlist=xv4bw',
'only_matching': True,
}]
_GEO_BYPASS = False
_COMMON_MEDIA_FIELDS = '''description
geoblockedCountries {
allowed
}
xid'''
@staticmethod
def _extract_urls(webpage):
@ -149,264 +204,140 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
return urls
def _real_extract(self, url):
video_id = self._match_id(url)
video_id, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage_no_ff(
'https://www.dailymotion.com/video/%s' % video_id, video_id)
if playlist_id:
if not self._downloader.params.get('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
return self.url_result(
'http://www.dailymotion.com/playlist/' + playlist_id,
'DailymotionPlaylist', playlist_id)
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
age_limit = self._rta_search(webpage)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage, 'description')
view_count_str = self._search_regex(
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', default=None)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', default=None))
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/ytdl-org/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});',
# New layout regex (see https://github.com/ytdl-org/youtube-dl/issues/13580)
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id, fatal=False) or {}
metadata = try_get(player, lambda x: x['metadata'], dict)
if not metadata:
metadata_url = url_or_none(try_get(
player, lambda x: x['context']['metadata_template_url1']))
if metadata_url:
metadata_url = metadata_url.replace(':videoId', video_id)
else:
metadata_url = update_url_query(
'https://www.dailymotion.com/player/metadata/video/%s'
% video_id, {
'embedder': url,
'integration': 'inline',
'GK_PV5_NEON': '1',
})
metadata = self._download_json(
metadata_url, video_id, 'Downloading metadata JSON')
if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
t = ''.join(random.choice(string.ascii_letters) for i in range(10))
n = us64e(compat_struct_pack('I', r))
i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
metadata = self._download_json(
'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)
self._check_error(metadata)
formats = []
for quality, media_list in metadata['qualities'].items():
for media in media_list:
media_url = media.get('url')
if not media_url:
continue
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
ext = mimetype2ext(type_) or determine_ext(media_url)
if ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False)
for f in m3u8_formats:
f['url'] = f['url'].split('#')[0]
formats.append(f)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-%s' % quality,
'ext': ext,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(f)
self._sort_formats(formats)
title = metadata['title']
duration = int_or_none(metadata.get('duration'))
timestamp = int_or_none(metadata.get('created_time'))
thumbnail = metadata.get('poster_url')
uploader = metadata.get('owner', {}).get('screenname')
uploader_id = metadata.get('owner', {}).get('id')
subtitles = {}
subtitles_data = metadata.get('subtitles', {}).get('data', {})
if subtitles_data and isinstance(subtitles_data, dict):
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'age_limit': age_limit,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats,
'subtitles': subtitles,
}
# vevo embed
vevo_id = self._search_regex(
r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
webpage, 'vevo embed', default=None)
if vevo_id:
return self.url_result('vevo:%s' % vevo_id, 'Vevo')
# fallback old player
embed_page = self._download_webpage_no_ff(
'https://www.dailymotion.com/embed/video/%s' % video_id,
video_id, 'Downloading embed page')
timestamp = parse_iso8601(self._html_search_meta(
'video:release_date', webpage, 'upload date'))
info = self._parse_json(
self._search_regex(
r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE),
video_id)
self._check_error(info)
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = map(int_or_none, (m_size.group(1), m_size.group(2)))
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
self._sort_formats(formats)
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
title = self._og_search_title(webpage, default=None)
if title is None:
title = self._html_search_regex(
r'(?s)<span\s+id="video_title"[^>]*>(.*?)</span>', webpage,
'title')
return {
'id': video_id,
'formats': formats,
'uploader': info['owner.screenname'],
'timestamp': timestamp,
'title': title,
'description': description,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
'view_count': view_count,
'duration': info['duration']
password = self._downloader.params.get('videopassword')
media = self._call_api(
'media', video_id, '''... on Video {
%s
stats {
likes {
total
}
views {
total
}
}
}
... on Live {
%s
audienceCount
isOnAir
}''' % (self._COMMON_MEDIA_FIELDS, self._COMMON_MEDIA_FIELDS), 'Downloading media JSON metadata',
'password: "%s"' % self._downloader.params.get('videopassword') if password else None)
xid = media['xid']
def _check_error(self, info):
error = info.get('error')
metadata = self._download_json(
'https://www.dailymotion.com/player/metadata/video/' + xid,
xid, 'Downloading metadata JSON',
query={'app': 'com.dailymotion.neon'})
error = metadata.get('error')
if error:
title = error.get('title') or error['message']
title = error.get('title') or error['raw_message']
# See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title)
allowed_countries = try_get(media, lambda x: x['geoblockedCountries']['allowed'], list)
self.raise_geo_restricted(msg=title, countries=allowed_countries)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, title), expected=True)
def _get_subtitles(self, video_id, webpage):
try:
sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
sub_lang_list = dict((l['language'], [{'url': l['url'], 'ext': 'srt'}]) for l in info['list'])
return sub_lang_list
self._downloader.report_warning('video doesn\'t have subtitles')
return {}
title = metadata['title']
is_live = media.get('isOnAir')
formats = []
for quality, media_list in metadata['qualities'].items():
for m in media_list:
media_url = m.get('url')
media_type = m.get('type')
if not media_url or media_type == 'application/vnd.lumberjack.manifest':
continue
if media_type == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4',
'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-' + quality,
}
m = re.search(r'/H264-(\d+)x(\d+)(?:-(60)/)?', media_url)
if m:
width, height, fps = map(int_or_none, m.groups())
f.update({
'fps': fps,
'height': height,
'width': width,
})
formats.append(f)
for f in formats:
f['url'] = f['url'].split('#')[0]
if not f.get('fps') and f['format_id'].endswith('@60'):
f['fps'] = 60
self._sort_formats(formats)
subtitles = {}
subtitles_data = try_get(metadata, lambda x: x['subtitles']['data'], dict) or {}
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
thumbnails = []
for height, poster_url in metadata.get('posters', {}).items():
thumbnails.append({
'height': int_or_none(height),
'id': height,
'url': poster_url,
})
owner = metadata.get('owner') or {}
stats = media.get('stats') or {}
get_count = lambda x: int_or_none(try_get(stats, lambda y: y[x + 's']['total']))
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': clean_html(media.get('description')),
'thumbnails': thumbnails,
'duration': int_or_none(metadata.get('duration')) or None,
'timestamp': int_or_none(metadata.get('created_time')),
'uploader': owner.get('screenname'),
'uploader_id': owner.get('id') or metadata.get('screenname'),
'age_limit': 18 if metadata.get('explicit') else 0,
'tags': metadata.get('tags'),
'view_count': get_count('view') or int_or_none(media.get('audienceCount')),
'like_count': get_count('like'),
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
}
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'title': 'SPORT',
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
class DailymotionPlaylistBaseIE(DailymotionBaseInfoExtractor):
_PAGE_SIZE = 100
def _fetch_page(self, playlist_id, authorizaion, page):
def _fetch_page(self, playlist_id, page):
page += 1
videos = self._download_json(
'https://graphql.api.dailymotion.com',
playlist_id, 'Downloading page %d' % page,
data=json.dumps({
'query': '''{
collection(xid: "%s") {
videos(first: %d, page: %d) {
pageInfo {
hasNextPage
nextPage
}
videos = self._call_api(
self._OBJECT_TYPE, playlist_id,
'''videos(allowExplicit: %s, first: %d, page: %d) {
edges {
node {
xid
url
}
}
}
}
}''' % (playlist_id, self._PAGE_SIZE, page)
}).encode(), headers={
'Authorization': authorizaion,
'Origin': 'https://www.dailymotion.com',
})['data']['collection']['videos']
}''' % ('false' if self._FAMILY_FILTER else 'true', self._PAGE_SIZE, page),
'Downloading page %d' % page)['videos']
for edge in videos['edges']:
node = edge['node']
yield self.url_result(
@ -414,86 +345,49 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
api = self._parse_json(self._search_regex(
r'__PLAYER_CONFIG__\s*=\s*({.+?});',
webpage, 'player config'), playlist_id)['context']['api']
auth = self._download_json(
api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
playlist_id, data=urlencode_postdata({
'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
'grant_type': 'client_credentials',
}))
authorizaion = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
entries = OnDemandPagedList(functools.partial(
self._fetch_page, playlist_id, authorizaion), self._PAGE_SIZE)
self._fetch_page, playlist_id), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage))
entries, playlist_id)
class DailymotionUserIE(DailymotionBaseInfoExtractor):
class DailymotionPlaylistIE(DailymotionPlaylistBaseIE):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
_OBJECT_TYPE = 'collection'
class DailymotionUserIE(DailymotionPlaylistBaseIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
'playlist_mincount': 152,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'playlist_mincount': 1000,
'skip': 'Takes too long time',
}, {
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
},
'playlist_mincount': 148,
'params': {
'age_limit': 0,
},
}]
def _extract_entries(self, id):
video_ids = set()
processed_urls = set()
for pagenum in itertools.count(1):
page_url = self._PAGE_TEMPLATE % (id, pagenum)
webpage, urlh = self._download_webpage_handle_no_ff(
page_url, id, 'Downloading page %s' % pagenum)
if urlh.geturl() in processed_urls:
self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
page_url, urlh.geturl()), id)
break
processed_urls.add(urlh.geturl())
for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids:
yield self.url_result(
'http://www.dailymotion.com/video/%s' % video_id,
DailymotionIE.ie_key(), video_id)
video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(
'https://www.dailymotion.com/user/%s' % user, user)
full_user = unescapeHTML(self._html_search_regex(
r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
webpage, 'user'))
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}
_OBJECT_TYPE = 'channel'

View File

@ -1,154 +0,0 @@
from __future__ import unicode_literals
import base64
import json
import random
import re
from .common import InfoExtractor
from ..aes import (
aes_cbc_decrypt,
aes_cbc_encrypt,
)
from ..compat import compat_b64decode
from ..utils import (
bytes_to_intlist,
bytes_to_long,
extract_attributes,
ExtractorError,
intlist_to_bytes,
js_to_json,
int_or_none,
long_to_bytes,
pkcs1pad,
)
class DaisukiMottoIE(InfoExtractor):
_VALID_URL = r'https?://motto\.daisuki\.net/framewatch/embed/[^/]+/(?P<id>[0-9a-zA-Z]{3})'
_TEST = {
'url': 'http://motto.daisuki.net/framewatch/embed/embedDRAGONBALLSUPERUniverseSurvivalsaga/V2e/760/428',
'info_dict': {
'id': 'V2e',
'ext': 'mp4',
'title': '#117 SHOWDOWN OF LOVE! ANDROIDS VS UNIVERSE 2!!',
'subtitles': {
'mul': [{
'ext': 'ttml',
}],
},
},
'params': {
'skip_download': True, # AES-encrypted HLS stream
},
}
# The public key in PEM format can be found in clientlibs_anime_watch.min.js
_RSA_KEY = (0xc5524c25e8e14b366b3754940beeb6f96cb7e2feef0b932c7659a0c5c3bf173d602464c2df73d693b513ae06ff1be8f367529ab30bf969c5640522181f2a0c51ea546ae120d3d8d908595e4eff765b389cde080a1ef7f1bbfb07411cc568db73b7f521cedf270cbfbe0ddbc29b1ac9d0f2d8f4359098caffee6d07915020077d, 65537)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flashvars = self._parse_json(self._search_regex(
r'(?s)var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
video_id, transform_source=js_to_json)
iv = [0] * 16
data = {}
for key in ('device_cd', 'mv_id', 'ss1_prm', 'ss2_prm', 'ss3_prm', 'ss_id'):
data[key] = flashvars.get(key, '')
encrypted_rtn = None
# Some AES keys are rejected. Try it with different AES keys
for idx in range(5):
aes_key = [random.randint(0, 254) for _ in range(32)]
padded_aeskey = intlist_to_bytes(pkcs1pad(aes_key, 128))
n, e = self._RSA_KEY
encrypted_aeskey = long_to_bytes(pow(bytes_to_long(padded_aeskey), e, n))
init_data = self._download_json(
'http://motto.daisuki.net/fastAPI/bgn/init/',
video_id, query={
's': flashvars.get('s', ''),
'c': flashvars.get('ss3_prm', ''),
'e': url,
'd': base64.b64encode(intlist_to_bytes(aes_cbc_encrypt(
bytes_to_intlist(json.dumps(data)),
aes_key, iv))).decode('ascii'),
'a': base64.b64encode(encrypted_aeskey).decode('ascii'),
}, note='Downloading JSON metadata' + (' (try #%d)' % (idx + 1) if idx > 0 else ''))
if 'rtn' in init_data:
encrypted_rtn = init_data['rtn']
break
self._sleep(5, video_id)
if encrypted_rtn is None:
raise ExtractorError('Failed to fetch init data')
rtn = self._parse_json(
intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
compat_b64decode(encrypted_rtn)),
aes_key, iv)).decode('utf-8').rstrip('\0'),
video_id)
title = rtn['title_str']
formats = self._extract_m3u8_formats(
rtn['play_url'], video_id, ext='mp4', entry_protocol='m3u8_native')
subtitles = {}
caption_url = rtn.get('caption_url')
if caption_url:
# mul: multiple languages
subtitles['mul'] = [{
'url': caption_url,
'ext': 'ttml',
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
}
class DaisukiMottoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://motto\.daisuki\.net/(?P<id>information)/'
_TEST = {
'url': 'http://motto.daisuki.net/information/',
'info_dict': {
'title': 'DRAGON BALL SUPER',
},
'playlist_mincount': 117,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = []
for li in re.findall(r'(<li[^>]+?data-product_id="[a-zA-Z0-9]{3}"[^>]+>)', webpage):
attr = extract_attributes(li)
ad_id = attr.get('data-ad_id')
product_id = attr.get('data-product_id')
if ad_id and product_id:
episode_id = attr.get('data-chapter')
entries.append({
'_type': 'url_transparent',
'url': 'http://motto.daisuki.net/framewatch/embed/%s/%s/760/428' % (ad_id, product_id),
'episode_id': episode_id,
'episode_number': int_or_none(episode_id),
'ie_key': 'DaisukiMotto',
})
return self.playlist_result(entries, playlist_title='DRAGON BALL SUPER')

View File

@ -2,25 +2,21 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
int_or_none,
str_to_int,
xpath_text,
unescapeHTML,
)
class DaumIE(InfoExtractor):
class DaumBaseIE(InfoExtractor):
_KAKAO_EMBED_BASE = 'http://tv.kakao.com/embed/player/cliplink/'
class DaumIE(DaumBaseIE):
_VALID_URL = r'https?://(?:(?:m\.)?tvpot\.daum\.net/v/|videofarm\.daum\.net/controller/player/VodPlayer\.swf\?vid=)(?P<id>[^?#&]+)'
IE_NAME = 'daum.net'
@ -36,6 +32,9 @@ class DaumIE(InfoExtractor):
'duration': 2117,
'view_count': int,
'comment_count': int,
'uploader_id': 186139,
'uploader': '콘간지',
'timestamp': 1387310323,
},
}, {
'url': 'http://m.tvpot.daum.net/v/65139429',
@ -44,11 +43,14 @@ class DaumIE(InfoExtractor):
'ext': 'mp4',
'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
'description': 'md5:79794514261164ff27e36a21ad229fc5',
'upload_date': '20150604',
'upload_date': '20150118',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 154,
'view_count': int,
'comment_count': int,
'uploader': 'MBC 예능',
'uploader_id': 132251,
'timestamp': 1421604228,
},
}, {
'url': 'http://tvpot.daum.net/v/07dXWRka62Y%24',
@ -59,12 +61,15 @@ class DaumIE(InfoExtractor):
'id': 'vwIpVpCQsT8$',
'ext': 'flv',
'title': '01-Korean War ( Trouble on the horizon )',
'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
'description': 'Korean War 01\r\nTrouble on the horizon\r\n전쟁의 먹구름',
'upload_date': '20080223',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 249,
'view_count': int,
'comment_count': int,
'uploader': '까칠한 墮落始祖 황비홍님의',
'uploader_id': 560824,
'timestamp': 1203770745,
},
}, {
# Requires dte_type=WEB (#9972)
@ -73,60 +78,24 @@ class DaumIE(InfoExtractor):
'info_dict': {
'id': 's3794Uf1NZeZ1qMpGpeqeRU',
'ext': 'mp4',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
'upload_date': '20160611',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
'upload_date': '20170129',
'uploader': '쇼! 음악중심',
'uploader_id': 2653210,
'timestamp': 1485684628,
},
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
'Downloading video info', query={'vid': video_id})
formats = []
for format_el in movie_data['output_list']['output_list']:
profile = format_el['profile']
format_query = compat_urllib_parse_urlencode({
'vid': video_id,
'profile': profile,
})
url_doc = self._download_xml(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
video_id, note='Downloading video data for %s format' % profile)
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'format_id': profile,
'width': int_or_none(format_el.get('width')),
'height': int_or_none(format_el.get('height')),
'filesize': int_or_none(format_el.get('filesize')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': xpath_text(info, 'THUMB_URL'),
'description': xpath_text(info, 'CONTENTS'),
'duration': int_or_none(xpath_text(info, 'DURATION')),
'upload_date': info.find('REGDTTM').text[:8],
'view_count': str_to_int(xpath_text(info, 'PLAY_CNT')),
'comment_count': str_to_int(xpath_text(info, 'COMMENT_CNT')),
}
if not video_id.isdigit():
video_id += '@my'
return self.url_result(
self._KAKAO_EMBED_BASE + video_id, 'Kakao', video_id)
class DaumClipIE(InfoExtractor):
class DaumClipIE(DaumBaseIE):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView.(?:do|tv)|mypot/View.do)\?.*?clipid=(?P<id>\d+)'
IE_NAME = 'daum.net:clip'
_URL_TEMPLATE = 'http://tvpot.daum.net/clip/ClipView.do?clipid=%s'
@ -142,6 +111,9 @@ class DaumClipIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'duration': 3868,
'view_count': int,
'uploader': 'GOMeXP',
'uploader_id': 6667,
'timestamp': 1377911092,
},
}, {
'url': 'http://m.tvpot.daum.net/clip/ClipView.tv?clipid=54999425',
@ -154,22 +126,8 @@ class DaumClipIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
clip_info = self._download_json(
'http://tvpot.daum.net/mypot/json/GetClipInfo.do?clipid=%s' % video_id,
video_id, 'Downloading clip info')['clip_bean']
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'http://tvpot.daum.net/v/%s' % clip_info['vid'],
'title': unescapeHTML(clip_info['title']),
'thumbnail': clip_info.get('thumb_url'),
'description': clip_info.get('contents'),
'duration': int_or_none(clip_info.get('duration')),
'upload_date': clip_info.get('up_date')[:8],
'view_count': int_or_none(clip_info.get('play_count')),
'ie_key': 'Daum',
}
return self.url_result(
self._KAKAO_EMBED_BASE + video_id, 'Kakao', video_id)
class DaumListIE(InfoExtractor):

View File

@ -5,14 +5,8 @@ import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
ExtractorError,
try_get,
)
from ..compat import compat_urllib_parse_unquote
from ..utils import ExtractorError
from ..compat import compat_HTTPError
@ -40,15 +34,15 @@ class DiscoveryIE(DiscoveryGoBaseIE):
cookingchanneltv|
motortrend
)
)\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
)\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
_TESTS = [{
'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
'info_dict': {
'id': '5a2d9b4d6b66d17a5026e1fd',
'id': '5a2f35ce6b66d17a5026e29e',
'ext': 'mp4',
'title': 'Dave Foley',
'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f',
'duration': 608,
'title': 'Riding with Matthew Perry',
'description': 'md5:a34333153e79bc4526019a5129e7f878',
'duration': 84,
},
'params': {
'skip_download': True, # requires ffmpeg
@ -59,20 +53,17 @@ class DiscoveryIE(DiscoveryGoBaseIE):
}, {
'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
'only_matching': True,
}, {
# using `show_slug` is important to get the correct video data
'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
_API_BASE_URL = 'https://api.discovery.com/v1/'
def _real_extract(self, url):
site, path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
react_data = self._parse_json(self._search_regex(
r'window\.__reactTransmitPacket\s*=\s*({.+?});',
webpage, 'react data'), display_id)
content_blocks = react_data['layout'][path]['contentBlocks']
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
site, show_slug, display_id = re.match(self._VALID_URL, url).groups()
access_token = None
cookies = self._get_cookies(url)
@ -82,27 +73,36 @@ class DiscoveryIE(DiscoveryGoBaseIE):
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)),
video_id, fatal=False) or {}
display_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
'https://%s.com/anonymous' % site, display_id, query={
'https://%s.com/anonymous' % site, display_id,
'Downloading token JSON metadata', query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
try:
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
try:
video = self._download_json(
self._API_BASE_URL + 'content/videos',
display_id, 'Downloading content JSON metadata',
headers=headers, query={
'embed': 'show.name',
'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
'slug': display_id,
'show_slug': show_slug,
})[0]
video_id = video['id']
stream = self._download_json(
'https://api.discovery.com/v1/streaming/video/' + video_id,
display_id, headers=headers)
self._API_BASE_URL + 'streaming/video/' + video_id,
display_id, 'Downloading streaming JSON metadata', headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(

View File

@ -3,63 +3,38 @@ from __future__ import unicode_literals
import re
from .brightcove import BrightcoveLegacyIE
from .dplay import DPlayIE
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import smuggle_url
class DiscoveryNetworksDeIE(DPlayIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>discovery|tlc|animalplanet|dmax)\.de/
(?:
.*\#(?P<id>\d+)|
(?:[^/]+/)*videos/(?P<display_id>[^/?#]+)|
programme/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)
)'''
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show)/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)'
_TESTS = [{
'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
'url': 'https://www.tlc.de/programme/breaking-amish/video/die-welt-da-drauen/DCB331270001100',
'info_dict': {
'id': '3235167922001',
'id': '78867',
'ext': 'mp4',
'title': 'Breaking Amish: Die Welt da draußen',
'description': (
'Vier Amische und eine Mennonitin wagen in New York'
' den Sprung in ein komplett anderes Leben. Begleitet sie auf'
' ihrem spannenden Weg.'),
'timestamp': 1396598084,
'upload_date': '20140404',
'uploader_id': '1659832546',
'title': 'Die Welt da draußen',
'description': 'md5:61033c12b73286e409d99a41742ef608',
'timestamp': 1554069600,
'upload_date': '20190331',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'http://www.dmax.de/programme/storage-hunters-uk/videos/storage-hunters-uk-episode-6/',
'url': 'https://www.dmax.de/programme/dmax-highlights/video/tuning-star-sidney-hoffmann-exklusiv-bei-dmax/191023082312316',
'only_matching': True,
}, {
'url': 'http://www.discovery.de/#5332316765001',
'url': 'https://www.dplay.co.uk/show/ghost-adventures/video/hotel-leger-103620/EHD_280313B',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1659832546/default_default/index.html?videoId=%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
alternate_id = mobj.group('alternate_id')
if alternate_id:
self._initialize_geo_bypass({
'countries': ['DE'],
})
return self._get_disco_api_info(
url, '%s/%s' % (mobj.group('programme'), alternate_id),
'sonic-eu1-prod.disco-api.com', mobj.group('site') + 'de')
brightcove_id = mobj.group('id')
if not brightcove_id:
title = mobj.group('title')
webpage = self._download_webpage(url, title)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
brightcove_legacy_url).query)['@videoPlayer'][0]
return self.url_result(smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['DE']}),
'BrightcoveNew', brightcove_id)
domain, programme, alternate_id = re.match(self._VALID_URL, url).groups()
country = 'GB' if domain == 'dplay.co.uk' else 'DE'
realm = 'questuk' if country == 'GB' else domain.replace('.', '')
return self._get_disco_api_info(
url, '%s/%s' % (programme, alternate_id),
'sonic-eu1-prod.disco-api.com', realm, country)

View File

@ -9,8 +9,8 @@ from ..utils import int_or_none
class DLiveVODIE(InfoExtractor):
IE_NAME = 'dlive:vod'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[a-zA-Z0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
'info_dict': {
'id': '3mTzOl4WR',
@ -20,7 +20,10 @@ class DLiveVODIE(InfoExtractor):
'timestamp': 1562011015,
'uploader_id': 'pdp',
}
}
}, {
'url': 'https://dlive.tv/p/pdpreplay+D-RD-xSZg',
'only_matching': True,
}]
def _real_extract(self, url):
uploader_id, vod_id = re.match(self._VALID_URL, url).groups()

View File

@ -1,74 +1,68 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
import time
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
int_or_none,
remove_end,
try_get,
unified_strdate,
unified_timestamp,
update_url_query,
urljoin,
USER_AGENTS,
)
class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:video(?:er|s)/)?(?P<id>[^/]+/[^/?#]+)'
_VALID_URL = r'''(?x)https?://
(?P<domain>
(?:www\.)?(?P<host>dplay\.(?P<country>dk|fi|jp|se|no))|
(?P<subdomain_country>es|it)\.dplay\.com
)/[^/]+/(?P<id>[^/]+/[^/?#]+)'''
_TESTS = [{
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'url': 'https://www.dplay.se/videos/nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
'info_dict': {
'id': '3172',
'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
'id': '13628',
'display_id': 'nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650,
'timestamp': 1365454320,
'duration': 2649.856,
'timestamp': 1365453720,
'upload_date': '20130408',
'creator': 'Kanal 5 (Home)',
'creator': 'Kanal 5',
'series': 'Nugammalt - 77 händelser som format Sverige',
'season_number': 1,
'episode_number': 1,
'age_limit': 0,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'url': 'http://www.dplay.dk/videoer/ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
'info_dict': {
'id': '70816',
'display_id': 'mig-og-min-mor/season-6-episode-12',
'id': '104465',
'display_id': 'ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563,
'timestamp': 1429696800,
'upload_date': '20150422',
'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor',
'season_number': 6,
'episode_number': 12,
'age_limit': 0,
'title': 'Ted Bundy: Mind Of A Monster',
'description': 'md5:8b780f6f18de4dae631668b8a9637995',
'duration': 5290.027,
'timestamp': 1570694400,
'upload_date': '20191010',
'creator': 'ID - Investigation Discovery',
'series': 'Ted Bundy: Mind Of A Monster',
'season_number': 1,
'episode_number': 1,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}, {
# disco-api
'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
@ -89,19 +83,59 @@ class DPlayIE(InfoExtractor):
'format': 'bestvideo',
'skip_download': True,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
'md5': '2b808ffb00fc47b884a172ca5d13053c',
'info_dict': {
'id': '6918',
'display_id': 'biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij',
'ext': 'mp4',
'title': 'Luigi Di Maio: la psicosi di Stanislawskij',
'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
'thumbnail': r're:^https?://.*\.jpe?g',
'upload_date': '20160524',
'timestamp': 1464076800,
'series': 'Biografie imbarazzanti',
'season_number': 1,
'episode': 'Episode 1',
'episode_number': 1,
},
}, {
'url': 'https://es.dplay.com/dmax/la-fiebre-del-oro/temporada-8-episodio-1/',
'info_dict': {
'id': '21652',
'display_id': 'la-fiebre-del-oro/temporada-8-episodio-1',
'ext': 'mp4',
'title': 'Episodio 1',
'description': 'md5:b9dcff2071086e003737485210675f69',
'thumbnail': r're:^https?://.*\.png',
'upload_date': '20180709',
'timestamp': 1531173540,
'series': 'La fiebre del oro',
'season_number': 8,
'episode': 'Episode 1',
'episode_number': 1,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dplay.fi/videot/shifting-gears-with-aaron-kaufman/episode-16',
'only_matching': True,
}, {
'url': 'https://www.dplay.se/videos/sofias-anglar/sofias-anglar-1001',
'url': 'https://www.dplay.jp/video/gold-rush/24086',
'only_matching': True,
}]
def _get_disco_api_info(self, url, display_id, disco_host, realm):
disco_base = 'https://' + disco_host
def _get_disco_api_info(self, url, display_id, disco_host, realm, country):
geo_countries = [country.upper()]
self._initialize_geo_bypass({
'countries': geo_countries,
})
disco_base = 'https://%s/' % disco_host
token = self._download_json(
'%s/token' % disco_base, display_id, 'Downloading token',
disco_base + 'token', display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
@ -110,17 +144,35 @@ class DPlayIE(InfoExtractor):
'Authorization': 'Bearer ' + token,
}
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
disco_base + 'content/videos/' + display_id, display_id,
headers=headers, query={
'include': 'show'
'fields[channel]': 'name',
'fields[image]': 'height,src,width',
'fields[show]': 'name',
'fields[tag]': 'name',
'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
'include': 'images,primaryChannel,show,tags'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
title = info['name'].strip()
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id, headers=headers)['data']['attributes']['streaming'].items():
try:
streaming = self._download_json(
disco_base + 'playback/videoPlaybackInfo/' + video_id,
display_id, headers=headers)['data']['attributes']['streaming']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
error_code = error.get('code')
if error_code == 'access.denied.geoblocked':
self.raise_geo_restricted(countries=geo_countries)
elif error_code == 'access.denied.missingpackage':
self.raise_login_required()
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
for format_id, format_dict in streaming.items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
@ -142,235 +194,54 @@ class DPlayIE(InfoExtractor):
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
creator = series = None
tags = []
thumbnails = []
included = video.get('included') or []
if isinstance(included, list):
for e in included:
attributes = e.get('attributes')
if not attributes:
continue
e_type = e.get('type')
if e_type == 'channel':
creator = attributes.get('name')
elif e_type == 'image':
src = attributes.get('src')
if src:
thumbnails.append({
'url': src,
'width': int_or_none(attributes.get('width')),
'height': int_or_none(attributes.get('height')),
})
if e_type == 'show':
series = attributes.get('name')
elif e_type == 'tag':
name = attributes.get('name')
if name:
tags.append(name)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'duration': float_or_none(info.get('videoDuration'), 1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'creator': creator,
'tags': tags,
'thumbnails': thumbnails,
'formats': formats,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
domain = mobj.group('domain')
self._initialize_geo_bypass({
'countries': [mobj.group('country').upper()],
})
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
if not video_id:
host = mobj.group('host')
return self._get_disco_api_info(
url, display_id, 'disco-api.' + host, host.replace('.', ''))
info = self._download_json(
'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
video_id)['data'][0]
title = info['title']
PROTOCOLS = ('hls', 'hds')
formats = []
def extract_formats(protocol, manifest_url):
if protocol == 'hls':
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves. Also fragments' URLs are only served signed for
# Safari user agent.
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format.update({
'url': update_url_query(m3u8_format['url'], query),
'http_headers': {
'User-Agent': USER_AGENTS['Safari'],
},
})
formats.extend(m3u8_formats)
elif protocol == 'hds':
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS:
# Providing dsc-geo allows to bypass geo restriction in some cases
self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({
'countryCode': domain_tld.upper(),
'expiry': (time.time() + 20 * 60) * 1000,
}))
stream = self._download_json(
'https://secure.dplay.%s/secure/api/v2/user/authorization/stream/%s?stream_type=%s'
% (domain_tld, video_id, protocol), video_id,
'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol])
# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS:
if info.get(protocol):
extract_formats(protocol, info[protocol])
self._sort_formats(formats)
subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('video_metadata_longDescription'),
'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
'timestamp': int_or_none(info.get('video_publish_date')),
'creator': info.get('video_metadata_homeChannel'),
'series': info.get('video_metadata_show'),
'season_number': int_or_none(info.get('season')),
'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
'subtitles': subtitles,
}
class DPlayItIE(InfoExtractor):
_VALID_URL = r'https?://it\.dplay\.com/[^/]+/[^/]+/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['IT']
_TEST = {
'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
'md5': '2b808ffb00fc47b884a172ca5d13053c',
'info_dict': {
'id': '6918',
'display_id': 'luigi-di-maio-la-psicosi-di-stanislawskij',
'ext': 'mp4',
'title': 'Biografie imbarazzanti: Luigi Di Maio: la psicosi di Stanislawskij',
'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
'thumbnail': r're:^https?://.*\.jpe?g',
'upload_date': '20160524',
'series': 'Biografie imbarazzanti',
'season_number': 1,
'episode': 'Luigi Di Maio: la psicosi di Stanislawskij',
'episode_number': 1,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = remove_end(self._og_search_title(webpage), ' | Dplay')
video_id = None
info = self._search_regex(
r'playback_json\s*:\s*JSON\.parse\s*\(\s*("(?:\\.|[^"\\])+?")',
webpage, 'playback JSON', default=None)
if info:
for _ in range(2):
info = self._parse_json(info, display_id, fatal=False)
if not info:
break
else:
video_id = try_get(info, lambda x: x['data']['id'])
if not info:
info_url = self._search_regex(
(r'playback_json_url\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'url\s*[:=]\s*["\'](?P<url>(?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)'),
webpage, 'info url', group='url')
info_url = urljoin(url, info_url)
video_id = info_url.rpartition('/')[-1]
try:
info = self._download_json(
info_url, display_id, headers={
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
'dplayit_token').value,
'Referer': url,
})
if isinstance(info, compat_str):
info = self._parse_json(info, display_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
if error.get('code') == 'access.denied.geoblocked':
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
hls_url = info['data']['attributes']['streaming']['hls']['url']
formats = self._extract_m3u8_formats(
hls_url, display_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
series = self._html_search_regex(
r'(?s)<h1[^>]+class=["\'].*?\bshow_title\b.*?["\'][^>]*>(.+?)</h1>',
webpage, 'series', fatal=False)
episode = self._search_regex(
r'<p[^>]+class=["\'].*?\bdesc_ep\b.*?["\'][^>]*>\s*<br/>\s*<b>([^<]+)',
webpage, 'episode', fatal=False)
mobj = re.search(
r'(?s)<span[^>]+class=["\']dates["\'][^>]*>.+?\bS\.(?P<season_number>\d+)\s+E\.(?P<episode_number>\d+)\s*-\s*(?P<upload_date>\d{2}/\d{2}/\d{4})',
webpage)
if mobj:
season_number = int(mobj.group('season_number'))
episode_number = int(mobj.group('episode_number'))
upload_date = unified_strdate(mobj.group('upload_date'))
else:
season_number = episode_number = upload_date = None
return {
'id': compat_str(video_id or display_id),
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'upload_date': upload_date,
'formats': formats,
}
domain = mobj.group('domain').lstrip('www.')
country = mobj.group('country') or mobj.group('subdomain_country')
host = 'disco-api.' + domain if domain.startswith('dplay.') else 'eu2-prod.disco-api.com'
return self._get_disco_api_info(
url, display_id, host, 'dplay' + country, country)

View File

@ -17,6 +17,7 @@ from ..utils import (
float_or_none,
mimetype2ext,
str_or_none,
try_get,
unified_timestamp,
update_url_query,
url_or_none,
@ -24,7 +25,14 @@ from ..utils import (
class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*|
(?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode)/
)
(?P<id>[\da-z_-]+)
'''
_GEO_BYPASS = False
_GEO_COUNTRIES = ['DK']
IE_NAME = 'drtv'
@ -83,6 +91,26 @@ class DRTVIE(InfoExtractor):
}, {
'url': 'https://www.dr.dk/radio/p4kbh/regionale-nyheder-kh4/p4-nyheder-2019-06-26-17-30-9',
'only_matching': True,
}, {
'url': 'https://www.dr.dk/drtv/se/bonderoeven_71769',
'info_dict': {
'id': '00951930010',
'ext': 'mp4',
'title': 'Bonderøven (1:8)',
'description': 'md5:3cf18fc0d3b205745d4505f896af8121',
'timestamp': 1546542000,
'upload_date': '20190103',
'duration': 2576.6,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dr.dk/drtv/episode/bonderoeven_71769',
'only_matching': True,
}, {
'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769',
'only_matching': True,
}]
def _real_extract(self, url):
@ -100,13 +128,32 @@ class DRTVIE(InfoExtractor):
webpage, 'video id', default=None)
if not video_id:
video_id = compat_urllib_parse_unquote(self._search_regex(
video_id = self._search_regex(
r'(urn(?:%3A|:)dr(?:%3A|:)mu(?:%3A|:)programcard(?:%3A|:)[\da-f]+)',
webpage, 'urn'))
webpage, 'urn', default=None)
if video_id:
video_id = compat_urllib_parse_unquote(video_id)
_PROGRAMCARD_BASE = 'https://www.dr.dk/mu-online/api/1.4/programcard'
query = {'expanded': 'true'}
if video_id:
programcard_url = '%s/%s' % (_PROGRAMCARD_BASE, video_id)
else:
programcard_url = _PROGRAMCARD_BASE
page = self._parse_json(
self._search_regex(
r'data\s*=\s*({.+?})\s*(?:;|</script)', webpage,
'data'), '1')['cache']['page']
page = page[list(page.keys())[0]]
item = try_get(
page, (lambda x: x['item'], lambda x: x['entries'][0]['item']),
dict)
video_id = item['customId'].split(':')[-1]
query['productionnumber'] = video_id
data = self._download_json(
'https://www.dr.dk/mu-online/api/1.4/programcard/%s' % video_id,
video_id, 'Downloading video JSON', query={'expanded': 'true'})
programcard_url, video_id, 'Downloading video JSON', query=query)
title = str_or_none(data.get('Title')) or re.sub(
r'\s*\|\s*(?:TV\s*\|\s*DR|DRTV)$', '',

View File

@ -1,20 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import (
int_or_none,
qualities,
sanitized_Request,
)
class DumpertIE(InfoExtractor):
_VALID_URL = r'(?P<protocol>https?)://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
_VALID_URL = r'(?P<protocol>https?)://(?:(?:www|legacy)\.)?dumpert\.nl/(?:mediabase|embed|item)/(?P<id>[0-9]+[/_][0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://www.dumpert.nl/mediabase/6646981/951bc60f/',
'url': 'https://www.dumpert.nl/item/6646981_951bc60f',
'md5': '1b9318d7d5054e7dcb9dc7654f21d643',
'info_dict': {
'id': '6646981/951bc60f',
@ -24,46 +21,60 @@ class DumpertIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/',
'url': 'https://www.dumpert.nl/embed/6675421_dc440fe7',
'only_matching': True,
}, {
'url': 'http://legacy.dumpert.nl/mediabase/6646981/951bc60f',
'only_matching': True,
}, {
'url': 'http://legacy.dumpert.nl/embed/6675421/dc440fe7',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
protocol = mobj.group('protocol')
url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
req = sanitized_Request(url)
req.add_header('Cookie', 'nsfw=1; cpc=10')
webpage = self._download_webpage(req, video_id)
files_base64 = self._search_regex(
r'data-files="([^"]+)"', webpage, 'data files')
files = self._parse_json(
compat_b64decode(files_base64).decode('utf-8'),
video_id)
video_id = self._match_id(url).replace('_', '/')
item = self._download_json(
'http://api-live.dumpert.nl/mobile_api/json/info/' + video_id.replace('/', '_'),
video_id)['items'][0]
title = item['title']
media = next(m for m in item['media'] if m.get('mediatype') == 'VIDEO')
quality = qualities(['flv', 'mobile', 'tablet', '720p'])
formats = [{
'url': video_url,
'format_id': format_id,
'quality': quality(format_id),
} for format_id, video_url in files.items() if format_id != 'still']
formats = []
for variant in media.get('variants', []):
uri = variant.get('uri')
if not uri:
continue
version = variant.get('version')
formats.append({
'url': uri,
'format_id': version,
'quality': quality(version),
})
self._sort_formats(formats)
title = self._html_search_meta(
'title', webpage) or self._og_search_title(webpage)
description = self._html_search_meta(
'description', webpage) or self._og_search_description(webpage)
thumbnail = files.get('still') or self._og_search_thumbnail(webpage)
thumbnails = []
stills = item.get('stills') or {}
for t in ('thumb', 'still'):
for s in ('', '-medium', '-large'):
still_id = t + s
still_url = stills.get(still_id)
if not still_url:
continue
thumbnails.append({
'id': still_id,
'url': still_url,
})
stats = item.get('stats') or {}
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats
'description': item.get('description'),
'thumbnails': thumbnails,
'formats': formats,
'duration': int_or_none(media.get('duration')),
'like_count': int_or_none(stats.get('kudos_total')),
'view_count': int_or_none(stats.get('views_total')),
}

View File

@ -19,7 +19,7 @@ from ..utils import (
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com))/movie/watch/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
@ -36,6 +36,9 @@ class EinthusanIE(InfoExtractor):
}, {
'url': 'https://einthusan.com/movie/watch/9097/',
'only_matching': True,
}, {
'url': 'https://einthusan.ca/movie/watch/4E9n/?lang=hindi',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js

View File

@ -18,10 +18,10 @@ from .acast import (
ACastIE,
ACastChannelIE,
)
from .addanime import AddAnimeIE
from .adn import ADNIE
from .adobeconnect import AdobeConnectIE
from .adobetv import (
AdobeTVEmbedIE,
AdobeTVIE,
AdobeTVShowIE,
AdobeTVChannelIE,
@ -80,7 +80,6 @@ from .awaan import (
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
@ -104,6 +103,8 @@ from .bild import BildIE
from .bilibili import (
BiliBiliIE,
BiliBiliBangumiIE,
BilibiliAudioIE,
BilibiliAudioAlbumIE,
)
from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
@ -222,13 +223,13 @@ from .comedycentral import (
ComedyCentralTVIE,
ToshIE,
)
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import (
MmsIE,
RtmpIE,
)
from .condenast import CondeNastIE
from .contv import CONtvIE
from .corus import CorusIE
from .cracked import CrackedIE
from .crackle import CrackleIE
@ -253,10 +254,6 @@ from .dailymotion import (
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daisuki import (
DaisukiMottoIE,
DaisukiMottoPlaylistIE,
)
from .daum import (
DaumIE,
DaumClipIE,
@ -275,10 +272,7 @@ from .douyutv import (
DouyuShowIE,
DouyuTVIE,
)
from .dplay import (
DPlayIE,
DPlayItIE,
)
from .dplay import DPlayIE
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
@ -357,7 +351,6 @@ from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .flickr import FlickrIE
from .flipagram import FlipagramIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
@ -368,7 +361,10 @@ from .fourtube import (
FuxIE,
)
from .fox import FOXIE
from .fox9 import FOX9IE
from .fox9 import (
FOX9IE,
FOX9NewsIE,
)
from .foxgay import FoxgayIE
from .foxnews import (
FoxNewsIE,
@ -401,10 +397,6 @@ from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
from .gameinformer import GameInformerIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
)
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
@ -420,7 +412,6 @@ from .globo import (
GloboArticleIE,
)
from .go import GoIE
from .go90 import Go90IE
from .godtube import GodTubeIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
@ -429,7 +420,6 @@ from .googlesearch import GoogleSearchIE
from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
@ -461,7 +451,6 @@ from .hungama import (
HungamaSongIE,
)
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import (
IGNIE,
OneUPIE,
@ -520,8 +509,8 @@ from .keezmovies import KeezMoviesIE
from .ketnet import KetnetIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
@ -547,7 +536,6 @@ from .lcp import (
LcpPlayIE,
LcpIE,
)
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lecturio import (
LecturioIE,
@ -599,13 +587,11 @@ from .lynda import (
LyndaCourseIE
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import (
MailRuIE,
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .makertv import MakerTVIE
from .malltv import MallTVIE
from .mangomolo import (
MangomoloVideoIE,
@ -639,17 +625,15 @@ from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE,
)
from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mit import TechTVMITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
MixcloudPlaylistIE,
MixcloudStreamIE,
)
from .mlb import MLBIE
from .mnet import MnetIE
@ -671,7 +655,7 @@ from .mtv import (
MTVVideoIE,
MTVServicesEmbeddedIE,
MTVDEIE,
MTV81IE,
MTVJapanIE,
)
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
@ -814,10 +798,6 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openload import (
OpenloadIE,
VerystreamIE,
)
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
@ -892,7 +872,6 @@ from .puhutv import (
PuhuTVSerieIE,
)
from .presstv import PressTVIE
from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
@ -944,10 +923,6 @@ from .rentv import (
from .restudy import RestudyIE
from .reuters import ReutersIE
from .reverbnation import ReverbNationIE
from .revision3 import (
Revision3EmbedIE,
Revision3IE,
)
from .rice import RICEIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE
@ -992,10 +967,13 @@ from .sbs import SBSIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .scrippsnetworks import ScrippsNetworksWatchIE
from .scte import (
SCTEIE,
SCTECourseIE,
)
from .seeker import SeekerIE
from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE
from .servus import ServusIE
from .sevenplus import SevenPlusIE
from .sexu import SexuIE
@ -1036,6 +1014,7 @@ from .snotr import SnotrIE
from .sohu import SohuIE
from .sonyliv import SonyLIVIE
from .soundcloud import (
SoundcloudEmbedIE,
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
@ -1079,7 +1058,6 @@ from .srmediathek import SRMediathekIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamable import StreamableIE
from .streamango import StreamangoIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
@ -1128,12 +1106,14 @@ from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import (
TeleQuebecIE,
TeleQuebecSquatIE,
TeleQuebecEmissionIE,
TeleQuebecLiveIE,
)
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .tennistv import TennisTVIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .tfo import TFOIE
@ -1186,11 +1166,12 @@ from .tunein import (
)
from .tunepk import TunePkIE
from .turbo import TurboIE
from .tutv import TutvIE
from .tv2 import (
TV2IE,
TV2ArticleIE,
KatsomoIE,
)
from .tv2dk import TV2DKIE
from .tv2hu import TV2HuIE
from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE
@ -1247,6 +1228,7 @@ from .twitter import (
TwitterCardIE,
TwitterIE,
TwitterAmplifyIE,
TwitterBroadcastIE,
)
from .udemy import (
UdemyIE,
@ -1281,7 +1263,6 @@ from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import (
VevoIE,
@ -1323,7 +1304,6 @@ from .viewlift import (
ViewLiftIE,
ViewLiftEmbedIE,
)
from .viewster import ViewsterIE
from .viidea import ViideaIE
from .vimeo import (
VimeoIE,
@ -1412,7 +1392,6 @@ from .weibo import (
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wsj import (
@ -1426,6 +1405,7 @@ from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
XHamsterUserIE,
)
from .xiami import (
XiamiSongIE,

View File

@ -334,7 +334,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
server_js_data = self._parse_json(
self._search_regex(
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)',
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_\d+)',
webpage, 'js data', default='{}'),
video_id, transform_source=js_to_json, fatal=False)
video_data = extract_from_jsmods_instances(server_js_data)
@ -379,6 +379,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
raise ExtractorError('Cannot parse data')
subtitles = {}
formats = []
for f in video_data:
format_id = f['stream_type']
@ -402,9 +403,17 @@ class FacebookIE(InfoExtractor):
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
subtitles_src = f[0].get('subtitles_src')
if subtitles_src:
subtitles.setdefault('en', []).append({'url': subtitles_src})
if not formats:
raise ExtractorError('Cannot find video formats')
# Downloads with browser's User-Agent are rate limited. Working around
# with non-browser User-Agent.
for f in formats:
f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
self._sort_formats(formats)
video_title = self._html_search_regex(
@ -442,6 +451,7 @@ class FacebookIE(InfoExtractor):
'timestamp': timestamp,
'thumbnail': thumbnail,
'view_count': view_count,
'subtitles': subtitles,
}
return webpage, info_dict

View File

@ -1,115 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
try_get,
unified_timestamp,
)
class FlipagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://flipagram.com/f/nyvTSJMKId',
'md5': '888dcf08b7ea671381f00fab74692755',
'info_dict': {
'id': 'nyvTSJMKId',
'ext': 'mp4',
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
'duration': 35.571,
'timestamp': 1461244995,
'upload_date': '20160421',
'uploader': 'kitty juria',
'uploader_id': 'sjuria101',
'creator': 'kitty juria',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'comments': list,
'formats': 'mincount:2',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
video_id)
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default={})
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')
formats = [{
'url': video['url'],
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': int_or_none(video_data.get('size')),
}]
preview_url = try_get(
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
if preview_url:
formats.append({
'url': preview_url,
'ext': 'm4a',
'vcodec': 'none',
})
self._sort_formats(formats)
counts = flipagram.get('counts', {})
user = flipagram.get('user', {})
video_data = flipagram.get('video', {})
thumbnails = [{
'url': self._proto_relative_url(cover['url']),
'width': int_or_none(cover.get('width')),
'height': int_or_none(cover.get('height')),
'filesize': int_or_none(cover.get('size')),
} for cover in flipagram.get('covers', []) if cover.get('url')]
# Note that this only retrieves comments that are initially loaded.
# For videos with large amounts of comments, most won't be retrieved.
comments = []
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
text = comment.get('comment')
if not text or not isinstance(text, list):
continue
comments.append({
'author': comment.get('user', {}).get('name'),
'author_id': comment.get('user', {}).get('username'),
'id': comment.get('id'),
'text': text[0],
'timestamp': unified_timestamp(comment.get('created')),
})
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(flipagram.get('duration'), 1000),
'thumbnails': thumbnails,
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
'uploader': user.get('name'),
'uploader_id': user.get('username'),
'creator': user.get('name'),
'view_count': int_or_none(counts.get('plays')),
'like_count': int_or_none(counts.get('likes')),
'repost_count': int_or_none(counts.get('reflips')),
'comment_count': int_or_none(counts.get('comments')),
'comments': comments,
'formats': formats,
}

View File

@ -1,13 +1,23 @@
# coding: utf-8
from __future__ import unicode_literals
from .anvato import AnvatoIE
from .common import InfoExtractor
class FOX9IE(AnvatoIE):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/(?:[^/]+/)+(?P<id>\d+)-story'
_TESTS = [{
'url': 'http://www.fox9.com/news/215123287-story',
class FOX9IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/video/(?P<id>\d+)'
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'anvato:anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b:' + video_id,
'Anvato', video_id)
class FOX9NewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/news/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.fox9.com/news/black-bear-in-tree-draws-crowd-in-downtown-duluth-minnesota',
'md5': 'd6e1b2572c3bab8a849c9103615dd243',
'info_dict': {
'id': '314473',
@ -21,22 +31,11 @@ class FOX9IE(AnvatoIE):
'categories': ['News', 'Sports'],
'tags': ['news', 'video'],
},
}, {
'url': 'http://www.fox9.com/news/investigators/214070684-story',
'only_matching': True,
}]
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._parse_json(
self._search_regex(
r"this\.videosJson\s*=\s*'(\[.+?\])';",
webpage, 'anvato playlist'),
video_id)[0]['video']
return self._get_anvato_videos(
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',
video_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
anvato_id = self._search_regex(
r'anvatoId\s*:\s*[\'"](\d+)', webpage, 'anvato id')
return self.url_result('https://www.fox9.com/video/' + anvato_id, 'FOX9')

View File

@ -1,134 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
xpath_with_ns,
parse_iso8601,
float_or_none,
int_or_none,
)
NAMESPACE_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
# URL prefix to download the mp4 files directly instead of streaming via rtmp
# Credits go to XBox-Maniac
# http://board.jdownloader.org/showpost.php?p=185835&postcount=31
RAW_MP4_URL = 'http://cdn.riptide-mtvn.com/'
class GameOneIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de/tv/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://www.gameone.de/tv/288',
'md5': '136656b7fb4c9cb4a8e2d500651c499b',
'info_dict': {
'id': '288',
'ext': 'mp4',
'title': 'Game One - Folge 288',
'duration': 1238,
'thumbnail': 'http://s3.gameone.de/gameone/assets/video_metas/teaser_images/000/643/636/big/640x360.jpg',
'description': 'FIFA-Pressepokal 2014, Star Citizen, Kingdom Come: Deliverance, Project Cars, Schöner Trants Nerdquiz Folge 2 Runde 1',
'age_limit': 16,
'upload_date': '20140513',
'timestamp': 1399980122,
}
},
{
'url': 'http://gameone.de/tv/220',
'md5': '5227ca74c4ae6b5f74c0510a7c48839e',
'info_dict': {
'id': '220',
'ext': 'mp4',
'upload_date': '20120918',
'description': 'Jet Set Radio HD, Tekken Tag Tournament 2, Source Filmmaker',
'timestamp': 1347971451,
'title': 'Game One - Folge 220',
'duration': 896.62,
'age_limit': 16,
}
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage, secure=False)
description = self._html_search_meta('description', webpage)
age_limit = int(
self._search_regex(
r'age=(\d+)',
self._html_search_meta(
'age-de-meta-label',
webpage),
'age_limit',
'0'))
mrss_url = self._search_regex(r'mrss=([^&]+)', og_video, 'mrss')
mrss = self._download_xml(mrss_url, video_id, 'Downloading mrss')
title = mrss.find('.//item/title').text
thumbnail = mrss.find('.//item/image').get('url')
timestamp = parse_iso8601(mrss.find('.//pubDate').text, delimiter=' ')
content = mrss.find(xpath_with_ns('.//media:content', NAMESPACE_MAP))
content_url = content.get('url')
content = self._download_xml(
content_url,
video_id,
'Downloading media:content')
rendition_items = content.findall('.//rendition')
duration = float_or_none(rendition_items[0].get('duration'))
formats = [
{
'url': re.sub(r'.*/(r2)', RAW_MP4_URL + r'\1', r.find('./src').text),
'width': int_or_none(r.get('width')),
'height': int_or_none(r.get('height')),
'tbr': int_or_none(r.get('bitrate')),
}
for r in rendition_items
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'description': description,
'age_limit': age_limit,
'timestamp': timestamp,
}
class GameOnePlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de(?:/tv)?/?$'
IE_NAME = 'gameone:playlist'
_TEST = {
'url': 'http://www.gameone.de/tv',
'info_dict': {
'title': 'GameOne',
},
'playlist_mincount': 294,
}
def _real_extract(self, url):
webpage = self._download_webpage('http://www.gameone.de/tv', 'TV')
max_id = max(map(int, re.findall(r'<a href="/tv/(\d+)"', webpage)))
entries = [
self.url_result('http://www.gameone.de/tv/%d' %
video_id, 'GameOne')
for video_id in range(max_id, 0, -1)]
return {
'_type': 'playlist',
'title': 'GameOne',
'entries': entries,
}

View File

@ -77,11 +77,10 @@ from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .vessel import VesselIE
from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE
from .soundcloud import SoundcloudIE
from .soundcloud import SoundcloudEmbedIE
from .tunein import TuneInBaseIE
from .vbox7 import Vbox7IE
from .dbtv import DBTVIE
@ -89,10 +88,6 @@ from .piksel import PikselIE
from .videa import VideaIE
from .twentymin import TwentyMinutenIE
from .ustream import UstreamIE
from .openload import (
OpenloadIE,
VerystreamIE,
)
from .videopress import VideoPressIE
from .rutube import RutubeIE
from .limelight import LimelightBaseIE
@ -119,6 +114,8 @@ from .foxnews import FoxNewsIE
from .viqeo import ViqeoIE
from .expressen import ExpressenIE
from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE
from .kinja import KinjaEmbedIE
class GenericIE(InfoExtractor):
@ -1487,16 +1484,18 @@ class GenericIE(InfoExtractor):
'timestamp': 1432570283,
},
},
# OnionStudios embed
# Kinja embed
{
'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
'info_dict': {
'id': '2855',
'id': '106351',
'ext': 'mp4',
'title': 'Dont Understand Bitcoin? This Man Will Mumble An Explanation At You',
'description': 'Migrated from OnionStudios',
'thumbnail': r're:^https?://.*\.jpe?g$',
'uploader': 'ClickHole',
'uploader_id': 'clickhole',
'uploader': 'clickhole',
'upload_date': '20150527',
'timestamp': 1432744860,
}
},
# SnagFilms embed
@ -2075,6 +2074,22 @@ class GenericIE(InfoExtractor):
},
'playlist_count': 6,
},
{
# Squarespace video embed, 2019-08-28
'url': 'http://ootboxford.com',
'info_dict': {
'id': 'Tc7b_JGdZfw',
'title': 'Out of the Blue, at Childish Things 10',
'ext': 'mp4',
'description': 'md5:a83d0026666cf5ee970f8bd1cfd69c7f',
'uploader_id': 'helendouglashouse',
'uploader': 'Helen & Douglas House',
'upload_date': '20140328',
},
'params': {
'skip_download': True,
},
},
{
# Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
@ -2395,6 +2410,12 @@ class GenericIE(InfoExtractor):
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/youtube-dl/issues/21294
webpage = re.sub(
r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
lambda x: unescapeHTML(x.group(0)), webpage)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
@ -2469,11 +2490,6 @@ class GenericIE(InfoExtractor):
if tp_urls:
return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
# Look for Vessel embeds
vessel_urls = VesselIE._extract_urls(webpage)
if vessel_urls:
return self.playlist_from_matches(vessel_urls, video_id, video_title, ie=VesselIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@ -2611,9 +2627,9 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'VK')
# Look for embedded Odnoklassniki player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Odnoklassniki')
odnoklassniki_url = OdnoklassnikiIE._extract_url(webpage)
if odnoklassniki_url:
return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
@ -2732,9 +2748,9 @@ class GenericIE(InfoExtractor):
return self.url_result(myvi_url)
# Look for embedded soundcloud player
soundcloud_urls = SoundcloudIE._extract_urls(webpage)
soundcloud_urls = SoundcloudEmbedIE._extract_urls(webpage)
if soundcloud_urls:
return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML)
# Look for tunein player
tunein_urls = TuneInBaseIE._extract_urls(webpage)
@ -2877,6 +2893,12 @@ class GenericIE(InfoExtractor):
if senate_isvp_url:
return self.url_result(senate_isvp_url, 'SenateISVP')
# Look for Kinja embeds
kinja_embed_urls = KinjaEmbedIE._extract_urls(webpage, url)
if kinja_embed_urls:
return self.playlist_from_matches(
kinja_embed_urls, video_id, video_title)
# Look for OnionStudios embeds
onionstudios_url = OnionStudiosIE._extract_url(webpage)
if onionstudios_url:
@ -2946,10 +2968,14 @@ class GenericIE(InfoExtractor):
# Look for Mangomolo embeds
mobj = re.search(
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//
(?:
admin\.mangomolo\.com/analytics/index\.php/customers/embed|
player\.mangomolo\.com/v1
)/
(?:
video\?.*?\bid=(?P<video_id>\d+)|
index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
(?:index|live)\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
).+?)\1''', webpage)
if mobj is not None:
info = {
@ -3018,18 +3044,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
twentymin_urls, video_id, video_title, ie=TwentyMinutenIE.ie_key())
# Look for Openload embeds
openload_urls = OpenloadIE._extract_urls(webpage)
if openload_urls:
return self.playlist_from_matches(
openload_urls, video_id, video_title, ie=OpenloadIE.ie_key())
# Look for Verystream embeds
verystream_urls = VerystreamIE._extract_urls(webpage)
if verystream_urls:
return self.playlist_from_matches(
verystream_urls, video_id, video_title, ie=VerystreamIE.ie_key())
# Look for VideoPress embeds
videopress_urls = VideoPressIE._extract_urls(webpage)
if videopress_urls:

View File

@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_VALID_URL = r'https?://(?:(?:www|giant|thumbs)\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#\.]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
@ -53,6 +53,12 @@ class GfycatIE(InfoExtractor):
}, {
'url': 'https://gfycat.com/acceptablehappygoluckyharborporpoise-baseball',
'only_matching': True
}, {
'url': 'https://thumbs.gfycat.com/acceptablehappygoluckyharborporpoise-size_restricted.gif',
'only_matching': True
}, {
'url': 'https://giant.gfycat.com/acceptablehappygoluckyharborporpoise.mp4',
'only_matching': True
}]
def _real_extract(self, url):

View File

@ -96,21 +96,31 @@ class GloboIE(InfoExtractor):
video = self._download_json(
'http://api.globovideos.com/videos/%s/playlist' % video_id,
video_id)['videos'][0]
if video.get('encrypted') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
title = video['title']
formats = []
subtitles = {}
for resource in video['resources']:
resource_id = resource.get('_id')
resource_url = resource.get('url')
if not resource_id or not resource_url:
resource_type = resource.get('type')
if not resource_url or (resource_type == 'media' and not resource_id) or resource_type not in ('subtitle', 'media'):
continue
if resource_type == 'subtitle':
subtitles.setdefault(resource.get('language') or 'por', []).append({
'url': resource_url,
})
continue
security = self._download_json(
'http://security.video.globo.com/videos/%s/hash' % video_id,
video_id, 'Downloading security hash for %s' % resource_id, query={
'player': 'flash',
'version': '17.0.0.132',
'player': 'desktop',
'version': '5.19.1',
'resource_id': resource_id,
})
@ -123,18 +133,23 @@ class GloboIE(InfoExtractor):
continue
hash_code = security_hash[:2]
received_time = security_hash[2:12]
received_random = security_hash[12:22]
received_md5 = security_hash[22:]
sign_time = compat_str(int(received_time) + 86400)
padding = '%010d' % random.randint(1, 10000000000)
if hash_code in ('04', '14'):
received_time = security_hash[3:13]
received_md5 = security_hash[24:]
hash_prefix = security_hash[:23]
elif hash_code in ('02', '12', '03', '13'):
received_time = security_hash[2:12]
received_md5 = security_hash[22:]
padding += '1'
hash_prefix = '05' + security_hash[:22]
md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
padded_sign_time = compat_str(int(received_time) + 86400) + padding
md5_data = (received_md5 + padded_sign_time + '0xAC10FD').encode()
signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5
signed_hash = hash_prefix + padded_sign_time + signed_md5
signed_url = '%s?h=%s&k=html5&a=%s&u=%s' % (resource_url, signed_hash, 'F' if video.get('subscriber_only') else 'A', security.get('user') or '')
signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats(
signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
@ -164,7 +179,8 @@ class GloboIE(InfoExtractor):
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats
'formats': formats,
'subtitles': subtitles,
}

View File

@ -40,8 +40,17 @@ class GoIE(AdobePassIE):
'resource_id': 'Disney',
}
}
_VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_VALID_URL = r'''(?x)
https?://
(?:
(?:(?P<sub_domain>%s)\.)?go|
(?P<sub_domain_2>abc|freeform|disneynow)
)\.com/
(?:
(?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
(?:[^/]+/)*(?P<display_id>[^/?\#]+)
)
''' % '|'.join(list(_SITE_INFO.keys()))
_TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
'info_dict': {
@ -54,6 +63,7 @@ class GoIE(AdobePassIE):
# m3u8 download
'skip_download': True,
},
'skip': 'This content is no longer available.',
}, {
'url': 'http://watchdisneyxd.go.com/doraemon',
'info_dict': {
@ -61,6 +71,34 @@ class GoIE(AdobePassIE):
'id': 'SH55574025',
},
'playlist_mincount': 51,
}, {
'url': 'http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood',
'info_dict': {
'id': 'VDKA3609139',
'ext': 'mp4',
'title': 'This Guilty Blood',
'description': 'md5:f18e79ad1c613798d95fdabfe96cd292',
'age_limit': 14,
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet',
'info_dict': {
'id': 'VDKA13435179',
'ext': 'mp4',
'title': 'The Bet',
'description': 'md5:c66de8ba2e92c6c5c113c3ade84ab404',
'age_limit': 14,
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True,
@ -95,10 +133,13 @@ class GoIE(AdobePassIE):
if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id)
video_id = self._search_regex(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id',
default=video_id)
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',

View File

@ -1,149 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
)
class Go90IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?go90\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://www.go90.com/videos/84BUqjLpf9D',
'md5': 'efa7670dbbbf21a7b07b360652b24a32',
'info_dict': {
'id': '84BUqjLpf9D',
'ext': 'mp4',
'title': 'Daily VICE - Inside The Utah Coalition Against Pornography Convention',
'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.',
'timestamp': 1491868800,
'upload_date': '20170411',
'age_limit': 14,
}
}, {
'url': 'https://www.go90.com/embed/261MflWkD3N',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url)
try:
headers = self.geo_verification_headers()
headers.update({
'Content-Type': 'application/json; charset=utf-8',
})
video_data = self._download_json(
'https://www.go90.com/api/view/items/' + video_id, video_id,
headers=headers, data=b'{"client":"web","device_type":"pc"}')
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
message = self._parse_json(e.cause.read().decode(), None)['error']['message']
if 'region unavailable' in message:
self.raise_geo_restricted(countries=['US'])
raise ExtractorError(message, expected=True)
raise
if video_data.get('requires_drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
main_video_asset = video_data['main_video_asset']
episode_number = int_or_none(video_data.get('episode_number'))
series = None
season = None
season_id = None
season_number = None
for metadata in video_data.get('__children', {}).get('Item', {}).values():
if metadata.get('type') == 'show':
series = metadata.get('title')
elif metadata.get('type') == 'season':
season = metadata.get('title')
season_id = metadata.get('id')
season_number = int_or_none(metadata.get('season_number'))
title = episode = video_data.get('title') or series
if series and series != title:
title = '%s - %s' % (series, title)
thumbnails = []
formats = []
subtitles = {}
for asset in video_data.get('assets'):
if asset.get('id') == main_video_asset:
for source in asset.get('sources', []):
source_location = source.get('location')
if not source_location:
continue
source_type = source.get('type')
if source_type == 'hls':
m3u8_formats = self._extract_m3u8_formats(
source_location, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
for f in m3u8_formats:
mobj = re.search(r'/hls-(\d+)-(\d+)K', f['url'])
if mobj:
height, tbr = mobj.groups()
height = int_or_none(height)
f.update({
'height': f.get('height') or height,
'width': f.get('width') or int_or_none(height / 9.0 * 16.0 if height else None),
'tbr': f.get('tbr') or int_or_none(tbr),
})
formats.extend(m3u8_formats)
elif source_type == 'dash':
formats.extend(self._extract_mpd_formats(
source_location, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'format_id': source.get('name'),
'url': source_location,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('bitrate')),
})
for caption in asset.get('caption_metadata', []):
caption_url = caption.get('source_url')
if not caption_url:
continue
subtitles.setdefault(caption.get('language', 'en'), []).append({
'url': caption_url,
'ext': determine_ext(caption_url, 'vtt'),
})
elif asset.get('type') == 'image':
asset_location = asset.get('location')
if not asset_location:
continue
thumbnails.append({
'url': asset_location,
'width': int_or_none(asset.get('width')),
'height': int_or_none(asset.get('height')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video_data.get('short_description'),
'like_count': int_or_none(video_data.get('like_count')),
'timestamp': parse_iso8601(video_data.get('released_at')),
'series': series,
'episode': episode,
'season': season,
'season_id': season_id,
'season_number': season_number,
'episode_number': episode_number,
'subtitles': subtitles,
'age_limit': parse_age_limit(video_data.get('rating')),
}

View File

@ -1,33 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
_TEST = {
'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
'md5': '6783a58491b47b92c7c1af5a77d4cbee',
'info_dict': {
'id': 'mmbzyhkgny',
'ext': 'mp3',
'title': 'Obama: \'Beyond The Afghan Theater, We Only Target Al Qaeda\' on May 23, 2013',
'description': 'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
'duration': 11,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://www.hark.com/clips/%s.json' % video_id, video_id)
return {
'id': video_id,
'url': data['url'],
'title': data['name'],
'description': data.get('description'),
'thumbnail': data.get('image_original'),
'duration': data.get('duration'),
}

View File

@ -105,8 +105,7 @@ class HeiseIE(InfoExtractor):
webpage, default=None) or self._html_search_meta(
'description', webpage)
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
def _make_kaltura_result(kaltura_url):
return {
'_type': 'url_transparent',
'url': smuggle_url(kaltura_url, {'source_url': url}),
@ -115,6 +114,16 @@ class HeiseIE(InfoExtractor):
'description': description,
}
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
return _make_kaltura_result(kaltura_url)
kaltura_id = self._search_regex(
r'entry-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'kaltura id',
default=None, group='id')
if kaltura_id:
return _make_kaltura_result('kaltura:2238431:%s' % kaltura_id)
yt_urls = YoutubeIE._extract_urls(webpage)
if yt_urls:
return self.playlist_from_matches(

View File

@ -3,6 +3,7 @@ from __future__ import unicode_literals
import hashlib
import hmac
import re
import time
import uuid
@ -117,6 +118,7 @@ class HotStarIE(HotStarBaseIE):
if video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True)
headers = {'Referer': url}
formats = []
geo_restricted = False
playback_sets = self._call_api_v2('h/v2/play', video_id)['playBackSets']
@ -126,6 +128,8 @@ class HotStarIE(HotStarBaseIE):
format_url = url_or_none(playback_set.get('playbackUrl'))
if not format_url:
continue
format_url = re.sub(
r'(?<=//staragvod)(\d)', r'web\1', format_url)
tags = str_or_none(playback_set.get('tagsCombination')) or ''
if tags and 'encryption:plain' not in tags:
continue
@ -133,10 +137,12 @@ class HotStarIE(HotStarBaseIE):
try:
if 'package:hls' in tags or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls'))
format_url, video_id, 'mp4',
entry_protocol='m3u8_native',
m3u8_id='hls', headers=headers))
elif 'package:dash' in tags or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash'))
format_url, video_id, mpd_id='dash', headers=headers))
elif ext == 'f4m':
# produce broken files
pass
@ -154,6 +160,9 @@ class HotStarIE(HotStarBaseIE):
self.raise_geo_restricted(countries=['IN'])
self._sort_formats(formats)
for f in formats:
f.setdefault('http_headers', {}).update(headers)
return {
'id': video_id,
'title': title,

View File

@ -1,85 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
get_element_by_id,
remove_end,
)
class IconosquareIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
_TEST = {
'url': 'http://statigr.am/p/522207370455279102_24101272',
'md5': '6eb93b882a3ded7c378ee1d6884b1814',
'info_dict': {
'id': '522207370455279102_24101272',
'ext': 'mp4',
'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
'timestamp': 1376471991,
'upload_date': '20130814',
'uploader': 'aguynamedpatrick',
'uploader_id': '24101272',
'comment_count': int,
'like_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
media = self._parse_json(
get_element_by_id('mediaJson', webpage),
video_id)
formats = [{
'url': f['url'],
'format_id': format_id,
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height'))
} for format_id, f in media['videos'].items()]
self._sort_formats(formats)
title = remove_end(self._og_search_title(webpage), ' - via Iconosquare')
timestamp = int_or_none(media.get('created_time') or media.get('caption', {}).get('created_time'))
description = media.get('caption', {}).get('text')
uploader = media.get('user', {}).get('username')
uploader_id = media.get('user', {}).get('id')
comment_count = int_or_none(media.get('comments', {}).get('count'))
like_count = int_or_none(media.get('likes', {}).get('count'))
thumbnails = [{
'url': t['url'],
'id': thumbnail_id,
'width': int_or_none(t.get('width')),
'height': int_or_none(t.get('height'))
} for thumbnail_id, t in media.get('images', {}).items()]
comments = [{
'id': comment.get('id'),
'text': comment['text'],
'timestamp': int_or_none(comment.get('created_time')),
'author': comment.get('from', {}).get('full_name'),
'author_id': comment.get('from', {}).get('username'),
} for comment in media.get('comments', {}).get('data', []) if 'text' in comment]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnails': thumbnails,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'comment_count': comment_count,
'like_count': like_count,
'formats': formats,
'comments': comments,
}

View File

@ -22,7 +22,7 @@ from ..utils import (
class InstagramIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
@ -92,6 +92,9 @@ class InstagramIE(InfoExtractor):
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}, {
'url': 'https://www.instagram.com/tv/aye83DjauH/',
'only_matching': True,
}]
@staticmethod

View File

@ -1,15 +1,13 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import (
determine_ext,
int_or_none,
xpath_text,
)
class InternetVideoArchiveIE(InfoExtractor):
@ -20,7 +18,7 @@ class InternetVideoArchiveIE(InfoExtractor):
'info_dict': {
'id': '194487',
'ext': 'mp4',
'title': 'KICK-ASS 2',
'title': 'Kick-Ass 2',
'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
},
'params': {
@ -33,68 +31,34 @@ class InternetVideoArchiveIE(InfoExtractor):
def _build_json_url(query):
return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
@staticmethod
def _build_xml_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_parse_qs(query)
video_id = query_dic['publishedid'][0]
if '/player/' in url:
configuration = self._download_json(url, video_id)
# There are multiple videos in the playlist whlie only the first one
# matches the video played in browsers
video_info = configuration['playlist'][0]
title = video_info['title']
formats = []
for source in video_info['sources']:
file_url = source['file']
if determine_ext(file_url) == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
file_url = m3u8_formats[0]['url']
formats.extend(self._extract_f4m_formats(
file_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
file_url.replace('.m3u8', '.mpd'),
video_id, mpd_id='dash', fatal=False))
else:
a_format = {
'url': file_url,
}
if source.get('label') and source['label'][-4:] == ' kbs':
tbr = int_or_none(source['label'][:-4])
a_format.update({
'tbr': tbr,
'format_id': 'http-%d' % tbr,
})
formats.append(a_format)
self._sort_formats(formats)
description = video_info.get('description')
thumbnail = video_info.get('image')
else:
configuration = self._download_xml(url, video_id)
formats = [{
'url': xpath_text(configuration, './file', 'file URL', fatal=True),
}]
thumbnail = xpath_text(configuration, './image', 'thumbnail')
title = 'InternetVideoArchive video %s' % video_id
description = None
query = compat_parse_qs(compat_urlparse.urlparse(url).query)
video_id = query['publishedid'][0]
data = self._download_json(
'https://video.internetvideoarchive.net/videojs7/videojs7.ivasettings.ashx',
video_id, data=json.dumps({
'customerid': query['customerid'][0],
'publishedid': video_id,
}).encode())
title = data['Title']
formats = self._extract_m3u8_formats(
data['VideoUrl'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
file_url = formats[0]['url']
if '.ism/' in file_url:
replace_url = lambda x: re.sub(r'\.ism/[^?]+', '.ism/' + x, file_url)
formats.extend(self._extract_f4m_formats(
replace_url('.f4m'), video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
replace_url('.mpd'), video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_ism_formats(
replace_url('Manifest'), video_id, ism_id='mss', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
'thumbnail': data.get('PosterUrl'),
'description': data.get('Description'),
}

View File

@ -1,8 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
import re
import sys
from .common import InfoExtractor
from ..utils import (
@ -18,6 +19,8 @@ class IviIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
_LIGHT_KEY = b'\xf1\x02\x32\xb7\xbc\x5c\x7a\xe8\xf7\x96\xc1\x33\x2b\x27\xa1\x8c'
_LIGHT_URL = 'https://api.ivi.ru/light/'
_TESTS = [
# Single movie
@ -80,48 +83,96 @@ class IviIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
data = {
data = json.dumps({
'method': 'da.content.get',
'params': [
video_id, {
'site': 's183',
'site': 's%d',
'referrer': 'http://www.ivi.ru/watch/%s' % video_id,
'contentid': video_id
}
]
}
})
video_json = self._download_json(
'http://api.digitalaccess.ru/api/json/', video_id,
'Downloading video JSON', data=json.dumps(data))
bundled = hasattr(sys, 'frozen')
if 'error' in video_json:
error = video_json['error']
origin = error['origin']
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(
msg=error['message'], countries=self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
raise ExtractorError(
'Unable to download video %s: %s' % (video_id, error['message']),
expected=True)
for site in (353, 183):
content_data = (data % site).encode()
if site == 353:
if bundled:
continue
try:
from Cryptodome.Cipher import Blowfish
from Cryptodome.Hash import CMAC
pycryptodomex_found = True
except ImportError:
pycryptodomex_found = False
continue
timestamp = (self._download_json(
self._LIGHT_URL, video_id,
'Downloading timestamp JSON', data=json.dumps({
'method': 'da.timestamp.get',
'params': []
}).encode(), fatal=False) or {}).get('result')
if not timestamp:
continue
query = {
'ts': timestamp,
'sign': CMAC.new(self._LIGHT_KEY, timestamp.encode() + content_data, Blowfish).hexdigest(),
}
else:
query = {}
video_json = self._download_json(
self._LIGHT_URL, video_id,
'Downloading video JSON', data=content_data, query=query)
error = video_json.get('error')
if error:
origin = error.get('origin')
message = error.get('message') or error.get('user_message')
extractor_msg = 'Unable to download video %s'
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(message, self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
extractor_msg = 'Video %s does not exist'
elif site == 353:
continue
elif bundled:
raise ExtractorError(
'This feature does not work from bundled exe. Run youtube-dl from sources.',
expected=True)
elif not pycryptodomex_found:
raise ExtractorError(
'pycryptodomex not found. Please install it.',
expected=True)
elif message:
extractor_msg += ': ' + message
raise ExtractorError(extractor_msg % video_id, expected=True)
else:
break
result = video_json['result']
title = result['title']
quality = qualities(self._KNOWN_FORMATS)
formats = [{
'url': x['url'],
'format_id': x.get('content_format'),
'quality': quality(x.get('content_format')),
} for x in result['files'] if x.get('url')]
formats = []
for f in result.get('files', []):
f_url = f.get('url')
content_format = f.get('content_format')
if not f_url or '-MDRM-' in content_format or '-FPS-' in content_format:
continue
formats.append({
'url': f_url,
'format_id': content_format,
'quality': quality(content_format),
'filesize': int_or_none(f.get('size_in_bytes')),
})
self._sort_formats(formats)
title = result['title']
duration = int_or_none(result.get('duration'))
compilation = result.get('compilation')
episode = title if compilation else None
@ -158,7 +209,7 @@ class IviIE(InfoExtractor):
'episode_number': episode_number,
'thumbnails': thumbnails,
'description': description,
'duration': duration,
'duration': int_or_none(result.get('duration')),
'formats': formats,
}

View File

@ -1,38 +1,26 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import hashlib
import random
from ..compat import compat_urlparse
from ..compat import compat_str
from .common import InfoExtractor
from ..utils import parse_duration
from ..utils import (
clean_html,
int_or_none,
try_get,
)
class JamendoBaseIE(InfoExtractor):
def _extract_meta(self, webpage, fatal=True):
title = self._og_search_title(
webpage, default=None) or self._search_regex(
r'<title>([^<]+)', webpage,
'title', default=None)
if title:
title = self._search_regex(
r'(.+?)\s*\|\s*Jamendo Music', title, 'title', default=None)
if not title:
title = self._html_search_meta(
'name', webpage, 'title', fatal=fatal)
mobj = re.search(r'(.+) - (.+)', title or '')
artist, second = mobj.groups() if mobj else [None] * 2
return title, artist, second
class JamendoIE(JamendoBaseIE):
class JamendoIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
licensing\.jamendo\.com/[^/]+|
(?:www\.)?jamendo\.com
)
/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
/track/(?P<id>[0-9]+)(?:/(?P<display_id>[^/?#&]+))?
'''
_TESTS = [{
'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
@ -45,7 +33,9 @@ class JamendoIE(JamendoBaseIE):
'artist': 'Maya Filipič',
'track': 'Stories from Emona I',
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg'
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1217438117,
'upload_date': '20080730',
}
}, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
@ -53,15 +43,20 @@ class JamendoIE(JamendoBaseIE):
}]
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
track_id = mobj.group('id')
display_id = mobj.group('display_id')
track_id, display_id = self._VALID_URL_RE.match(url).groups()
webpage = self._download_webpage(
'https://www.jamendo.com/track/%s/%s' % (track_id, display_id),
display_id)
title, artist, track = self._extract_meta(webpage)
'https://www.jamendo.com/track/' + track_id, track_id)
models = self._parse_json(self._html_search_regex(
r"data-bundled-models='([^']+)",
webpage, 'bundled models'), track_id)
track = models['track']['models'][0]
title = track_name = track['name']
get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
artist = get_model('artist')
artist_name = artist.get('name')
if artist_name:
title = '%s - %s' % (artist_name, title)
album = get_model('album')
formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@ -77,31 +72,58 @@ class JamendoIE(JamendoBaseIE):
))]
self._sort_formats(formats)
thumbnail = self._html_search_meta(
'image', webpage, 'thumbnail', fatal=False)
duration = parse_duration(self._search_regex(
r'<span[^>]+itemprop=["\']duration["\'][^>]+content=["\'](.+?)["\']',
webpage, 'duration', fatal=False))
urls = []
thumbnails = []
for _, covers in track.get('cover', {}).items():
for cover_id, cover_url in covers.items():
if not cover_url or cover_url in urls:
continue
urls.append(cover_url)
size = int_or_none(cover_id.lstrip('size'))
thumbnails.append({
'id': cover_id,
'url': cover_url,
'width': size,
'height': size,
})
tags = []
for tag in track.get('tags', []):
tag_name = tag.get('name')
if not tag_name:
continue
tags.append(tag_name)
stats = track.get('stats') or {}
return {
'id': track_id,
'display_id': display_id,
'thumbnail': thumbnail,
'thumbnails': thumbnails,
'title': title,
'duration': duration,
'artist': artist,
'track': track,
'formats': formats
'description': track.get('description'),
'duration': int_or_none(track.get('duration')),
'artist': artist_name,
'track': track_name,
'album': album.get('name'),
'formats': formats,
'license': '-'.join(track.get('licenseCC', [])) or None,
'timestamp': int_or_none(track.get('dateCreated')),
'view_count': int_or_none(stats.get('listenedAll')),
'like_count': int_or_none(stats.get('favorited')),
'average_rating': int_or_none(stats.get('averageNote')),
'tags': tags,
}
class JamendoAlbumIE(JamendoBaseIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
class JamendoAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': {
'id': '121486',
'title': 'Shearer - Duck On Cover'
'title': 'Duck On Cover',
'description': 'md5:c2920eaeef07d7af5b96d7c64daf1239',
},
'playlist': [{
'md5': 'e1a2fcb42bda30dfac990212924149a8',
@ -111,6 +133,8 @@ class JamendoAlbumIE(JamendoBaseIE):
'title': 'Shearer - Warmachine',
'artist': 'Shearer',
'track': 'Warmachine',
'timestamp': 1368089771,
'upload_date': '20130509',
}
}, {
'md5': '1f358d7b2f98edfe90fd55dac0799d50',
@ -120,6 +144,8 @@ class JamendoAlbumIE(JamendoBaseIE):
'title': 'Shearer - Without Your Ghost',
'artist': 'Shearer',
'track': 'Without Your Ghost',
'timestamp': 1368089771,
'upload_date': '20130509',
}
}],
'params': {
@ -127,24 +153,35 @@ class JamendoAlbumIE(JamendoBaseIE):
}
}
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
album_id = mobj.group('id')
album_id = self._match_id(url)
album = self._call_api('album', album_id)
album_name = album.get('name')
webpage = self._download_webpage(url, mobj.group('display_id'))
entries = []
for track in album.get('tracks', []):
track_id = track.get('id')
if not track_id:
continue
track_id = compat_str(track_id)
entries.append({
'_type': 'url_transparent',
'url': 'https://www.jamendo.com/track/' + track_id,
'ie_key': JamendoIE.ie_key(),
'id': track_id,
'album': album_name,
})
title, artist, album = self._extract_meta(webpage, fatal=False)
entries = [{
'_type': 'url_transparent',
'url': compat_urlparse.urljoin(url, m.group('path')),
'ie_key': JamendoIE.ie_key(),
'id': self._search_regex(
r'/track/(\d+)', m.group('path'), 'track id', default=None),
'artist': artist,
'album': album,
} for m in re.finditer(
r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
webpage)]
return self.playlist_result(entries, album_id, title)
return self.playlist_result(
entries, album_id, album_name,
clean_html(try_get(album, lambda x: x['description']['en'], compat_str)))

View File

@ -7,7 +7,7 @@ from .common import InfoExtractor
class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325',

View File

@ -6,14 +6,15 @@ from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
strip_or_none,
unified_timestamp,
update_url_query,
)
class KakaoIE(InfoExtractor):
_VALID_URL = r'https?://tv\.kakao\.com/channel/(?P<channel>\d+)/cliplink/(?P<id>\d+)'
_API_BASE = 'http://tv.kakao.com/api/v1/ft/cliplinks'
_VALID_URL = r'https?://(?:play-)?tv\.kakao\.com/(?:channel/\d+|embed/player)/cliplink/(?P<id>\d+|[^?#&]+@my)'
_API_BASE_TMPL = 'http://tv.kakao.com/api/v1/ft/cliplinks/%s/'
_TESTS = [{
'url': 'http://tv.kakao.com/channel/2671005/cliplink/301965083',
@ -36,7 +37,7 @@ class KakaoIE(InfoExtractor):
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
'uploader_id': 2653210,
'uploader': ' 음악중심',
'uploader': '! 음악중심',
'timestamp': 1485684628,
'upload_date': '20170129',
}
@ -44,6 +45,8 @@ class KakaoIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id.rstrip('@my')
api_base = self._API_BASE_TMPL % video_id
player_header = {
'Referer': update_url_query(
@ -55,20 +58,23 @@ class KakaoIE(InfoExtractor):
})
}
QUERY_COMMON = {
query = {
'player': 'monet_html5',
'referer': url,
'uuid': '',
'service': 'kakao_tv',
'section': '',
'dteType': 'PC',
'fields': ','.join([
'-*', 'tid', 'clipLink', 'displayTitle', 'clip', 'title',
'description', 'channelId', 'createTime', 'duration', 'playCount',
'likeCount', 'commentCount', 'tagList', 'channel', 'name',
'clipChapterThumbnailList', 'thumbnailUrl', 'timeInSec', 'isDefault',
'videoOutputList', 'width', 'height', 'kbps', 'profile', 'label'])
}
query = QUERY_COMMON.copy()
query['fields'] = 'clipLink,clip,channel,hasPlusFriend,-service,-tagList'
impress = self._download_json(
'%s/%s/impress' % (self._API_BASE, video_id),
video_id, 'Downloading video info',
api_base + 'impress', display_id, 'Downloading video info',
query=query, headers=player_header)
clip_link = impress['clipLink']
@ -76,32 +82,22 @@ class KakaoIE(InfoExtractor):
title = clip.get('title') or clip_link.get('displayTitle')
tid = impress.get('tid', '')
query = QUERY_COMMON.copy()
query.update({
'tid': tid,
'profile': 'HIGH',
})
raw = self._download_json(
'%s/%s/raw' % (self._API_BASE, video_id),
video_id, 'Downloading video formats info',
query=query, headers=player_header)
query['tid'] = impress.get('tid', '')
formats = []
for fmt in raw.get('outputList', []):
for fmt in clip.get('videoOutputList', []):
try:
profile_name = fmt['profile']
if profile_name == 'AUDIO':
continue
query.update({
'profile': profile_name,
'fields': '-*,url',
})
fmt_url_json = self._download_json(
'%s/%s/raw/videolocation' % (self._API_BASE, video_id),
video_id,
api_base + 'raw/videolocation', display_id,
'Downloading video URL for profile %s' % profile_name,
query={
'service': 'kakao_tv',
'section': '',
'tid': tid,
'profile': profile_name
}, headers=player_header, fatal=False)
query=query, headers=player_header, fatal=False)
if fmt_url_json is None:
continue
@ -113,7 +109,8 @@ class KakaoIE(InfoExtractor):
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'format_note': fmt.get('label'),
'filesize': int_or_none(fmt.get('filesize'))
'filesize': int_or_none(fmt.get('filesize')),
'tbr': int_or_none(fmt.get('kbps')),
})
except KeyError:
pass
@ -134,9 +131,9 @@ class KakaoIE(InfoExtractor):
})
return {
'id': video_id,
'id': display_id,
'title': title,
'description': clip.get('description'),
'description': strip_or_none(clip.get('description')),
'uploader': clip_link.get('channel', {}).get('name'),
'uploader_id': clip_link.get('channelId'),
'thumbnails': thumbs,
@ -146,4 +143,5 @@ class KakaoIE(InfoExtractor):
'like_count': int_or_none(clip.get('likeCount')),
'comment_count': int_or_none(clip.get('commentCount')),
'formats': formats,
'tags': clip.get('tagList'),
}

View File

@ -151,14 +151,15 @@ class KalturaIE(InfoExtractor):
if mobj:
embed_info = mobj.groupdict()
for k, v in embed_info.items():
embed_info[k] = v.strip()
if v:
embed_info[k] = v.strip()
url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
escaped_pid = re.escape(embed_info['partner_id'])
service_url = re.search(
r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
service_mobj = re.search(
r'<script[^>]+src=(["\'])(?P<id>(?:https?:)?//(?:(?!\1).)+)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
webpage)
if service_url:
url = smuggle_url(url, {'service_url': service_url.group(1)})
if service_mobj:
url = smuggle_url(url, {'service_url': service_mobj.group('id')})
return url
def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):

View File

@ -1,39 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class KeekIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?keek\.com/keek/(?P<id>\w+)'
IE_NAME = 'keek'
_TEST = {
'url': 'https://www.keek.com/keek/NODfbab',
'md5': '9b0636f8c0f7614afa4ea5e4c6e57e83',
'info_dict': {
'id': 'NODfbab',
'ext': 'mp4',
'title': 'md5:35d42050a3ece241d5ddd7fdcc6fd896',
'uploader': 'ytdl',
'uploader_id': 'eGT5bab',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return {
'id': video_id,
'url': self._og_search_video_url(webpage),
'ext': 'mp4',
'title': self._og_search_description(webpage).strip(),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader': self._search_regex(
r'data-username=(["\'])(?P<uploader>.+?)\1', webpage,
'uploader', fatal=False, group='uploader'),
'uploader_id': self._search_regex(
r'data-user-id=(["\'])(?P<uploader_id>.+?)\1', webpage,
'uploader id', fatal=False, group='uploader_id'),
}

View File

@ -0,0 +1,221 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
int_or_none,
parse_iso8601,
strip_or_none,
try_get,
unescapeHTML,
urljoin,
)
class KinjaEmbedIE(InfoExtractor):
IENAME = 'kinja:embed'
_DOMAIN_REGEX = r'''(?:[^.]+\.)?
(?:
avclub|
clickhole|
deadspin|
gizmodo|
jalopnik|
jezebel|
kinja|
kotaku|
lifehacker|
splinternews|
the(?:inventory|onion|root|takeout)
)\.com'''
_COMMON_REGEX = r'''/
(?:
ajax/inset|
embed/video
)/iframe\?.*?\bid='''
_VALID_URL = r'''(?x)https?://%s%s
(?P<type>
fb|
imgur|
instagram|
jwp(?:layer)?-video|
kinjavideo|
mcp|
megaphone|
ooyala|
soundcloud(?:-playlist)?|
tumblr-post|
twitch-stream|
twitter|
ustream-channel|
vimeo|
vine|
youtube-(?:list|video)
)-(?P<id>[^&]+)''' % (_DOMAIN_REGEX, _COMMON_REGEX)
_TESTS = [{
'url': 'https://kinja.com/ajax/inset/iframe?id=fb-10103303356633621',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=kinjavideo-100313',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=megaphone-PPY1300931075',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=ooyala-xzMXhleDpopuT0u1ijt_qZj3Va-34pEX%2FZTIxYmJjZDM2NWYzZDViZGRiOWJjYzc5',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=soundcloud-128574047',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=soundcloud-playlist-317413750',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=tumblr-post-160130699814-daydreams-at-midnight',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=twitch-stream-libratus_extra',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=twitter-1068875942473404422',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=ustream-channel-10414700',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=vimeo-120153502',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=vine-5BlvV5qqPrD',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=youtube-list-BCQ3KyrPjgA/PLE6509247C270A72E',
'only_matching': True,
}, {
'url': 'https://kinja.com/ajax/inset/iframe?id=youtube-video-00QyL0AgPAE',
'only_matching': True,
}]
_JWPLATFORM_PROVIDER = ('cdn.jwplayer.com/v2/media/', 'JWPlatform')
_PROVIDER_MAP = {
'fb': ('facebook.com/video.php?v=', 'Facebook'),
'imgur': ('imgur.com/', 'Imgur'),
'instagram': ('instagram.com/p/', 'Instagram'),
'jwplayer-video': _JWPLATFORM_PROVIDER,
'jwp-video': _JWPLATFORM_PROVIDER,
'megaphone': ('player.megaphone.fm/', 'Generic'),
'ooyala': ('player.ooyala.com/player.js?embedCode=', 'Ooyala'),
'soundcloud': ('api.soundcloud.com/tracks/', 'Soundcloud'),
'soundcloud-playlist': ('api.soundcloud.com/playlists/', 'SoundcloudPlaylist'),
'tumblr-post': ('%s.tumblr.com/post/%s', 'Tumblr'),
'twitch-stream': ('twitch.tv/', 'TwitchStream'),
'twitter': ('twitter.com/i/cards/tfw/v1/', 'TwitterCard'),
'ustream-channel': ('ustream.tv/embed/', 'Ustream'),
'vimeo': ('vimeo.com/', 'Vimeo'),
'vine': ('vine.co/v/', 'Vine'),
'youtube-list': ('youtube.com/embed/%s?list=%s', 'YoutubePlaylist'),
'youtube-video': ('youtube.com/embed/', 'Youtube'),
}
@staticmethod
def _extract_urls(webpage, url):
return [urljoin(url, unescapeHTML(mobj.group('url'))) for mobj in re.finditer(
r'(?x)<iframe[^>]+?src=(?P<q>["\'])(?P<url>(?:(?:https?:)?//%s)?%s(?:(?!\1).)+)\1' % (KinjaEmbedIE._DOMAIN_REGEX, KinjaEmbedIE._COMMON_REGEX),
webpage)]
def _real_extract(self, url):
video_type, video_id = re.match(self._VALID_URL, url).groups()
provider = self._PROVIDER_MAP.get(video_type)
if provider:
video_id = compat_urllib_parse_unquote(video_id)
if video_type == 'tumblr-post':
video_id, blog = video_id.split('-', 1)
result_url = provider[0] % (blog, video_id)
elif video_type == 'youtube-list':
video_id, playlist_id = video_id.split('/')
result_url = provider[0] % (video_id, playlist_id)
else:
if video_type == 'ooyala':
video_id = video_id.split('/')[0]
result_url = provider[0] + video_id
return self.url_result('http://' + result_url, provider[1])
if video_type == 'kinjavideo':
data = self._download_json(
'https://kinja.com/api/core/video/views/videoById',
video_id, query={'videoId': video_id})['data']
title = data['title']
formats = []
for k in ('signedPlaylist', 'streaming'):
m3u8_url = data.get(k + 'Url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
thumbnail = None
poster = data.get('poster') or {}
poster_id = poster.get('id')
if poster_id:
thumbnail = 'https://i.kinja-img.com/gawker-media/image/upload/%s.%s' % (poster_id, poster.get('format') or 'jpg')
return {
'id': video_id,
'title': title,
'description': strip_or_none(data.get('description')),
'formats': formats,
'tags': data.get('tags'),
'timestamp': int_or_none(try_get(
data, lambda x: x['postInfo']['publishTimeMillis']), 1000),
'thumbnail': thumbnail,
'uploader': data.get('network'),
}
else:
video_data = self._download_json(
'https://api.vmh.univision.com/metadata/v1/content/' + video_id,
video_id)['videoMetadata']
iptc = video_data['photoVideoMetadataIPTC']
title = iptc['title']['en']
fmg = video_data.get('photoVideoMetadata_fmg') or {}
tvss_domain = fmg.get('tvssDomain') or 'https://auth.univision.com'
data = self._download_json(
tvss_domain + '/api/v3/video-auth/url-signature-tokens',
video_id, query={'mcpids': video_id})['data'][0]
formats = []
rendition_url = data.get('renditionUrl')
if rendition_url:
formats = self._extract_m3u8_formats(
rendition_url, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
fallback_rendition_url = data.get('fallbackRenditionUrl')
if fallback_rendition_url:
formats.append({
'format_id': 'fallback',
'tbr': int_or_none(self._search_regex(
r'_(\d+)\.mp4', fallback_rendition_url,
'bitrate', default=None)),
'url': fallback_rendition_url,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': try_get(iptc, lambda x: x['cloudinaryLink']['link'], compat_str),
'uploader': fmg.get('network'),
'duration': int_or_none(iptc.get('fileDuration')),
'formats': formats,
'description': try_get(iptc, lambda x: x['description']['en'], compat_str),
'timestamp': parse_iso8601(iptc.get('dateReleased')),
}

View File

@ -20,7 +20,7 @@ class LA7IE(InfoExtractor):
'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
'info_dict': {
'id': 'inccool8-02-10-2015-163722',
'id': '0_42j6wd36',
'ext': 'mp4',
'title': 'Inc.Cool8',
'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico',
@ -57,7 +57,7 @@ class LA7IE(InfoExtractor):
return {
'_type': 'url_transparent',
'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
'service_url': 'http://kdam.iltrovatore.it',
'service_url': 'http://nkdam.iltrovatore.it',
}),
'id': video_id,
'title': player_data['title'],

View File

@ -1,33 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class LearnrIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?learnr\.pro/view/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.learnr.pro/view/video/51624-web-development-tutorial-for-beginners-1-how-to-build-webpages-with-html-css-javascript',
'md5': '3719fdf0a68397f49899e82c308a89de',
'info_dict': {
'id': '51624',
'ext': 'mp4',
'title': 'Web Development Tutorial for Beginners (#1) - How to build webpages with HTML, CSS, Javascript',
'description': 'md5:b36dbfa92350176cdf12b4d388485503',
'uploader': 'LearnCode.academy',
'uploader_id': 'learncodeacademy',
'upload_date': '20131021',
},
'add_ie': ['Youtube'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return {
'_type': 'url_transparent',
'url': self._search_regex(
r"videoId\s*:\s*'([^']+)'", webpage, 'youtube id'),
'id': video_id,
}

View File

@ -326,7 +326,7 @@ class LetvCloudIE(InfoExtractor):
elif play_json.get('code'):
raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
else:
raise ExtractorError('Letv cloud returned an unknwon error')
raise ExtractorError('Letv cloud returned an unknown error')
def b64decode(s):
return compat_b64decode(s).decode('utf-8')

View File

@ -5,24 +5,27 @@ import re
from .common import InfoExtractor
from ..utils import (
clean_html,
compat_str,
int_or_none,
unified_strdate,
parse_iso8601,
)
class LnkGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lnkgo\.(?:alfa\.)?lt/visi-video/(?P<show>[^/]+)/ziurek-(?P<id>[A-Za-z0-9-]+)'
_VALID_URL = r'https?://(?:www\.)?lnk(?:go)?\.(?:alfa\.)?lt/(?:visi-video/[^/]+|video)/(?P<id>[A-Za-z0-9-]+)(?:/(?P<episode_id>\d+))?'
_TESTS = [{
'url': 'http://lnkgo.alfa.lt/visi-video/yra-kaip-yra/ziurek-yra-kaip-yra-162',
'url': 'http://www.lnkgo.lt/visi-video/aktualai-pratesimas/ziurek-putka-trys-klausimai',
'info_dict': {
'id': '46712',
'id': '10809',
'ext': 'mp4',
'title': 'Yra kaip yra',
'upload_date': '20150107',
'description': 'md5:d82a5e36b775b7048617f263a0e3475e',
'age_limit': 7,
'duration': 3019,
'thumbnail': r're:^https?://.*\.jpg$'
'title': "Put'ka: Trys Klausimai",
'upload_date': '20161216',
'description': 'Seniai matytas Putka užduoda tris klausimėlius. Pabandykime surasti atsakymus.',
'age_limit': 18,
'duration': 117,
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1481904000,
},
'params': {
'skip_download': True, # HLS download
@ -30,20 +33,21 @@ class LnkGoIE(InfoExtractor):
}, {
'url': 'http://lnkgo.alfa.lt/visi-video/aktualai-pratesimas/ziurek-nerdas-taiso-kompiuteri-2',
'info_dict': {
'id': '47289',
'id': '10467',
'ext': 'mp4',
'title': 'Nėrdas: Kompiuterio Valymas',
'upload_date': '20150113',
'description': 'md5:7352d113a242a808676ff17e69db6a69',
'age_limit': 18,
'duration': 346,
'thumbnail': r're:^https?://.*\.jpg$'
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1421164800,
},
'params': {
'skip_download': True, # HLS download
},
}, {
'url': 'http://www.lnkgo.lt/visi-video/aktualai-pratesimas/ziurek-putka-trys-klausimai',
'url': 'https://lnk.lt/video/neigalieji-tv-bokste/37413',
'only_matching': True,
}]
_AGE_LIMITS = {
@ -51,66 +55,34 @@ class LnkGoIE(InfoExtractor):
'N-14': 14,
'S': 18,
}
_M3U8_TEMPL = 'https://vod.lnk.lt/lnk_vod/lnk/lnk/%s:%s/playlist.m3u8%s'
def _real_extract(self, url):
display_id = self._match_id(url)
display_id, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
url, display_id, 'Downloading player webpage')
video_id = self._search_regex(
r'data-ep="([^"]+)"', webpage, 'video ID')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
upload_date = unified_strdate(self._search_regex(
r'class="[^"]*meta-item[^"]*air-time[^"]*">.*?<strong>([^<]+)</strong>', webpage, 'upload date', fatal=False))
thumbnail_w = int_or_none(
self._og_search_property('image:width', webpage, 'thumbnail width', fatal=False))
thumbnail_h = int_or_none(
self._og_search_property('image:height', webpage, 'thumbnail height', fatal=False))
thumbnail = {
'url': self._og_search_thumbnail(webpage),
}
if thumbnail_w and thumbnail_h:
thumbnail.update({
'width': thumbnail_w,
'height': thumbnail_h,
})
config = self._parse_json(self._search_regex(
r'episodePlayer\((\{.*?\}),\s*\{', webpage, 'sources'), video_id)
if config.get('pGeo'):
self.report_warning(
'This content might not be available in your country due to copyright reasons')
formats = [{
'format_id': 'hls',
'ext': 'mp4',
'url': config['EpisodeVideoLink_HLS'],
}]
m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<play_path>.+)$', config['EpisodeVideoLink'])
if m:
formats.append({
'format_id': 'rtmp',
'ext': 'flv',
'url': m.group('url'),
'play_path': m.group('play_path'),
'page_url': url,
})
video_info = self._download_json(
'https://lnk.lt/api/main/video-page/%s/%s/false' % (display_id, video_id or '0'),
display_id)['videoConfig']['videoInfo']
video_id = compat_str(video_info['id'])
title = video_info['title']
prefix = 'smil' if video_info.get('isQualityChangeAvailable') else 'mp4'
formats = self._extract_m3u8_formats(
self._M3U8_TEMPL % (prefix, video_info['videoUrl'], video_info.get('secureTokenParams') or ''),
video_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
poster_image = video_info.get('posterImage')
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnails': [thumbnail],
'duration': int_or_none(config.get('VideoTime')),
'description': description,
'age_limit': self._AGE_LIMITS.get(config.get('PGRating'), 0),
'upload_date': upload_date,
'thumbnail': 'https://lnk.lt/all-images/' + poster_image if poster_image else None,
'duration': int_or_none(video_info.get('duration')),
'description': clean_html(video_info.get('htmlDescription')),
'age_limit': self._AGE_LIMITS.get(video_info.get('pgRating'), 0),
'timestamp': parse_iso8601(video_info.get('airDate')),
'view_count': int_or_none(video_info.get('viewsCount')),
}

View File

@ -1,42 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import ExtractorError
class MacGameStoreIE(InfoExtractor):
IE_NAME = 'macgamestore'
IE_DESC = 'MacGameStore trailers'
_VALID_URL = r'https?://(?:www\.)?macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
_TEST = {
'url': 'http://www.macgamestore.com/mediaviewer.php?trailer=2450',
'md5': '8649b8ea684b6666b4c5be736ecddc61',
'info_dict': {
'id': '2450',
'ext': 'm4v',
'title': 'Crow',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, 'Downloading trailer page')
if '>Missing Media<' in webpage:
raise ExtractorError(
'Trailer %s does not exist' % video_id, expected=True)
video_title = self._html_search_regex(
r'<title>MacGameStore: (.*?) Trailer</title>', webpage, 'title')
video_url = self._html_search_regex(
r'(?s)<div\s+id="video-player".*?href="([^"]+)"\s*>',
webpage, 'video URL')
return {
'id': video_id,
'url': video_url,
'title': video_title
}

View File

@ -1,32 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MakerTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?maker\.tv/(?:[^/]+/)*video|makerplayer\.com/embed/maker)/(?P<id>[a-zA-Z0-9]{12})'
_TEST = {
'url': 'http://www.maker.tv/video/Fh3QgymL9gsc',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
'info_dict': {
'id': 'Fh3QgymL9gsc',
'ext': 'mp4',
'title': 'Maze Runner: The Scorch Trials Official Movie Review',
'description': 'md5:11ff3362d7ef1d679fdb649f6413975a',
'upload_date': '20150918',
'timestamp': 1442549540,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
jwplatform_id = self._search_regex(r'jw_?id="([^"]+)"', webpage, 'jwplatform id')
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'jwplatform:%s' % jwplatform_id,
'ie_key': 'JWPlatform',
}

View File

@ -10,18 +10,21 @@ from ..utils import int_or_none
class MangomoloBaseIE(InfoExtractor):
_BASE_REGEX = r'https?://(?:admin\.mangomolo\.com/analytics/index\.php/customers/embed/|player\.mangomolo\.com/v1/)'
def _get_real_id(self, page_id):
return page_id
def _real_extract(self, url):
page_id = self._get_real_id(self._match_id(url))
webpage = self._download_webpage(url, page_id)
webpage = self._download_webpage(
'https://player.mangomolo.com/v1/%s?%s' % (self._TYPE, url.split('?')[1]), page_id)
hidden_inputs = self._hidden_inputs(webpage)
m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
format_url = self._html_search_regex(
[
r'file\s*:\s*"(https?://[^"]+?/playlist\.m3u8)',
r'(?:file|src)\s*:\s*"(https?://[^"]+?/playlist\.m3u8)',
r'<a[^>]+href="(rtsp://[^"]+)"'
], webpage, 'format url')
formats = self._extract_wowza_formats(
@ -39,14 +42,16 @@ class MangomoloBaseIE(InfoExtractor):
class MangomoloVideoIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:video'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
_TYPE = 'video'
IE_NAME = 'mangomolo:' + _TYPE
_VALID_URL = MangomoloBaseIE._BASE_REGEX + r'video\?.*?\bid=(?P<id>\d+)'
_IS_LIVE = False
class MangomoloLiveIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:live'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
_TYPE = 'live'
IE_NAME = 'mangomolo:' + _TYPE
_VALID_URL = MangomoloBaseIE._BASE_REGEX + r'(live|index)\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
_IS_LIVE = True
def _get_real_id(self, page_id):

View File

@ -27,7 +27,7 @@ class MediasetIE(ThePlatformBaseIE):
(?:video|on-demand)/(?:[^/]+/)+[^/]+_|
player/index\.html\?.*?\bprogramGuid=
)
)(?P<id>[0-9A-Z]{16})
)(?P<id>[0-9A-Z]{16,})
'''
_TESTS = [{
# full episode
@ -62,7 +62,6 @@ class MediasetIE(ThePlatformBaseIE):
'uploader': 'Canale 5',
'uploader_id': 'C5',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
# clip
'url': 'https://www.mediasetplay.mediaset.it/video/gogglebox/un-grande-classico-della-commedia-sexy_FAFU000000661680',
@ -78,6 +77,18 @@ class MediasetIE(ThePlatformBaseIE):
}, {
'url': 'mediaset:FAFU000000665924',
'only_matching': True,
}, {
'url': 'https://www.mediasetplay.mediaset.it/video/mediasethaacuoreilfuturo/palmieri-alicudi-lisola-dei-tre-bambini-felici--un-decreto-per-alicudi-e-tutte-le-microscuole_FD00000000102295',
'only_matching': True,
}, {
'url': 'https://www.mediasetplay.mediaset.it/video/cherryseason/anticipazioni-degli-episodi-del-23-ottobre_F306837101005C02',
'only_matching': True,
}, {
'url': 'https://www.mediasetplay.mediaset.it/video/tg5/ambiente-onda-umana-per-salvare-il-pianeta_F309453601079D01',
'only_matching': True,
}, {
'url': 'https://www.mediasetplay.mediaset.it/video/grandefratellovip/benedetta-una-doccia-gelata_F309344401044C135',
'only_matching': True,
}]
@staticmethod
@ -109,6 +120,11 @@ class MediasetIE(ThePlatformBaseIE):
entries.append(embed_url)
return entries
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
for video in smil.findall(self._xpath_ns('.//video', namespace)):
video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
return super()._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
def _real_extract(self, url):
guid = self._match_id(url)
tp_path = 'PR1GhC/media/guid/2702976343/' + guid
@ -118,14 +134,15 @@ class MediasetIE(ThePlatformBaseIE):
subtitles = {}
first_e = None
for asset_type in ('SD', 'HD'):
for f in ('MPEG4', 'MPEG-DASH', 'M3U', 'ISM'):
# TODO: fixup ISM+none manifest URLs
for f in ('MPEG4', 'MPEG-DASH+none', 'M3U+none'):
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query('http://link.theplatform.%s/s/%s' % (self._TP_TLD, tp_path), {
'mbr': 'true',
'formats': f,
'assetTypes': asset_type,
}), guid, 'Downloading %s %s SMIL data' % (f, asset_type))
}), guid, 'Downloading %s %s SMIL data' % (f.split('+')[0], asset_type))
except ExtractorError as e:
if not first_e:
first_e = e

View File

@ -82,6 +82,7 @@ class MGTVIE(InfoExtractor):
'http_headers': {
'Referer': url,
},
'format_note': stream.get('name'),
})
self._sort_formats(formats)

View File

@ -1,70 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
parse_filesize,
sanitized_Request,
urlencode_postdata,
)
class MinhatecaIE(InfoExtractor):
_VALID_URL = r'https?://minhateca\.com\.br/[^?#]+,(?P<id>[0-9]+)\.'
_TEST = {
'url': 'http://minhateca.com.br/pereba/misc/youtube-dl+test+video,125848331.mp4(video)',
'info_dict': {
'id': '125848331',
'ext': 'mp4',
'title': 'youtube-dl test video',
'thumbnail': r're:^https?://.*\.jpg$',
'filesize_approx': 1530000,
'duration': 9,
'view_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
token = self._html_search_regex(
r'<input name="__RequestVerificationToken".*?value="([^"]+)"',
webpage, 'request token')
token_data = [
('fileId', video_id),
('__RequestVerificationToken', token),
]
req = sanitized_Request(
'http://minhateca.com.br/action/License/Download',
data=urlencode_postdata(token_data))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
data = self._download_json(
req, video_id, note='Downloading metadata')
video_url = data['redirectUrl']
title_str = self._html_search_regex(
r'<h1.*?>(.*?)</h1>', webpage, 'title')
title, _, ext = title_str.rpartition('.')
filesize_approx = parse_filesize(self._html_search_regex(
r'<p class="fileSize">(.*?)</p>',
webpage, 'file size approximation', fatal=False))
duration = parse_duration(self._html_search_regex(
r'(?s)<p class="fileLeng[ht][th]">.*?class="bold">(.*?)<',
webpage, 'duration', fatal=False))
view_count = int_or_none(self._html_search_regex(
r'<p class="downloadsCounter">([0-9]+)</p>',
webpage, 'view count', fatal=False))
return {
'id': video_id,
'url': video_url,
'title': title,
'ext': ext,
'filesize_approx': filesize_approx,
'duration': duration,
'view_count': view_count,
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -65,30 +65,6 @@ class TechTVMITIE(InfoExtractor):
}
class MITIE(TechTVMITIE):
IE_NAME = 'video.mit.edu'
_VALID_URL = r'https?://video\.mit\.edu/watch/(?P<title>[^/]+)'
_TEST = {
'url': 'http://video.mit.edu/watch/the-government-is-profiling-you-13222/',
'md5': '7db01d5ccc1895fc5010e9c9e13648da',
'info_dict': {
'id': '21783',
'ext': 'mp4',
'title': 'The Government is Profiling You',
'description': 'md5:ad5795fe1e1623b73620dbfd47df9afd',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
embed_url = self._search_regex(
r'<iframe .*?src="(.+?)"', webpage, 'embed url')
return self.url_result(embed_url)
class OCWMITIE(InfoExtractor):
IE_NAME = 'ocw.mit.edu'
_VALID_URL = r'^https?://ocw\.mit\.edu/courses/(?P<topic>[a-z0-9\-]+)'

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals
import functools
import itertools
import re
@ -11,28 +10,37 @@ from ..compat import (
compat_ord,
compat_str,
compat_urllib_parse_unquote,
compat_urlparse,
compat_zip
)
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
OnDemandPagedList,
str_to_int,
parse_iso8601,
strip_or_none,
try_get,
urljoin,
)
class MixcloudIE(InfoExtractor):
class MixcloudBaseIE(InfoExtractor):
def _call_api(self, object_type, object_fields, display_id, username, slug=None):
lookup_key = object_type + 'Lookup'
return self._download_json(
'https://www.mixcloud.com/graphql', display_id, query={
'query': '''{
%s(lookup: {username: "%s"%s}) {
%s
}
}''' % (lookup_key, username, ', slug: "%s"' % slug if slug else '', object_fields)
})['data'][lookup_key]
class MixcloudIE(MixcloudBaseIE):
_VALID_URL = r'https?://(?:(?:www|beta|m)\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
IE_NAME = 'mixcloud'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
'info_dict': {
'id': 'dholbach-cryptkeeper',
'id': 'dholbach_cryptkeeper',
'ext': 'm4a',
'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
@ -40,11 +48,13 @@ class MixcloudIE(InfoExtractor):
'uploader_id': 'dholbach',
'thumbnail': r're:https?://.*\.jpg',
'view_count': int,
'timestamp': 1321359578,
'upload_date': '20111115',
},
}, {
'url': 'http://www.mixcloud.com/gillespeterson/caribou-7-inch-vinyl-mix-chat/',
'info_dict': {
'id': 'gillespeterson-caribou-7-inch-vinyl-mix-chat',
'id': 'gillespeterson_caribou-7-inch-vinyl-mix-chat',
'ext': 'mp3',
'title': 'Caribou 7 inch Vinyl Mix & Chat',
'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
@ -52,11 +62,14 @@ class MixcloudIE(InfoExtractor):
'uploader_id': 'gillespeterson',
'thumbnail': 're:https?://.*',
'view_count': int,
'timestamp': 1422987057,
'upload_date': '20150203',
},
}, {
'url': 'https://beta.mixcloud.com/RedLightRadio/nosedrip-15-red-light-radio-01-18-2016/',
'only_matching': True,
}]
_DECRYPTION_KEY = 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'
@staticmethod
def _decrypt_xor_cipher(key, ciphertext):
@ -66,176 +79,193 @@ class MixcloudIE(InfoExtractor):
for ch, k in compat_zip(ciphertext, itertools.cycle(key))])
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader = mobj.group(1)
cloudcast_name = mobj.group(2)
track_id = compat_urllib_parse_unquote('-'.join((uploader, cloudcast_name)))
username, slug = re.match(self._VALID_URL, url).groups()
username, slug = compat_urllib_parse_unquote(username), compat_urllib_parse_unquote(slug)
track_id = '%s_%s' % (username, slug)
webpage = self._download_webpage(url, track_id)
cloudcast = self._call_api('cloudcast', '''audioLength
comments(first: 100) {
edges {
node {
comment
created
user {
displayName
username
}
}
}
totalCount
}
description
favorites {
totalCount
}
featuringArtistList
isExclusive
name
owner {
displayName
url
username
}
picture(width: 1024, height: 1024) {
url
}
plays
publishDate
reposts {
totalCount
}
streamInfo {
dashUrl
hlsUrl
url
}
tags {
tag {
name
}
}''', track_id, username, slug)
# Legacy path
encrypted_play_info = self._search_regex(
r'm-play-info="([^"]+)"', webpage, 'play info', default=None)
title = cloudcast['name']
if encrypted_play_info is not None:
# Decode
encrypted_play_info = compat_b64decode(encrypted_play_info)
else:
# New path
full_info_json = self._parse_json(self._html_search_regex(
r'<script id="relay-data" type="text/x-mixcloud">([^<]+)</script>',
webpage, 'play info'), 'play info')
for item in full_info_json:
item_data = try_get(
item, lambda x: x['cloudcast']['data']['cloudcastLookup'],
dict)
if try_get(item_data, lambda x: x['streamInfo']['url']):
info_json = item_data
break
else:
raise ExtractorError('Failed to extract matching stream info')
stream_info = cloudcast['streamInfo']
formats = []
message = self._html_search_regex(
r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
webpage, 'error message', default=None)
js_url = self._search_regex(
r'<script[^>]+\bsrc=["\"](https://(?:www\.)?mixcloud\.com/media/(?:js2/www_js_4|js/www)\.[^>]+\.js)',
webpage, 'js url')
js = self._download_webpage(js_url, track_id, 'Downloading JS')
# Known plaintext attack
if encrypted_play_info:
kps = ['{"stream_url":']
kpa_target = encrypted_play_info
else:
kps = ['https://', 'http://']
kpa_target = compat_b64decode(info_json['streamInfo']['url'])
for kp in kps:
partial_key = self._decrypt_xor_cipher(kpa_target, kp)
for quote in ["'", '"']:
key = self._search_regex(
r'{0}({1}[^{0}]*){0}'.format(quote, re.escape(partial_key)),
js, 'encryption key', default=None)
if key is not None:
break
else:
for url_key in ('url', 'hlsUrl', 'dashUrl'):
format_url = stream_info.get(url_key)
if not format_url:
continue
break
else:
raise ExtractorError('Failed to extract encryption key')
decrypted = self._decrypt_xor_cipher(
self._DECRYPTION_KEY, compat_b64decode(format_url))
if url_key == 'hlsUrl':
formats.extend(self._extract_m3u8_formats(
decrypted, track_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif url_key == 'dashUrl':
formats.extend(self._extract_mpd_formats(
decrypted, track_id, mpd_id='dash', fatal=False))
else:
formats.append({
'format_id': 'http',
'url': decrypted,
'downloader_options': {
# Mixcloud starts throttling at >~5M
'http_chunk_size': 5242880,
},
})
if encrypted_play_info is not None:
play_info = self._parse_json(self._decrypt_xor_cipher(key, encrypted_play_info), 'play info')
if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
song_url = play_info['stream_url']
formats = [{
'format_id': 'normal',
'url': song_url
}]
if not formats and cloudcast.get('isExclusive'):
self.raise_login_required()
title = self._html_search_regex(r'm-title="([^"]+)"', webpage, 'title')
thumbnail = self._proto_relative_url(self._html_search_regex(
r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail', fatal=False))
uploader = self._html_search_regex(
r'm-owner-name="([^"]+)"', webpage, 'uploader', fatal=False)
uploader_id = self._search_regex(
r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
description = self._og_search_description(webpage)
view_count = str_to_int(self._search_regex(
[r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
r'/listeners/?">([0-9,.]+)</a>',
r'(?:m|data)-tooltip=["\']([\d,.]+) plays'],
webpage, 'play count', default=None))
self._sort_formats(formats)
else:
title = info_json['name']
thumbnail = urljoin(
'https://thumbnailer.mixcloud.com/unsafe/600x600/',
try_get(info_json, lambda x: x['picture']['urlRoot'], compat_str))
uploader = try_get(info_json, lambda x: x['owner']['displayName'])
uploader_id = try_get(info_json, lambda x: x['owner']['username'])
description = try_get(info_json, lambda x: x['description'])
view_count = int_or_none(try_get(info_json, lambda x: x['plays']))
comments = []
for edge in (try_get(cloudcast, lambda x: x['comments']['edges']) or []):
node = edge.get('node') or {}
text = strip_or_none(node.get('comment'))
if not text:
continue
user = node.get('user') or {}
comments.append({
'author': user.get('displayName'),
'author_id': user.get('username'),
'text': text,
'timestamp': parse_iso8601(node.get('created')),
})
stream_info = info_json['streamInfo']
formats = []
tags = []
for t in cloudcast.get('tags'):
tag = try_get(t, lambda x: x['tag']['name'], compat_str)
if not tag:
tags.append(tag)
def decrypt_url(f_url):
for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
decrypted_url = self._decrypt_xor_cipher(k, f_url)
if re.search(r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
return decrypted_url
get_count = lambda x: int_or_none(try_get(cloudcast, lambda y: y[x]['totalCount']))
for url_key in ('url', 'hlsUrl', 'dashUrl'):
format_url = stream_info.get(url_key)
if not format_url:
continue
decrypted = decrypt_url(compat_b64decode(format_url))
if not decrypted:
continue
if url_key == 'hlsUrl':
formats.extend(self._extract_m3u8_formats(
decrypted, track_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif url_key == 'dashUrl':
formats.extend(self._extract_mpd_formats(
decrypted, track_id, mpd_id='dash', fatal=False))
else:
formats.append({
'format_id': 'http',
'url': decrypted,
'downloader_options': {
# Mixcloud starts throttling at >~5M
'http_chunk_size': 5242880,
},
})
self._sort_formats(formats)
owner = cloudcast.get('owner') or {}
return {
'id': track_id,
'title': title,
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'description': cloudcast.get('description'),
'thumbnail': try_get(cloudcast, lambda x: x['picture']['url'], compat_str),
'uploader': owner.get('displayName'),
'timestamp': parse_iso8601(cloudcast.get('publishDate')),
'uploader_id': owner.get('username'),
'uploader_url': owner.get('url'),
'duration': int_or_none(cloudcast.get('audioLength')),
'view_count': int_or_none(cloudcast.get('plays')),
'like_count': get_count('favorites'),
'repost_count': get_count('reposts'),
'comment_count': get_count('comments'),
'comments': comments,
'tags': tags,
'artist': ', '.join(cloudcast.get('featuringArtistList') or []) or None,
}
class MixcloudPlaylistBaseIE(InfoExtractor):
_PAGE_SIZE = 24
class MixcloudPlaylistBaseIE(MixcloudBaseIE):
def _get_cloudcast(self, node):
return node
def _find_urls_in_page(self, page):
for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
yield self.url_result(
compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
MixcloudIE.ie_key())
def _get_playlist_title(self, title, slug):
return title
def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
real_page_number = real_page_number or current_page + 1
return self._download_webpage(
'https://www.mixcloud.com/%s/' % path, video_id,
note='Download %s (page %d)' % (page_name, current_page + 1),
errnote='Unable to download %s' % page_name,
query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
headers={'X-Requested-With': 'XMLHttpRequest'})
def _real_extract(self, url):
username, slug = re.match(self._VALID_URL, url).groups()
username = compat_urllib_parse_unquote(username)
if not slug:
slug = 'uploads'
else:
slug = compat_urllib_parse_unquote(slug)
playlist_id = '%s_%s' % (username, slug)
def _tracks_page_func(self, page, video_id, page_name, current_page):
resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
is_playlist_type = self._ROOT_TYPE == 'playlist'
playlist_type = 'items' if is_playlist_type else slug
list_filter = ''
for item in self._find_urls_in_page(resp):
yield item
has_next_page = True
entries = []
while has_next_page:
playlist = self._call_api(
self._ROOT_TYPE, '''%s
%s
%s(first: 100%s) {
edges {
node {
%s
}
}
pageInfo {
endCursor
hasNextPage
}
}''' % (self._TITLE_KEY, self._DESCRIPTION_KEY, playlist_type, list_filter, self._NODE_TEMPLATE),
playlist_id, username, slug if is_playlist_type else None)
def _get_user_description(self, page_content):
return self._html_search_regex(
r'<div[^>]+class="profile-bio"[^>]*>(.+?)</div>',
page_content, 'user description', fatal=False)
items = playlist.get(playlist_type) or {}
for edge in items.get('edges', []):
cloudcast = self._get_cloudcast(edge.get('node') or {})
cloudcast_url = cloudcast.get('url')
if not cloudcast_url:
continue
entries.append(self.url_result(
cloudcast_url, MixcloudIE.ie_key(), cloudcast.get('slug')))
page_info = items['pageInfo']
has_next_page = page_info['hasNextPage']
list_filter = ', after: "%s"' % page_info['endCursor']
return self.playlist_result(
entries, playlist_id,
self._get_playlist_title(playlist[self._TITLE_KEY], slug),
playlist.get(self._DESCRIPTION_KEY))
class MixcloudUserIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
_VALID_URL = r'https?://(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/(?P<type>uploads|favorites|listens|stream)?/?$'
IE_NAME = 'mixcloud:user'
_TESTS = [{
@ -243,68 +273,58 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:def36060ac8747b3aabca54924897e47',
'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
},
'playlist_mincount': 11,
'playlist_mincount': 36,
}, {
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:def36060ac8747b3aabca54924897e47',
'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
},
'playlist_mincount': 11,
'playlist_mincount': 36,
}, {
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'description': 'md5:def36060ac8747b3aabca54924897e47',
'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
# 'params': {
# 'playlist_items': '1-100',
# },
'playlist_mincount': 396,
}, {
'url': 'http://www.mixcloud.com/dholbach/listens/',
'info_dict': {
'id': 'dholbach_listens',
'title': 'Daniel Holbach (listens)',
'description': 'md5:def36060ac8747b3aabca54924897e47',
'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
},
'params': {
'playlist_items': '1-100',
# 'params': {
# 'playlist_items': '1-100',
# },
'playlist_mincount': 1623,
'skip': 'Large list',
}, {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar_stream',
'title': 'First Ear (stream)',
'description': 'Curators of good music\r\n\r\nfirstearmusic.com',
},
'playlist_mincount': 100,
'playlist_mincount': 271,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
list_type = mobj.group('type')
_TITLE_KEY = 'displayName'
_DESCRIPTION_KEY = 'biog'
_ROOT_TYPE = 'user'
_NODE_TEMPLATE = '''slug
url'''
# if only a profile URL was supplied, default to download all uploads
if list_type is None:
list_type = 'uploads'
video_id = '%s_%s' % (user_id, list_type)
profile = self._download_webpage(
'https://www.mixcloud.com/%s/' % user_id, video_id,
note='Downloading user profile',
errnote='Unable to download user profile')
username = self._og_search_title(profile)
description = self._get_user_description(profile)
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
self._PAGE_SIZE)
return self.playlist_result(
entries, video_id, '%s (%s)' % (username, list_type), description)
def _get_playlist_title(self, title, slug):
return '%s (%s)' % (title, slug)
class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
@ -312,87 +332,20 @@ class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
IE_NAME = 'mixcloud:playlist'
_TESTS = [{
'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
'info_dict': {
'id': 'RedBullThre3style_tokyo-finalists-2015',
'title': 'National Champions 2015',
'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
},
'playlist_mincount': 16,
}, {
'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
playlist_id = mobj.group('playlist')
video_id = '%s_%s' % (user_id, playlist_id)
webpage = self._download_webpage(
url, user_id,
note='Downloading playlist page',
errnote='Unable to download playlist page')
title = self._html_search_regex(
r'<a[^>]+class="parent active"[^>]*><b>\d+</b><span[^>]*>([^<]+)',
webpage, 'playlist title',
default=None) or self._og_search_title(webpage, fatal=False)
description = self._get_user_description(webpage)
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
self._PAGE_SIZE)
return self.playlist_result(entries, video_id, title, description)
class MixcloudStreamIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
IE_NAME = 'mixcloud:stream'
_TEST = {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar',
'title': 'First Ear',
'description': 'Curators of good music\nfirstearmusic.com',
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Ness Radio sessions',
},
'playlist_mincount': 192,
}
'playlist_mincount': 59,
}]
_TITLE_KEY = 'name'
_DESCRIPTION_KEY = 'description'
_ROOT_TYPE = 'playlist'
_NODE_TEMPLATE = '''cloudcast {
slug
url
}'''
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
entries = []
prev_page_url = None
def _handle_page(page):
entries.extend(self._find_urls_in_page(page))
return self._search_regex(
r'm-next-page-url="([^"]+)"', page,
'next page URL', default=None)
next_page_url = _handle_page(webpage)
for idx in itertools.count(0):
if not next_page_url or prev_page_url == next_page_url:
break
prev_page_url = next_page_url
current_page = int(self._search_regex(
r'\?page=(\d+)', next_page_url, 'next page number'))
next_page_url = _handle_page(self._fetch_tracks_page(
'%s/stream' % user_id, user_id, 'stream', idx,
real_page_number=current_page))
username = self._og_search_title(webpage)
description = self._get_user_description(webpage)
return self.playlist_result(entries, user_id, username, description)
def _get_cloudcast(self, node):
return node.get('cloudcast') or {}

View File

@ -14,20 +14,27 @@ from ..utils import (
class MSNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_TESTS = [{
'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/criminal-minds-shemar-moore-shares-a-touching-goodbye-message/vp-BBqQYNE',
'md5': '8442f66c116cbab1ff7098f986983458',
'url': 'https://www.msn.com/en-in/money/video/7-ways-to-get-rid-of-chest-congestion/vi-BBPxU6d',
'md5': '087548191d273c5c55d05028f8d2cbcd',
'info_dict': {
'id': 'BBqQYNE',
'display_id': 'criminal-minds-shemar-moore-shares-a-touching-goodbye-message',
'id': 'BBPxU6d',
'display_id': '7-ways-to-get-rid-of-chest-congestion',
'ext': 'mp4',
'title': 'Criminal Minds - Shemar Moore Shares A Touching Goodbye Message',
'description': 'md5:e8e89b897b222eb33a6b5067a8f1bc25',
'duration': 104,
'uploader': 'CBS Entertainment',
'uploader_id': 'IT0X5aoJ6bJgYerJXSDCgFmYPB1__54v',
'title': 'Seven ways to get rid of chest congestion',
'description': '7 Ways to Get Rid of Chest Congestion',
'duration': 88,
'uploader': 'Health',
'uploader_id': 'BBPrMqa',
},
}, {
# Article, multiple Dailymotion Embeds
'url': 'https://www.msn.com/en-in/money/sports/hottest-football-wags-greatest-footballers-turned-managers-and-more/ar-BBpc7Nl',
'info_dict': {
'id': 'BBpc7Nl',
},
'playlist_mincount': 4,
}, {
'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
'only_matching': True,
@ -41,75 +48,124 @@ class MSNIE(InfoExtractor):
}, {
'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-raped-woman-comment/vi-AAhvzW6',
'only_matching': True,
}, {
# Vidible(AOL) Embed
'url': 'https://www.msn.com/en-us/money/other/jupiter-is-about-to-come-so-close-you-can-see-its-moons-with-binoculars/vi-AACqsHR',
'only_matching': True,
}, {
# Dailymotion Embed
'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L',
'only_matching': True,
}, {
# YouTube Embed
'url': 'https://www.msn.com/en-in/money/news/meet-vikram-%E2%80%94-chandrayaan-2s-lander/vi-AAGUr0v',
'only_matching': True,
}, {
# NBCSports Embed
'url': 'https://www.msn.com/en-us/money/football_nfl/week-13-preview-redskins-vs-panthers/vi-BBXsCDb',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
display_id, page_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
video = self._parse_json(
self._search_regex(
r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1',
webpage, 'video data', default='{}', group='data'),
display_id, transform_source=unescapeHTML)
entries = []
for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage):
video = self._parse_json(unescapeHTML(metadata), display_id)
if not video:
provider_id = video.get('providerId')
player_name = video.get('playerName')
if player_name and provider_id:
entry = None
if player_name == 'AOL':
if provider_id.startswith('http'):
provider_id = self._search_regex(
r'https?://delivery\.vidible\.tv/video/redirect/([0-9a-f]{24})',
provider_id, 'vidible id')
entry = self.url_result(
'aol-video:' + provider_id, 'Aol', provider_id)
elif player_name == 'Dailymotion':
entry = self.url_result(
'https://www.dailymotion.com/video/' + provider_id,
'Dailymotion', provider_id)
elif player_name == 'YouTube':
entry = self.url_result(
provider_id, 'Youtube', provider_id)
elif player_name == 'NBCSports':
entry = self.url_result(
'http://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/' + provider_id,
'NBCSportsVPlayer', provider_id)
if entry:
entries.append(entry)
continue
video_id = video['uuid']
title = video['title']
formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
if 'format=m3u8-aapl' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False))
elif 'format=mpd-time-csf' in format_url:
formats.extend(self._extract_mpd_formats(
format_url, display_id, 'dash', fatal=False))
elif '.ism' in format_url:
if format_url.endswith('.ism'):
format_url += '/manifest'
formats.extend(self._extract_ism_formats(
format_url, display_id, 'mss', fatal=False))
else:
format_id = file_.get('formatCode')
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': format_id,
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
'vbr': int_or_none(self._search_regex(r'_(\d+)\.mp4', format_url, 'vbr', default=None)),
'preference': 1 if format_id == '1001' else None,
})
self._sort_formats(formats)
subtitles = {}
for file_ in video.get('files', []):
format_url = file_.get('url')
format_code = file_.get('formatCode')
if not format_url or not format_code:
continue
if compat_str(format_code) == '3100':
subtitles.setdefault(file_.get('culture', 'en'), []).append({
'ext': determine_ext(format_url, 'ttml'),
'url': format_url,
})
entries.append({
'id': video_id,
'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats,
})
if not entries:
error = unescapeHTML(self._search_regex(
r'data-error=(["\'])(?P<error>.+?)\1',
webpage, 'error', group='error'))
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
title = video['title']
formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
if 'm3u8' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
m3u8_formats = self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
elif determine_ext(format_url) == 'ism':
formats.extend(self._extract_ism_formats(
format_url + '/Manifest', display_id, 'mss', fatal=False))
else:
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': 'http',
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
})
self._sort_formats(formats)
subtitles = {}
for file_ in video.get('files', []):
format_url = file_.get('url')
format_code = file_.get('formatCode')
if not format_url or not format_code:
continue
if compat_str(format_code) == '3100':
subtitles.setdefault(file_.get('culture', 'en'), []).append({
'ext': determine_ext(format_url, 'ttml'),
'url': format_url,
})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats,
}
return self.playlist_result(entries, page_id)

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@ -349,33 +350,29 @@ class MTVIE(MTVServicesInfoExtractor):
}]
class MTV81IE(InfoExtractor):
IE_NAME = 'mtv81'
_VALID_URL = r'https?://(?:www\.)?mtv81\.com/videos/(?P<id>[^/?#.]+)'
class MTVJapanIE(MTVServicesInfoExtractor):
IE_NAME = 'mtvjapan'
_VALID_URL = r'https?://(?:www\.)?mtvjapan\.com/videos/(?P<id>[0-9a-z]+)'
_TEST = {
'url': 'http://www.mtv81.com/videos/artist-to-watch/the-godfather-of-japanese-hip-hop-segment-1/',
'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
'url': 'http://www.mtvjapan.com/videos/prayht/fresh-info-cadillac-escalade',
'info_dict': {
'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
'id': 'bc01da03-6fe5-4284-8880-f291f4e368f5',
'ext': 'mp4',
'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
'timestamp': 1468846800,
'upload_date': '20160718',
'title': '【Fresh Info】Cadillac ESCALADE Sport Edition',
},
'params': {
'skip_download': True,
},
}
_GEO_COUNTRIES = ['JP']
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
def _extract_mgid(self, webpage):
return self._search_regex(
r'getTheVideo\((["\'])(?P<id>mgid:.+?)\1', webpage,
'mgid', group='id')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mgid = self._extract_mgid(webpage)
return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
def _get_feed_query(self, uri):
return {
'arcEp': 'mtvjapan.com',
'mgid': uri,
}
class MTVVideoIE(MTVServicesInfoExtractor):
@ -425,14 +422,14 @@ class MTVVideoIE(MTVServicesInfoExtractor):
class MTVDEIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv.de'
_VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:artists|shows|news)/(?:[^/]+/)*(?P<id>\d+)-[^/#?]+/*(?:[#?].*)?$'
_VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:musik/videoclips|folgen|news)/(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.mtv.de/artists/10571-cro/videos/61131-traum',
'url': 'http://www.mtv.de/musik/videoclips/2gpnv7/Traum',
'info_dict': {
'id': 'music_video-a50bc5f0b3aa4b3190aa',
'ext': 'flv',
'title': 'MusicVideo_cro-traum',
'description': 'Cro - Traum',
'id': 'd5d472bc-f5b7-11e5-bffd-a4badb20dab5',
'ext': 'mp4',
'title': 'Traum',
'description': 'Traum',
},
'params': {
# rtmp download
@ -441,11 +438,12 @@ class MTVDEIE(MTVServicesInfoExtractor):
'skip': 'Blocked at Travis CI',
}, {
# mediagen URL without query (e.g. http://videos.mtvnn.com/mediagen/e865da714c166d18d6f80893195fcb97)
'url': 'http://www.mtv.de/shows/933-teen-mom-2/staffeln/5353/folgen/63565-enthullungen',
'url': 'http://www.mtv.de/folgen/6b1ylu/teen-mom-2-enthuellungen-S5-F1',
'info_dict': {
'id': 'local_playlist-f5ae778b9832cc837189',
'ext': 'flv',
'title': 'Episode_teen-mom-2_shows_season-5_episode-1_full-episode_part1',
'id': '1e5a878b-31c5-11e7-a442-0e40cf2fc285',
'ext': 'mp4',
'title': 'Teen Mom 2',
'description': 'md5:dc65e357ef7e1085ed53e9e9d83146a7',
},
'params': {
# rtmp download
@ -453,7 +451,7 @@ class MTVDEIE(MTVServicesInfoExtractor):
},
'skip': 'Blocked at Travis CI',
}, {
'url': 'http://www.mtv.de/news/77491-mtv-movies-spotlight-pixels-teil-3',
'url': 'http://www.mtv.de/news/glolix/77491-mtv-movies-spotlight--pixels--teil-3',
'info_dict': {
'id': 'local_playlist-4e760566473c4c8c5344',
'ext': 'mp4',
@ -466,25 +464,11 @@ class MTVDEIE(MTVServicesInfoExtractor):
},
'skip': 'Das Video kann zur Zeit nicht abgespielt werden.',
}]
_GEO_COUNTRIES = ['DE']
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
playlist = self._parse_json(
self._search_regex(
r'window\.pagePlaylist\s*=\s*(\[.+?\]);\n', webpage, 'page playlist'),
video_id)
def _mrss_url(item):
return item['mrss'] + item.get('mrssvars', '')
# news pages contain single video in playlist with different id
if len(playlist) == 1:
return self._get_videos_info_from_url(_mrss_url(playlist[0]), video_id)
for item in playlist:
item_id = item.get('id')
if item_id and compat_str(item_id) == video_id:
return self._get_videos_info_from_url(_mrss_url(item), video_id)
def _get_feed_query(self, uri):
return {
'arcEp': 'mtv.de',
'mgid': uri,
}

View File

@ -1,73 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
import os.path
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlparse,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
xpath_text,
)
class MySpassIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?myspass\.de/.*'
_VALID_URL = r'https?://(?:www\.)?myspass\.de/([^/]+/)*(?P<id>\d+)'
_TEST = {
'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
'md5': '0b49f4844a068f8b33f4b7c88405862b',
'info_dict': {
'id': '11741',
'ext': 'mp4',
'description': 'Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
'title': 'Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2',
'description': 'Wer kann in die Fußstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
'title': '17.02.2013 - Die Highlights, Teil 2',
},
}
def _real_extract(self, url):
META_DATA_URL_TEMPLATE = 'http://www.myspass.de/myspass/includes/apps/video/getvideometadataxml.php?id=%s'
video_id = self._match_id(url)
# video id is the last path element of the URL
# usually there is a trailing slash, so also try the second but last
url_path = compat_urllib_parse_urlparse(url).path
url_parent_path, video_id = os.path.split(url_path)
if not video_id:
_, video_id = os.path.split(url_parent_path)
# get metadata
metadata_url = META_DATA_URL_TEMPLATE % video_id
metadata = self._download_xml(
metadata_url, video_id, transform_source=lambda s: s.strip())
'http://www.myspass.de/myspass/includes/apps/video/getvideometadataxml.php?id=' + video_id,
video_id)
# extract values from metadata
url_flv_el = metadata.find('url_flv')
if url_flv_el is None:
raise ExtractorError('Unable to extract download url')
video_url = url_flv_el.text
title_el = metadata.find('title')
if title_el is None:
raise ExtractorError('Unable to extract title')
title = title_el.text
format_id_el = metadata.find('format_id')
if format_id_el is None:
format = 'mp4'
else:
format = format_id_el.text
description_el = metadata.find('description')
if description_el is not None:
description = description_el.text
else:
description = None
imagePreview_el = metadata.find('imagePreview')
if imagePreview_el is not None:
thumbnail = imagePreview_el.text
else:
thumbnail = None
title = xpath_text(metadata, 'title', fatal=True)
video_url = xpath_text(metadata, 'url_flv', 'download url', True)
video_id_int = int(video_id)
for group in re.search(r'/myspass2009/\d+/(\d+)/(\d+)/(\d+)/', video_url).groups():
group_int = int(group)
if group_int > video_id_int:
video_url = video_url.replace(
group, compat_str(group_int // video_id_int))
return {
'id': video_id,
'url': video_url,
'title': title,
'format': format,
'thumbnail': thumbnail,
'description': description,
'thumbnail': xpath_text(metadata, 'imagePreview'),
'description': xpath_text(metadata, 'description'),
'duration': parse_duration(xpath_text(metadata, 'duration')),
'series': xpath_text(metadata, 'format'),
'season_number': int_or_none(xpath_text(metadata, 'season')),
'season_id': xpath_text(metadata, 'season_id'),
'episode': title,
'episode_number': int_or_none(xpath_text(metadata, 'episode')),
}

View File

@ -9,10 +9,13 @@ from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
js_to_json,
parse_duration,
smuggle_url,
try_get,
unified_timestamp,
update_url_query,
int_or_none,
)
@ -85,27 +88,41 @@ class NBCIE(AdobePassIE):
permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + compat_urllib_parse_unquote(permalink)
response = self._download_json(
'https://api.nbc.com/v3/videos', video_id, query={
'filter[permalink]': permalink,
'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
'fields[shows]': 'shortTitle',
'include': 'show.shortTitle',
'https://friendship.nbc.co/v2/graphql', video_id, query={
'query': '''{
page(name: "%s", platform: web, type: VIDEO, userId: "0") {
data {
... on VideoPageData {
description
episodeNumber
keywords
locked
mpxAccountId
mpxGuid
rating
seasonNumber
secondaryTitle
seriesShortTitle
}
}
}
}''' % permalink,
})
video_data = response['data'][0]['attributes']
video_data = response['data']['page']['data']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
video_id = video_data['guid']
title = video_data['title']
if video_data.get('entitlement') == 'auth':
video_id = video_data['mpxGuid']
title = video_data['secondaryTitle']
if video_data.get('locked'):
resource = self._get_mvpd_resource(
'nbcentertainment', title, video_id,
video_data.get('vChipRating'))
video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query(
'http://link.theplatform.com/s/NnzsPC/media/guid/2410887629/' + video_id,
'http://link.theplatform.com/s/NnzsPC/media/guid/%s/%s' % (video_data.get('mpxAccountId') or '2410887629', video_id),
query), {'force_smil_url': True})
return {
'_type': 'url_transparent',
@ -117,7 +134,7 @@ class NBCIE(AdobePassIE):
'season_number': int_or_none(video_data.get('seasonNumber')),
'episode_number': int_or_none(video_data.get('episodeNumber')),
'episode': title,
'series': try_get(response, lambda x: x['included'][0]['attributes']['shortTitle']),
'series': video_data.get('seriesShortTitle'),
'ie_key': 'ThePlatform',
}
@ -272,13 +289,12 @@ class NBCNewsIE(ThePlatformIE):
_TESTS = [
{
'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
'md5': 'af1adfa51312291a017720403826bb64',
'md5': 'cf4bc9e6ce0130f00f545d80ecedd4bf',
'info_dict': {
'id': '269389891880',
'ext': 'mp4',
'title': 'How Twitter Reacted To The Snowden Interview',
'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
'uploader': 'NBCU-NEWS',
'timestamp': 1401363060,
'upload_date': '20140529',
},
@ -296,28 +312,26 @@ class NBCNewsIE(ThePlatformIE):
},
{
'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
'md5': '73135a2e0ef819107bbb55a5a9b2a802',
'md5': '8eb831eca25bfa7d25ddd83e85946548',
'info_dict': {
'id': '394064451844',
'ext': 'mp4',
'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
'timestamp': 1423104900,
'uploader': 'NBCU-NEWS',
'upload_date': '20150205',
},
},
{
'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
'md5': 'a49e173825e5fcd15c13fc297fced39d',
'md5': '4a8c4cec9e1ded51060bdda36ff0a5c0',
'info_dict': {
'id': '529953347624',
'id': 'n431456',
'ext': 'mp4',
'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
'description': 'md5:c8be487b2d80ff0594c005add88d8351',
'title': "Volkswagen U.S. Chief: We 'Totally Screwed Up'",
'description': 'md5:d22d1281a24f22ea0880741bb4dd6301',
'upload_date': '20150922',
'timestamp': 1442917800,
'uploader': 'NBCU-NEWS',
},
},
{
@ -330,7 +344,6 @@ class NBCNewsIE(ThePlatformIE):
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
'upload_date': '20160420',
'timestamp': 1461152093,
'uploader': 'NBCU-NEWS',
},
},
{
@ -344,7 +357,6 @@ class NBCNewsIE(ThePlatformIE):
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1406937606,
'upload_date': '20140802',
'uploader': 'NBCU-NEWS',
},
},
{
@ -360,20 +372,61 @@ class NBCNewsIE(ThePlatformIE):
def _real_extract(self, url):
video_id = self._match_id(url)
if not video_id.isdigit():
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.+});', webpage,
'bootstrap json'), video_id)
video_id = data['article']['content'][0]['primaryMedia']['video']['mpxMetadata']['id']
data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.+});', webpage,
'bootstrap json'), video_id, js_to_json)
video_data = try_get(data, lambda x: x['video']['current'], dict)
if not video_data:
video_data = data['article']['content'][0]['primaryMedia']['video']
title = video_data['headline']['primary']
formats = []
for va in video_data.get('videoAssets', []):
public_url = va.get('publicUrl')
if not public_url:
continue
if '://link.theplatform.com/' in public_url:
public_url = update_url_query(public_url, {'format': 'redirect'})
format_id = va.get('format')
if format_id == 'M3U':
formats.extend(self._extract_m3u8_formats(
public_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
continue
tbr = int_or_none(va.get('bitrate'), 1000)
if tbr:
format_id += '-%d' % tbr
formats.append({
'format_id': format_id,
'url': public_url,
'width': int_or_none(va.get('width')),
'height': int_or_none(va.get('height')),
'tbr': tbr,
'ext': 'mp4',
})
self._sort_formats(formats)
subtitles = {}
closed_captioning = video_data.get('closedCaptioning')
if closed_captioning:
for cc_url in closed_captioning.values():
if not cc_url:
continue
subtitles.setdefault('en', []).append({
'url': cc_url,
})
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {'byId': video_id}),
'ie_key': 'ThePlatformFeed',
'title': title,
'description': try_get(video_data, lambda x: x['description']['primary']),
'thumbnail': try_get(video_data, lambda x: x['primaryImage']['url']['primary']),
'duration': parse_duration(video_data.get('duration')),
'timestamp': unified_timestamp(video_data.get('datePublished')),
'formats': formats,
'subtitles': subtitles,
}

View File

@ -108,7 +108,7 @@ class NexxIE(InfoExtractor):
@staticmethod
def _extract_domain_id(webpage):
mobj = re.search(
r'<script\b[^>]+\bsrc=["\'](?:https?:)?//require\.nexx(?:\.cloud|cdn\.com)/(?P<id>\d+)',
r'<script\b[^>]+\bsrc=["\'](?:https?:)?//(?:require|arc)\.nexx(?:\.cloud|cdn\.com)/(?:sdk/)?(?P<id>\d+)',
webpage)
return mobj.group('id') if mobj else None
@ -123,7 +123,7 @@ class NexxIE(InfoExtractor):
domain_id = NexxIE._extract_domain_id(webpage)
if domain_id:
for video_id in re.findall(
r'(?is)onPLAYReady.+?_play\.init\s*\(.+?\s*,\s*["\']?(\d+)',
r'(?is)onPLAYReady.+?_play\.(?:init|(?:control\.)?addPlayer)\s*\(.+?\s*,\s*["\']?(\d+)',
webpage):
entries.append(
'https://api.nexx.cloud/v3/%s/videos/byid/%s'
@ -295,13 +295,23 @@ class NexxIE(InfoExtractor):
video = None
def find_video(result):
if isinstance(result, dict):
return result
elif isinstance(result, list):
vid = int(video_id)
for v in result:
if try_get(v, lambda x: x['general']['ID'], int) == vid:
return v
return None
response = self._download_json(
'https://arc.nexx.cloud/api/video/%s.json' % video_id,
video_id, fatal=False)
if response and isinstance(response, dict):
result = response.get('result')
if result and isinstance(result, dict):
video = result
if result:
video = find_video(result)
# not all videos work via arc, e.g. nexx:741:1269984
if not video:
@ -348,7 +358,7 @@ class NexxIE(InfoExtractor):
request_token = hashlib.md5(
''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
video = self._call_api(
result = self._call_api(
domain_id, 'videos/%s/%s' % (op, video_id), video_id, data={
'additionalfields': 'language,channel,actors,studio,licenseby,slug,subtitle,teaser,description',
'addInteractionOptions': '1',
@ -363,6 +373,7 @@ class NexxIE(InfoExtractor):
'X-Request-CID': cid,
'X-Request-Token': request_token,
})
video = find_video(result)
general = video['general']
title = general['title']
@ -399,8 +410,8 @@ class NexxIE(InfoExtractor):
class NexxEmbedIE(InfoExtractor):
_VALID_URL = r'https?://embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?P<id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'https?://embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?:video/)?(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://embed.nexx.cloud/748/KC1614647Z27Y7T?autoplay=1',
'md5': '16746bfc28c42049492385c989b26c4a',
'info_dict': {
@ -409,7 +420,6 @@ class NexxEmbedIE(InfoExtractor):
'title': 'Nervenkitzel Achterbahn',
'alt_title': 'Karussellbauer in Deutschland',
'description': 'md5:ffe7b1cc59a01f585e0569949aef73cc',
'release_year': 2005,
'creator': 'SPIEGEL TV',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2761,
@ -420,7 +430,10 @@ class NexxEmbedIE(InfoExtractor):
'format': 'bestvideo',
'skip_download': True,
},
}
}, {
'url': 'https://embed.nexx.cloud/11888/video/DSRTO7UVOX06S7',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):

View File

@ -10,6 +10,18 @@ class NhkVodIE(InfoExtractor):
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
# clip
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999011/',
'md5': '256a1be14f48d960a7e61e2532d95ec3',
'info_dict': {
'id': 'a95j5iza',
'ext': 'mp4',
'title': "Dining with the Chef - Chef Saito's Family recipe: MENCHI-KATSU",
'description': 'md5:5aee4a9f9d81c26281862382103b0ea5',
'timestamp': 1565965194,
'upload_date': '20190816',
},
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}, {
@ -19,7 +31,7 @@ class NhkVodIE(InfoExtractor):
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}]
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sodesdlist/v7/episode/%s/%s/all%s.json'
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json'
def _real_extract(self, url):
lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
@ -28,7 +40,10 @@ class NhkVodIE(InfoExtractor):
is_video = m_type == 'video'
episode = self._download_json(
self._API_URL_TEMPLATE % ('v' if is_video else 'r', episode_id, lang, '/all' if is_video else ''),
self._API_URL_TEMPLATE % (
'v' if is_video else 'r',
'clip' if episode_id[:4] == '9999' else 'esd',
episode_id, lang, '/all' if is_video else ''),
episode_id, query={'apikey': 'EJfK8jdS57GqlupFgAfAAwr573q01y6k'})['data']['episodes'][0]
title = episode.get('sub_title_clean') or episode['sub_title']
@ -60,8 +75,8 @@ class NhkVodIE(InfoExtractor):
if is_video:
info.update({
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:' + episode['vod_id'],
'ie_key': 'Piksel',
'url': 'https://player.piksel.com/v/refid/nhkworld/prefid/' + episode['vod_id'],
})
else:
audio = episode['audio']

View File

@ -85,7 +85,8 @@ class NickBrIE(MTVServicesInfoExtractor):
https?://
(?:
(?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br|
(?:www\.)?nickjr\.[a-z]{2}
(?:www\.)?nickjr\.[a-z]{2}|
(?:www\.)?nickelodeonjunior\.fr
)
/(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+)
'''
@ -101,6 +102,9 @@ class NickBrIE(MTVServicesInfoExtractor):
}, {
'url': 'http://www.nickjr.de/blaze-und-die-monster-maschinen/videos/f6caaf8f-e4e8-4cc1-b489-9380d6dcd059/',
'only_matching': True,
}, {
'url': 'http://www.nickelodeonjunior.fr/paw-patrol-la-pat-patrouille/videos/episode-401-entier-paw-patrol/',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -45,7 +45,11 @@ class NineNowIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
page_data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.*?});', webpage,
'page data'), display_id)
'page data', default='{}'), display_id, fatal=False)
if not page_data:
page_data = self._parse_json(self._parse_json(self._search_regex(
r'window\.__data\s*=\s*JSON\.parse\s*\(\s*(".+?")\s*\)\s*;',
webpage, 'page data'), display_id), display_id)
for kind in ('episode', 'clip'):
current_key = page_data.get(kind, {}).get(

View File

@ -5,13 +5,12 @@ import re
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import unescapeHTML
class NintendoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nintendo\.com/games/detail/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?nintendo\.com/(?:games/detail|nintendo-direct)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nintendo.com/games/detail/yEiAzhU2eQI1KZ7wOHhngFoAHc1FpHwj',
'url': 'https://www.nintendo.com/games/detail/duck-hunt-wii-u/',
'info_dict': {
'id': 'MzMmticjp0VPzO3CCj4rmFOuohEuEWoW',
'ext': 'flv',
@ -28,7 +27,19 @@ class NintendoIE(InfoExtractor):
'id': 'tokyo-mirage-sessions-fe-wii-u',
'title': 'Tokyo Mirage Sessions ♯FE',
},
'playlist_count': 3,
'playlist_count': 4,
}, {
'url': 'https://www.nintendo.com/nintendo-direct/09-04-2019/',
'info_dict': {
'id': 'J2bXdmaTE6fe3dWJTPcc7m23FNbc_A1V',
'ext': 'mp4',
'title': 'Switch_ROS_ND0904-H264.mov',
'duration': 2324.758,
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}]
def _real_extract(self, url):
@ -39,8 +50,11 @@ class NintendoIE(InfoExtractor):
entries = [
OoyalaIE._build_url_result(m.group('code'))
for m in re.finditer(
r'class=(["\'])embed-video\1[^>]+data-video-code=(["\'])(?P<code>(?:(?!\2).)+)\2',
webpage)]
r'data-(?:video-id|directVideoId)=(["\'])(?P<code>(?:(?!\1).)+)\1', webpage)]
title = self._html_search_regex(
r'(?s)<(?:span|div)[^>]+class="(?:title|wrapper)"[^>]*>.*?<h1>(.+?)</h1>',
webpage, 'title', fatal=False)
return self.playlist_result(
entries, page_id, unescapeHTML(self._og_search_title(webpage, fatal=False)))
entries, page_id, title)

View File

@ -25,9 +25,14 @@ class NonkTubeIE(NuevoBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._extract_nuevo(
'https://www.nonktube.com/media/nuevo/econfig.php?key=%s'
% video_id, video_id)
webpage = self._download_webpage(url, video_id)
info['age_limit'] = 18
title = self._og_search_title(webpage)
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
info.update({
'id': video_id,
'title': title,
'age_limit': 18,
})
return info

View File

@ -406,7 +406,7 @@ class NRKTVSerieBaseIE(InfoExtractor):
def _extract_series(self, webpage, display_id, fatal=True):
config = self._parse_json(
self._search_regex(
(r'INITIAL_DATA_*\s*=\s*({.+?})\s*;',
(r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
display_id, fatal=False)

View File

@ -23,8 +23,8 @@ class NRLTVIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
q_data = self._parse_json(self._search_regex(
r"(?s)q-data='({.+?})'", webpage, 'player data'), display_id)
q_data = self._parse_json(self._html_search_regex(
r'(?s)q-data="({.+?})"', webpage, 'player data'), display_id)
ooyala_id = q_data['videoId']
return self.url_result(
'ooyala:' + ooyala_id, 'Ooyala', ooyala_id, q_data.get('title'))

View File

@ -3,9 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
xpath_text,
int_or_none,
strip_or_none,
unescapeHTML,
xpath_text,
)
@ -47,10 +48,10 @@ class NTVRuIE(InfoExtractor):
'duration': 1496,
},
}, {
'url': 'http://www.ntv.ru/kino/Koma_film',
'md5': 'f825770930937aa7e5aca0dc0d29319a',
'url': 'https://www.ntv.ru/kino/Koma_film/m70281/o336036/video/',
'md5': 'e9c7cde24d9d3eaed545911a04e6d4f4',
'info_dict': {
'id': '1007609',
'id': '1126480',
'ext': 'mp4',
'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»',
@ -68,6 +69,10 @@ class NTVRuIE(InfoExtractor):
'thumbnail': r're:^http://.*\.jpg',
'duration': 2590,
},
}, {
# Schemeless file URL
'url': 'https://www.ntv.ru/video/1797442',
'only_matching': True,
}]
_VIDEO_ID_REGEXES = [
@ -96,37 +101,31 @@ class NTVRuIE(InfoExtractor):
'http://www.ntv.ru/vi%s/' % video_id,
video_id, 'Downloading video XML')
title = clean_html(xpath_text(player, './data/title', 'title', fatal=True))
description = clean_html(xpath_text(player, './data/description', 'description'))
title = strip_or_none(unescapeHTML(xpath_text(player, './data/title', 'title', fatal=True)))
video = player.find('./data/video')
video_id = xpath_text(video, './id', 'video id')
thumbnail = xpath_text(video, './splash', 'thumbnail')
duration = int_or_none(xpath_text(video, './totaltime', 'duration'))
view_count = int_or_none(xpath_text(video, './views', 'view count'))
token = self._download_webpage(
'http://stat.ntv.ru/services/access/token',
video_id, 'Downloading access token')
formats = []
for format_id in ['', 'hi', 'webm']:
file_ = video.find('./%sfile' % format_id)
if file_ is None:
file_ = xpath_text(video, './%sfile' % format_id)
if not file_:
continue
size = video.find('./%ssize' % format_id)
if file_.startswith('//'):
file_ = self._proto_relative_url(file_)
elif not file_.startswith('http'):
file_ = 'http://media.ntv.ru/vod/' + file_
formats.append({
'url': 'http://media2.ntv.ru/vod/%s&tok=%s' % (file_.text, token),
'filesize': int_or_none(size.text if size is not None else None),
'url': file_,
'filesize': int_or_none(xpath_text(video, './%ssize' % format_id)),
})
self._sort_formats(formats)
return {
'id': video_id,
'id': xpath_text(video, './id'),
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'description': strip_or_none(unescapeHTML(xpath_text(player, './data/description'))),
'thumbnail': xpath_text(video, './splash'),
'duration': int_or_none(xpath_text(video, './totaltime')),
'view_count': int_or_none(xpath_text(video, './views')),
'formats': formats,
}

View File

@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
@ -121,6 +123,13 @@ class OdnoklassnikiIE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
start_time = int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('fromTime', [None])[0])

View File

@ -20,6 +20,8 @@ from ..utils import (
class OnetBaseIE(InfoExtractor):
_URL_BASE_RE = r'https?://(?:(?:www\.)?onet\.tv|onet100\.vod\.pl)/[a-z]/'
def _search_mvp_id(self, webpage):
return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
@ -45,7 +47,7 @@ class OnetBaseIE(InfoExtractor):
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
for format_type, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
@ -56,21 +58,31 @@ class OnetBaseIE(InfoExtractor):
if not video_url:
continue
ext = determine_ext(video_url)
if format_id == 'ism':
if format_id.startswith('ism'):
formats.extend(self._extract_ism_formats(
video_url, video_id, 'mss', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
elif format_id.startswith('hls'):
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
http_f = {
'url': video_url,
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
}
if format_type == 'audio':
http_f['vcodec'] = 'none'
else:
http_f.update({
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'vbr': float_or_none(f.get('video_bitrate')),
})
formats.append(http_f)
self._sort_formats(formats)
meta = video.get('meta', {})
@ -105,12 +117,12 @@ class OnetMVPIE(OnetBaseIE):
class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
_VALID_URL = OnetBaseIE._URL_BASE_RE + r'[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv'
_TEST = {
_TESTS = [{
'url': 'http://onet.tv/k/openerfestival/open-er-festival-2016-najdziwniejsze-wymagania-gwiazd/qbpyqc',
'md5': 'e3ffbf47590032ac3f27249204173d50',
'md5': '436102770fb095c75b8bb0392d3da9ff',
'info_dict': {
'id': 'qbpyqc',
'display_id': 'open-er-festival-2016-najdziwniejsze-wymagania-gwiazd',
@ -120,7 +132,10 @@ class OnetIE(OnetBaseIE):
'upload_date': '20160705',
'timestamp': 1467721580,
},
}
}, {
'url': 'https://onet100.vod.pl/k/openerfestival/open-er-festival-2016-najdziwniejsze-wymagania-gwiazd/qbpyqc',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -140,18 +155,21 @@ class OnetIE(OnetBaseIE):
class OnetChannelIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/(?P<id>[a-z]+)(?:[?#]|$)'
_VALID_URL = OnetBaseIE._URL_BASE_RE + r'(?P<id>[a-z]+)(?:[?#]|$)'
IE_NAME = 'onet.tv:channel'
_TEST = {
_TESTS = [{
'url': 'http://onet.tv/k/openerfestival',
'info_dict': {
'id': 'openerfestival',
'title': 'Open\'er Festival Live',
'description': 'Dziękujemy, że oglądaliście transmisje. Zobaczcie nasze relacje i wywiady z artystami.',
'title': "Open'er Festival",
'description': "Tak było na Open'er Festival 2016! Oglądaj nasze reportaże i wywiady z artystami.",
},
'playlist_mincount': 46,
}
'playlist_mincount': 35,
}, {
'url': 'https://onet100.vod.pl/k/openerfestival',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id = self._match_id(url)
@ -173,7 +191,7 @@ class OnetChannelIE(OnetBaseIE):
'Downloading channel %s - add --no-playlist to just download video %s' % (
channel_id, video_name))
matches = re.findall(
r'<a[^>]+href=[\'"](https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/[0-9a-z-]+/[0-9a-z]+)',
r'<a[^>]+href=[\'"](%s[a-z]+/[0-9a-z-]+/[0-9a-z]+)' % self._URL_BASE_RE,
webpage)
entries = [
self.url_result(video_link, OnetIE.ie_key())

View File

@ -4,12 +4,8 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
float_or_none,
mimetype2ext,
)
from ..compat import compat_str
from ..utils import js_to_json
class OnionStudiosIE(InfoExtractor):
@ -17,14 +13,16 @@ class OnionStudiosIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937',
'md5': '719d1f8c32094b8c33902c17bcae5e34',
'md5': '5a118d466d62b5cd03647cf2c593977f',
'info_dict': {
'id': '2937',
'id': '3459881',
'ext': 'mp4',
'title': 'Hannibal charges forward, stops for a cocktail',
'description': 'md5:545299bda6abf87e5ec666548c6a9448',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'The A.V. Club',
'uploader_id': 'the-av-club',
'uploader': 'a.v. club',
'upload_date': '20150619',
'timestamp': 1434728546,
},
}, {
'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true',
@ -44,38 +42,12 @@ class OnionStudiosIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://www.onionstudios.com/video/%s.json' % video_id, video_id)
title = video_data['title']
formats = []
for source in video_data.get('sources', []):
source_url = source.get('url')
if not source_url:
continue
ext = mimetype2ext(source.get('content_type')) or determine_ext(source_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(source.get('bitrate'))
formats.append({
'format_id': ext + ('-%d' % tbr if tbr else ''),
'url': source_url,
'width': int_or_none(source.get('width')),
'tbr': tbr,
'ext': ext,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': video_data.get('poster_url'),
'uploader': video_data.get('channel_name'),
'uploader_id': video_data.get('channel_slug'),
'duration': float_or_none(video_data.get('duration', 1000)),
'tags': video_data.get('tags'),
'formats': formats,
}
webpage = self._download_webpage(
'http://onionstudios.com/embed/dc94dc2899fe644c0e7241fa04c1b732.js',
video_id)
mcp_id = compat_str(self._parse_json(self._search_regex(
r'window\.mcpMapping\s*=\s*({.+?});', webpage,
'MCP Mapping'), video_id, js_to_json)[video_id]['mcp_id'])
return self.url_result(
'http://kinja.com/ajax/inset/iframe?id=mcp-' + mcp_id,
'KinjaEmbed', mcp_id)

Some files were not shown because too many files have changed in this diff Show More