Compare commits


170 Commits

Author SHA1 Message Date
Sergey M․
cefe42c412 release 2018.12.09 2018-12-09 23:11:32 +07:00
Sergey M․
24cc64254c [ChangeLog] Actualize
[ci skip]
2018-12-09 23:08:16 +07:00
Sergey M․
9e02c2c704 [YoutubeDLCookieJar] Add test for keeping session cookies 2018-12-09 22:57:00 +07:00
Sergey M․
5ee7ae5c75 [teachable] Add support for teachable based platform sites (closes #5451, closes #18150, closes #18272) 2018-12-09 22:36:36 +07:00
Remita Amine
3ad6dabd33 [aenetworks] add support for History Vault(closes #18460) 2018-12-09 10:04:00 +01:00
Remita Amine
5f47a60c5d [imgur] improve gallery and album detection and extraction(closes #9133)(closes #16577)(closes #17223)(closes #18404) 2018-12-09 09:35:17 +01:00
Sergey M․
1bab343704 [YoutubeDL] Introduce YoutubeDLCookieJar and clarify the rationale behind session cookies (closes #12929) 2018-12-09 06:47:49 +07:00
aegamesi
1d88b3e6e6 [YoutubeDL] Recognize expires=0 as session cookies and send session cookies with requests 2018-12-09 06:05:37 +07:00
Sergey M․
9235b5091c [iprima] Relax _VALID_URL (closes #18453) 2018-12-09 00:01:11 +07:00
Remita Amine
c3c098dcf2 [hotstar] fix video data extraction(closes #18386) 2018-12-07 18:52:01 +01:00
Alexander Seiler
8c5879715f [ard:mediathek] Fix title and description extraction (closes #18349) 2018-12-07 03:41:02 +07:00
ealgase
ebb0449049 [xvideos] Switch to HTTPS (closes #18422) 2018-12-07 03:36:08 +07:00
Sergey M․
dfe0a3a9d2 [lecturio] Add extractor (closes #18405) 2018-12-07 03:27:11 +07:00
Sergey M․
c976873c5b [nrktv:series] Add support for extra materials 2018-12-07 00:54:58 +07:00
Sergey M․
15699ec8b0 [nrktv:season,series] Fix extraction and update tests (closes #17159, closes #17258) 2018-12-07 00:49:24 +07:00
Sergey M․
33cc1ea586 [nrktv] Relax _VALID_URL (closes #18304, closes #18387) 2018-12-07 00:00:06 +07:00
v-delta
ae9d77dab5 [yourporn] Fix extraction (closes #18424) 2018-12-06 23:24:35 +07:00
Remita Amine
8bb0c9cc16 [tbs] fix info extraction(fixes #18403) 2018-12-05 07:03:00 +01:00
Remita Amine
5547014ad9 [gamespot] add support reviews URLs 2018-12-02 20:01:56 +01:00
Sergey M․
ab896fa894 release 2018.12.03 2018-12-03 00:10:20 +07:00
Sergey M․
1fa59a928e [ChangeLog] Actualize
[ci skip]
2018-12-03 00:06:54 +07:00
Sergey M․
ce18a19be9 [tiktok] Improve extraction and add support for user pages (closes #18135) 2018-12-02 02:42:56 +07:00
Ken Swenson
1ead840d2c [tiktok] Add extractor (closes #18108) 2018-12-02 02:42:56 +07:00
Alexander Seiler
aa374bc78e [utils] Fix random_birthday to generate existing dates only 2018-12-02 00:05:15 +07:00
Sergey M․
3430ff9b07 [pornhub] Use actual URL host for requests (closes #18359) 2018-12-01 16:45:51 +07:00
Hakim Boyles
f012823082 [lynda] Fix authentication (closes #18158) 2018-11-30 01:20:27 +07:00
Jimm Stout
16597c2f94 [gfycat] Update API endpoint (closes #18333) 2018-11-30 01:07:07 +07:00
Sergey M․
adbbdefc81 [hotstar] Add support for alternative app state layout (closes #18320) 2018-11-30 00:48:15 +07:00
Alexander Seiler
053e5b12b2 [azmedien] Fix extraction (closes #18334) 2018-11-30 00:12:18 +07:00
Remita Amine
d9df8f120b [vimeo] extract VHX subtitles 2018-11-28 20:13:36 +01:00
Remita Amine
ca01d17884 [vimeo] Add support for VHX(Vimeo OTT)(#14835) 2018-11-28 19:53:45 +01:00
Alexander Seiler
d19600df07 [joj] Fix extraction (closes #18280) 2018-11-24 22:14:27 +07:00
Sergey M․
641e86e3cf [wistia] Add support for fast.wistia.com (closes #18287) 2018-11-24 21:47:41 +07:00
Sergey M․
6864855eb1 [tests] Fix invalid escape sequences 2018-11-23 00:43:42 +07:00
Sergey M․
d861a9d581 release 2018.11.23 2018-11-23 00:16:45 +07:00
Sergey M․
66173211c4 [ChangeLog] Actualize
[ci skip]
2018-11-23 00:14:43 +07:00
Remita Amine
6f2883a2df [mixcloud] base64 decode before decryption 2018-11-21 23:25:38 +01:00
Remita Amine
560020da30 [mixcloud] fallback to hardcoded decryption key(closes #18016) 2018-11-21 23:21:05 +01:00
Sergey M․
305ce767d5 [travis] Add python 3.8-dev build 2018-11-22 02:34:35 +07:00
Sergey M․
157eef3e63 [setup.py] Add python 3.8 classifier 2018-11-22 02:08:41 +07:00
Sergey M․
bd2d553c7b [travis] Add python 3.7 build 2018-11-22 02:01:39 +07:00
Sergey M․
af60e81e3c [setup.py] Add more relevant classifiers 2018-11-22 02:01:39 +07:00
Remita Amine
a843464a7e [nbc] fix NBCNews article extraction(closes #16194) 2018-11-21 12:10:06 +01:00
Remita Amine
6866f24494 [foxsports] update test 2018-11-21 12:08:46 +01:00
Remita Amine
4e33e0792a [loc] update test 2018-11-21 12:00:50 +01:00
Remita Amine
35328915b5 [foxsports] fix extraction(closes #17543) 2018-11-21 09:46:36 +01:00
Remita Amine
6c882aa899 [loc] relax _VALID_URL regex and improve formats extraction 2018-11-21 09:46:36 +01:00
Sergey M․
183417a50f [ciscolive:search] Add support for pagination 2018-11-21 06:10:43 +07:00
Sergey M․
6a6d7f0641 [ciscolive] Fix issues and improve extraction (closes #17984) 2018-11-21 06:10:39 +07:00
Austin de Coup-Crank
05bd5e9c77 [ciscolive] Add extractor 2018-11-21 06:10:30 +07:00
Alexander Seiler
15ed5a2784 [nzz] Relax kaltura regex 2018-11-21 02:50:40 +07:00
Remita Amine
2e1280ed43 [sixplay] fix format extraction 2018-11-19 18:15:51 +01:00
Remita Amine
8578ea4dcb [bitchute] use _html_search_regex for title extraction 2018-11-18 16:15:27 +01:00
Remita Amine
9b27a78a88 [kaltura] limit requested MediaEntry fields 2018-11-18 16:15:27 +01:00
Sergey M․
964b989dc8 [americastestkitchen] Add support for zype embeds (closes #18225) 2018-11-18 20:45:25 +07:00
Sergey M․
f97c099131 [pornhub] Move test to correct place 2018-11-18 11:14:46 +07:00
Sergey M․
1febf99da1 [pornhub] Add pornhub.net alias 2018-11-18 06:26:08 +07:00
Sergey M․
4167148fa4 [nova:embed] Fix extraction (closes #18222) 2018-11-18 01:11:10 +07:00
Sergey M․
5bb0479269 release 2018.11.18 2018-11-18 00:11:54 +07:00
Sergey M․
02df855e13 [ChangeLog] Actualize
[ci skip]
2018-11-18 00:07:40 +07:00
Sergey M․
006374e3ae [wwe] Fix issues, extract subtitles and add support for playlists (closes #14781, closes #17450) 2018-11-17 23:59:52 +07:00
mttronc
11d19ff503 [wwe] Add extractor 2018-11-17 23:59:46 +07:00
aviperes
a640c4d226 [vk] Detect geo restriction 2018-11-17 20:59:13 +07:00
Sergey M․
d0058c76d5 [openload] Use original host during extraction (closes #18211) 2018-11-17 16:59:20 +07:00
NeroBurner
0919cd4d01 [atvat] Fix extraction (closes #18041) 2018-11-17 00:18:50 +07:00
Sergey M․
2599956c9f [rte] Add support for new API endpoint (closes #18206) 2018-11-17 00:07:59 +07:00
Sergey M․
9b9b3501c5 [tnaflixnetwork:embed] Fix extraction (closes #18205) 2018-11-16 22:55:35 +07:00
Sergey M․
730c0d12a0 [picarto] Extract more metadata (closes #16518) 2018-11-11 16:17:59 +07:00
Patrick Griffis
f17a24a6df [picarto] Use API and add token support
This is just more reliable than trying to extract
it from the page itself.
2018-11-11 16:17:56 +07:00
Sergey M․
83852e57bf [zype] Add extractor (closes #18143) 2018-11-11 00:44:49 +07:00
Sergey M․
96a91b1551 [vivo] Fix extraction (closes #18139) 2018-11-10 23:37:27 +07:00
Sergey M․
cab26223bf [ruutu] Update API endpoint (closes #18138) 2018-11-10 15:26:04 +07:00
Sergey M․
532782ade1 release 2018.11.07 2018-11-07 01:38:25 +07:00
Sergey M․
f81d44aab6 [ChangeLog] Actualize
[ci skip]
2018-11-07 09:58:08 +07:00
Sergey M․
2511eee215 [youtube] Add another JS signature function name regex (closes #18091, closes #18093, closes #18094) 2018-11-07 09:55:59 +07:00
Remita Amine
0df514f07e [facebook] fix tahoe request(closes #17171) 2018-11-06 21:22:27 +01:00
Sergey M․
432cd48410 [cliphinter] Fix extraction (closes #18083) 2018-11-06 23:29:42 +07:00
Sergey M․
c0345b825f [youtube:playlist] Add support for invidio.us (closes #18077) 2018-11-05 19:08:39 +07:00
Sergey M․
2004e2210b [osnateltv] Update host 2018-11-05 17:09:57 +07:00
Sergey M․
16d896b2a7 [zattoo] Arrange API hosts for derived extractors (closes #18035) 2018-11-05 15:52:46 +07:00
Sergey M․
22e07ce502 [README.md] Improve documentation on safe metadata extraction and add more examples 2018-11-05 00:11:36 +07:00
Sergey M․
dbdaaa231a [youtube] Add fallback metadata extraction from videoDetails (closes #18052) 2018-11-03 06:26:16 +07:00
Sergey M․
38c32dbf19 release 2018.11.03 2018-11-03 02:57:48 +07:00
Sergey M․
a085410936 [ChangeLog] Actualize
[ci skip]
2018-11-03 02:56:14 +07:00
Sergey M․
6895ea4d3f [laola1tv:embed] Set correct stream access URL scheme (closes #16341) 2018-11-03 02:45:37 +07:00
Alexander Seiler
faac1c1f70 [ehftv] Add extractor (closes #15408) 2018-11-03 02:45:32 +07:00
Sergey M․
573531dcfb [azmedien] Simplify (closes #17746) 2018-11-03 01:33:42 +07:00
Alexander Seiler
da56fb631f [azmedien] Adopt to major site redesign (closes #17745) 2018-11-03 01:33:36 +07:00
Xiao Di Guan
95e42d7336 [extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447) 2018-11-03 01:18:20 +07:00
Sergey M․
cf0db4d997 [twitcasting] Improve extraction and fix issues (closes #17981) 2018-11-03 00:28:25 +07:00
sichuan-pepper
036f905161 [twitcasting] Add extractor 2018-11-03 00:28:21 +07:00
Sergey M․
4b6aca17cc [orf:tvthek] Improve extraction and remove unused code (closes #17956, closes #18024) 2018-11-02 23:47:23 +07:00
Sebastian Haas
c620694c97 [orf:tvthek] Fix extraction (closes #17737)
use _extract_m3u8_formats and _extract_f4m_formats helper functions
closes #17737
2018-11-02 23:47:17 +07:00
yonaikerlol
061ea3a776 [openload] Add support for oload.fun 2018-11-02 23:08:41 +07:00
Sergey M․
c70ba664f1 [njpwworld] Fix authentication (closes #17427) 2018-11-01 01:40:02 +07:00
Sergey M․
f16679e843 [cnbc:video] Fix _VALID_URL (#17110) 2018-10-30 04:57:51 +07:00
Remita Amine
b14475724b [linkedin:learning:course] use url_transparent type for playlist entries 2018-10-29 21:49:12 +01:00
Remita Amine
aa7e974a2a [linkedin:learning] Add new extractor(closes #13545) 2018-10-29 19:28:29 +01:00
Sergey M․
9aac22c195 [theplatform] Improve error detection (#13222) 2018-10-30 00:22:44 +07:00
Sergey M․
94db1f7f3b [cnbc] Simplify extraction (closes #14280, closes #17110) 2018-10-29 23:55:55 +07:00
gfabiano
ffa7b2bfee [cbnc] Add support for new URL schema (closes #14193) 2018-10-29 23:54:14 +07:00
Sergey M․
2943397e87 [aparat] Improve extraction and extract more metadata (closes #17445, closes #18008) 2018-10-29 23:32:45 +07:00
Ali Irani
9c4a83a1be [aparat] Fix extraction 2018-10-29 23:31:24 +07:00
Sergey M․
9ff558f67f release 2018.10.29 2018-10-29 00:39:29 +07:00
Sergey M․
c2fe21efaa [ChangeLog] Actualize
[ci skip]
2018-10-29 00:38:06 +07:00
Sergey M․
476cf548e1 [sportbox] Improve extraction, add support for matchtv.ru and fix video id (closes #17978) 2018-10-29 00:21:50 +07:00
Sergey M․
bebef10909 [extractor/common] Add validation for JSON-LD URLs 2018-10-29 00:21:45 +07:00
Alexey Trofimov
4c237ab787 [sportbox] Fix extraction 2018-10-29 00:21:40 +07:00
Sergey M․
a1d1c63678 [screencast] Improve extraction (closes #14617, closes #17990) 2018-10-28 23:26:30 +07:00
sichuan-pepper
1fafb32984 [screencast] Fix extraction (closes #14590) 2018-10-28 23:26:30 +07:00
yonaikerlol
c901cc38e5 [openload] Add support for oload.icu 2018-10-28 22:51:29 +07:00
Sergey M․
022218f2f0 [ivi] Add support for ivi.tv 2018-10-28 22:49:10 +07:00
Sergey M․
08c7d3dade [crunchyroll] Improve extraction failsafeness (closes #17991) 2018-10-28 22:12:54 +07:00
Remita Amine
5e733b066a [dailymail] fix format extraction(closes #17976) 2018-10-26 05:41:57 +01:00
Remita Amine
7d9e858132 [viewster] reduce format requests 2018-10-26 05:40:49 +01:00
Remita Amine
b99b0bcfa0 [cwtv] handle api errors(closes #17905) 2018-10-17 06:22:56 +01:00
Sergey M․
baeabf7742 [rutube] Use geo verification headers (closes #17897) 2018-10-16 23:19:44 +07:00
Remita Amine
582797d780 [brightcove] remove unused variable 2018-10-15 20:47:12 +01:00
Remita Amine
160c2773f6 [brightcove:legacy] add another fall back to brightcove:new 2018-10-15 18:41:57 +01:00
Remita Amine
ee5fe42e44 [brightcove:legacy] fall back to brightcove:new(#13912) 2018-10-15 17:54:38 +01:00
Remita Amine
f0ee386851 [tv3] remove extractor(closes #10461)(closes #15339) 2018-10-15 16:26:29 +01:00
Remita Amine
a94e7c195e [ted] fix extraction for http and rtmp formats(closes #5941)(closes #17572)(closes #17894) 2018-10-15 11:52:01 +01:00
yonaikerlol
5d90a8a5f3 [openload] Add support for oload.cc 2018-10-07 20:05:45 +07:00
Remita Amine
19a352854f [patreon] extract post_file url(#17792) 2018-10-05 22:45:04 +01:00
Remita Amine
c9d891f19a [patreon] fix extraction(closes #14502)(closes #10471) 2018-10-05 20:11:01 +01:00
Sergey M․
d96f976b0c release 2018.10.05 2018-10-05 02:31:30 +07:00
Sergey M․
2e7ed29e34 [ChangeLog] Actualize
[ci skip]
2018-10-05 02:29:52 +07:00
Sergey M․
21c1a00dd7 [pluralsight] Improve authentication (closes #17762) 2018-10-05 02:27:14 +07:00
Sergey M․
0082f44a08 [dailymotion] Improve metadata extraction (closes #17706) 2018-10-05 02:03:25 +07:00
Enes
f60b9803a4 [dailymotion] Fix extraction (closes #17699) 2018-10-05 02:03:16 +07:00
Remita Amine
d98cb62e55 [crunchyroll] switch to HTTPS for RpcApi(closes #17749) 2018-10-02 19:43:06 +01:00
Remita Amine
05e7c184da [hotstar] fix extraction in python 2(closes #17696) 2018-10-02 06:07:43 +01:00
Sergey M․
66d106f270 [philharmoniedeparis] Fix extraction and add support for pad.philharmoniedeparis.fr (closes #17705) 2018-10-01 23:29:49 +07:00
Sergey M․
3c7da54c92 [jamendo] Add support for licensing.jamendo.com (closes #17724) 2018-10-01 22:05:18 +07:00
Enes
9795d93316 [openload] Add support for oload.cloud (closes #17710) 2018-10-01 21:48:59 +07:00
Sergey M․
365343131d [pluralsight] Fix subtitles extraction (closes #17726, closes #17728) 2018-10-01 21:45:24 +07:00
Sergey M․
85fa80d5f9 [vimeo] Add another config regex (closes #17690) 2018-10-01 21:13:43 +07:00
Remita Amine
245cbb33bc [spike] fix Paramount Network extraction(closes #17677) 2018-09-28 15:13:25 +01:00
Remita Amine
85cd69adcb [hotstar] fix extraction(closes #14694)(closes #14931)(closes #17637) 2018-09-26 08:13:36 +01:00
Sergey M․
4c89a675dd release 2018.09.26 2018-09-26 11:58:25 +07:00
Sergey M․
3d3499742c [ChangeLog] Actualize
[ci skip]
2018-09-26 11:56:15 +07:00
Sergey M․
c17e100b96 [pluralsight] Fix subtitles extraction (closes #17671) 2018-09-26 09:27:40 +07:00
Sergey M․
8fd12a0831 [mediaset] Improve embed support (closes #17668) 2018-09-26 05:38:41 +07:00
Sergey M․
60ce0c67fd [README.md] Document channel meta fields for output template 2018-09-25 23:43:41 +07:00
Sergey M․
cd5a74a28e [youtube] Add support for invidio.us (closes #17613) 2018-09-24 00:14:49 +07:00
Sergey M․
f6d7f7b474 [zattoo] Add support for more zattoo platform sites 2018-09-23 23:35:37 +07:00
Alexander Seiler
21160a1792 [zattoo] Fix extraction (closes #17175) 2018-09-23 21:34:47 +07:00
Sergey M․
4ac73fc170 [popcorntv] Remove debug output 2018-09-19 22:16:43 +07:00
Leonardo Taccari
28fcb7b061 [raiplay:playlist] Remove a debug leftover print() 2018-09-19 09:48:39 +07:00
Sergey M․
3a9c928426 release 2018.09.18 2018-09-18 01:46:36 +07:00
Sergey M․
d9b1cec171 [ChangeLog] Actualize
[ci skip]
2018-09-18 01:44:55 +07:00
Sergey M․
e504b09070 [adobepass] Don't pollute default headers dict 2018-09-17 22:15:27 +07:00
Sergey M․
c8f6ab8c38 [udemy] Don't pollute default headers dict 2018-09-17 22:14:53 +07:00
Sergey M․
e2f61598be [twitch] Don't pollute default headers dict 2018-09-17 22:14:28 +07:00
Sergey M․
c11485162b [youtube] Don't pollute default query dict (closes #17593) 2018-09-17 22:13:39 +07:00
Sergey M․
1084563eaa [crunchyroll] Prefer hardsubless formats and formats in locale language 2018-09-16 23:54:58 +07:00
Sergey M․
d9b0d118ad [vrv] Make format ids deterministic 2018-09-16 23:54:58 +07:00
Remita Amine
8b40c92724 [vimeo] redirect to feature url only in the case of a trailer(closes #14591) 2018-09-15 06:31:15 +01:00
Sergey M․
3661ebf2b6 [pornhub] Extract upload date (closes #17574) 2018-09-15 02:04:43 +07:00
Sergey M․
0e7b8d3eac [extractor/common] Fix typos 2018-09-15 01:53:01 +07:00
Sergey M․
127103b643 [porntube] Extract channel meta fields 2018-09-15 01:25:32 +07:00
Sergey M․
d03beddf0f [vimeo] Extract channel meta fields 2018-09-15 01:24:48 +07:00
Sergey M․
dd4c449219 [youtube] Extract channel meta fields (closes #9676, closes #12939) 2018-09-15 01:24:26 +07:00
Sergey M․
6f1f59f39c [extractor/common] Introduce channel meta fields 2018-09-15 01:23:36 +07:00
Sergey M․
15bf2ca0da [porntube] Fix extraction (closes #17541) 2018-09-15 00:50:36 +07:00
Sergey M․
0f2aa0dcaa [asiancrush] Fix extraction (closes #15630) 2018-09-14 23:56:03 +07:00
Sergey M․
db348e8849 [twitch:clips] Extend _VALID_URL (closes #17559) 2018-09-14 23:26:38 +07:00
Sergey M․
f5b0175349 [vzaar] Add support for HLS 2018-09-11 02:41:05 +07:00
Sergey M․
79facb2773 [tube8] Fix metadata extraction (closes #17520) 2018-09-11 02:32:01 +07:00
Sergey M․
96dbf70de6 [eporner] Extract JSON-LD (closes #17519) 2018-09-11 02:32:01 +07:00
103 changed files with 3261 additions and 1134 deletions
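Commits 1d88b3e6e6 and 1bab343704 make youtube-dl treat cookies whose `expires` field is `0` as session cookies and keep them in the cookie file between runs. The following is a minimal sketch of that idea built on the standard library's `http.cookiejar.MozillaCookieJar`. The class name `SessionKeepingCookieJar` is hypothetical, and this is not the source of youtube-dl's `YoutubeDLCookieJar`.

```python
import http.cookiejar
import os
import tempfile


class SessionKeepingCookieJar(http.cookiejar.MozillaCookieJar):
    """Hypothetical MozillaCookieJar subclass that round-trips session cookies.

    Session cookies are written with expires=0, and any expires=0 entry is
    read back as a session cookie (expires=None, discard=True).
    """

    def save(self, filename=None, ignore_discard=False, ignore_expires=False):
        # MozillaCookieJar writes an empty expires field for session cookies;
        # store them as 0 instead.
        for cookie in self:
            if cookie.expires is None:
                cookie.expires = 0
        super().save(filename, ignore_discard, ignore_expires)

    def load(self, filename=None, ignore_discard=False, ignore_expires=False):
        super().load(filename, ignore_discard, ignore_expires)
        # MozillaCookieJar only treats an empty expires field as a session
        # cookie, so mark expires=0 entries as session cookies ourselves.
        for cookie in self:
            if cookie.expires == 0:
                cookie.expires = None
                cookie.discard = True


with tempfile.TemporaryDirectory() as tmpdir:
    cookie_file = os.path.join(tmpdir, 'cookies.txt')
    with open(cookie_file, 'w') as f:
        f.write('# Netscape HTTP Cookie File\n')
        # domain, include-subdomains, path, secure, expires, name, value
        f.write('.example.com\tTRUE\t/\tFALSE\t0\tsid\tabc\n')

    jar = SessionKeepingCookieJar()
    # Without ignore_expires=True, an expires=0 cookie would be dropped on
    # load as already expired; without ignore_discard=True, session cookies
    # would be skipped on save.
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    loaded = [(c.name, c.expires, c.discard) for c in jar]
    print(loaded)  # [('sid', None, True)]

    jar.save(cookie_file, ignore_discard=True, ignore_expires=True)
    with open(cookie_file) as f:
        saved = f.read()
```

The sketch passes `ignore_discard=True, ignore_expires=True` explicitly on both load and save; youtube-dl is assumed to do the same when it reads and writes the `--cookies` file, which is what lets session cookies survive between runs.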

@@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.09.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.09.10**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.12.09*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.12.09**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.09.10
[debug] youtube-dl version 2018.12.09
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

@@ -15,6 +15,18 @@ env:
- YTDL_TEST_SET=download
matrix:
include:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download
fast_finish: true

@@ -296,5 +296,26 @@ title = self._search_regex(
### Use safe conversion functions
-Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples
##### Safely extract optional description from parsed JSON
```python
description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
```
##### Safely extract more optional metadata
```python
video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```
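The examples above assume the helpers are imported from `youtube_dl/utils.py`. To show what they return without depending on youtube-dl, the sketch below uses simplified stand-ins for `try_get`, `int_or_none` and `float_or_none` (they are not the library's code and accept fewer parameters) and runs the second example against a made-up `response` dict. On Python 3, `str` takes the place of `compat_str`.

```python
# Simplified stand-ins modeled on helpers in youtube_dl/utils.py; the real
# functions accept more parameters and handle more edge cases.


def try_get(src, getter, expected_type=None):
    """Return getter(src), or None if the lookup fails or the type is unexpected."""
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None


def int_or_none(v, scale=1, default=None):
    """Convert v to int (floor-divided by scale), returning default on failure."""
    if v is None or v == '':
        return default
    try:
        return int(v) // scale
    except (ValueError, TypeError):
        return default


def float_or_none(v, scale=1, default=None):
    """Convert v to float (divided by scale), returning default on failure."""
    if v is None:
        return default
    try:
        return float(v) / scale
    except (ValueError, TypeError):
        return default


# Hypothetical API response with string-typed numbers, as many sites return.
response = {'result': {'video': [{'summary': 'A demo clip', 'durationMs': '125500', 'views': '42'}]}}

video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
print(description, duration, view_count)  # A demo clip 125.5 42

# A response without the "video" list yields None values instead of raising.
empty = try_get({'result': {}}, lambda x: x['result']['video'][0], dict) or {}
print(float_or_none(empty.get('durationMs'), scale=1000))  # None
```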

ChangeLog

@@ -1,3 +1,190 @@
version 2018.12.09
Core
* [YoutubeDL] Keep session cookies in cookie file between runs
* [YoutubeDL] Recognize session cookies with expires set to 0 (#12929)
Extractors
+ [teachable] Add support for teachable platform sites (#5451, #18150, #18272)
+ [aenetworks] Add support for historyvault.com (#18460)
* [imgur] Improve gallery and album detection and extraction (#9133, #16577,
#17223, #18404)
* [iprima] Relax URL regular expression (#18453)
* [hotstar] Fix video data extraction (#18386)
* [ard:mediathek] Fix title and description extraction (#18349, #18371)
* [xvideos] Switch to HTTPS (#18422, #18427)
+ [lecturio] Add support for lecturio.com (#18405)
+ [nrktv:series] Add support for extra materials
* [nrktv:season,series] Fix extraction (#17159, #17258)
* [nrktv] Relax URL regular expression (#18304, #18387)
* [yourporn] Fix extraction (#18424, #18425)
* [tbs] Fix info extraction (#18403)
+ [gamespot] Add support for review URLs
version 2018.12.03
Core
* [utils] Fix random_birthday to generate existing dates only (#18284)
Extractors
+ [tiktok] Add support for tiktok.com (#18108, #18135)
* [pornhub] Use actual URL host for requests (#18359)
* [lynda] Fix authentication (#18158, #18217)
* [gfycat] Update API endpoint (#18333, #18343)
+ [hotstar] Add support for alternative app state layout (#18320)
* [azmedien] Fix extraction (#18334, #18336)
+ [vimeo] Add support for VHX (Vimeo OTT) (#14835)
* [joj] Fix extraction (#18280, #18281)
+ [wistia] Add support for fast.wistia.com (#18287)
version 2018.11.23
Core
+ [setup.py] Add more relevant classifiers
Extractors
* [mixcloud] Fall back to hardcoded decryption key (#18016)
* [nbc:news] Fix article extraction (#16194)
* [foxsports] Fix extraction (#17543)
* [loc] Relax regular expression and improve formats extraction
+ [ciscolive] Add support for ciscolive.cisco.com (#17984)
* [nzz] Relax kaltura regex (#18228)
* [sixplay] Fix formats extraction
* [bitchute] Improve title extraction
* [kaltura] Limit requested MediaEntry fields
+ [americastestkitchen] Add support for zype embeds (#18225)
+ [pornhub] Add pornhub.net alias
* [nova:embed] Fix extraction (#18222)
version 2018.11.18
Extractors
+ [wwe] Extract subtitles
+ [wwe] Add support for playlists (#14781)
+ [wwe] Add support for wwe.com (#14781, #17450)
* [vk] Detect geo restriction (#17767)
* [openload] Use original host during extraction (#18211)
* [atvat] Fix extraction (#18041)
+ [rte] Add support for new API endpoint (#18206)
* [tnaflixnetwork:embed] Fix extraction (#18205)
* [picarto] Use API and add token support (#16518)
+ [zype] Add support for player.zype.com (#18143)
* [vivo] Fix extraction (#18139)
* [ruutu] Update API endpoint (#18138)
version 2018.11.07
Extractors
+ [youtube] Add another JS signature function name regex (#18091, #18093,
#18094)
* [facebook] Fix tahoe request (#17171)
* [cliphunter] Fix extraction (#18083)
+ [youtube:playlist] Add support for invidio.us (#18077)
* [zattoo] Arrange API hosts for derived extractors (#18035)
+ [youtube] Add fallback metadata extraction from videoDetails (#18052)
version 2018.11.03
Core
* [extractor/common] Ensure response handle is not prematurely closed before
it can be read if it matches expected_status (#17195, #17846, #17447)
Extractors
* [laola1tv:embed] Set correct stream access URL scheme (#16341)
+ [ehftv] Add support for ehftv.com (#15408)
* [azmedien] Adapt to major site redesign (#17745, #17746)
+ [twitcasting] Add support for twitcasting.tv (#17981)
* [orf:tvthek] Fix extraction (#17737, #17956, #18024)
+ [openload] Add support for oload.fun (#18045)
* [njpwworld] Fix authentication (#17427)
+ [linkedin:learning] Add support for linkedin.com/learning (#13545)
* [theplatform] Improve error detection (#13222)
* [cnbc] Simplify extraction (#14280, #17110)
+ [cnbc] Add support for new URL schema (#14193)
* [aparat] Improve extraction and extract more metadata (#17445, #18008)
* [aparat] Fix extraction
version 2018.10.29
Core
+ [extractor/common] Add validation for JSON-LD URLs
Extractors
+ [sportbox] Add support for matchtv.ru
* [sportbox] Fix extraction (#17978)
* [screencast] Fix extraction (#14590, #14617, #17990)
+ [openload] Add support for oload.icu
+ [ivi] Add support for ivi.tv
* [crunchyroll] Improve extraction failsafeness (#17991)
* [dailymail] Fix formats extraction (#17976)
* [viewster] Reduce format requests
* [cwtv] Handle API errors (#17905)
+ [rutube] Use geo verification headers (#17897)
+ [brightcove:legacy] Add fallbacks to brightcove:new (#13912)
- [tv3] Remove extractor (#10461, #15339)
* [ted] Fix extraction for HTTP and RTMP formats (#5941, #17572, #17894)
+ [openload] Add support for oload.cc (#17823)
+ [patreon] Extract post_file URL (#17792)
* [patreon] Fix extraction (#14502, #10471)
version 2018.10.05
Extractors
* [pluralsight] Improve authentication (#17762)
* [dailymotion] Fix extraction (#17699)
* [crunchyroll] Switch to HTTPS for RpcApi (#17749)
+ [philharmoniedeparis] Add support for pad.philharmoniedeparis.fr (#17705)
* [philharmoniedeparis] Fix extraction (#17705)
+ [jamendo] Add support for licensing.jamendo.com (#17724)
+ [openload] Add support for oload.cloud (#17710)
* [pluralsight] Fix subtitles extraction (#17726, #17728)
+ [vimeo] Add another config regular expression (#17690)
* [spike] Fix Paramount Network extraction (#17677)
* [hotstar] Fix extraction (#14694, #14931, #17637)
version 2018.09.26
Extractors
* [pluralsight] Fix subtitles extraction (#17671)
* [mediaset] Improve embed support (#17668)
+ [youtube] Add support for invidio.us (#17613)
+ [zattoo] Add support for more zattoo platform sites
* [zattoo] Fix extraction (#17175, #17542)
version 2018.09.18
Core
+ [extractor/common] Introduce channel meta fields
Extractors
* [adobepass] Don't pollute default headers dict
* [udemy] Don't pollute default headers dict
* [twitch] Don't pollute default headers dict
* [youtube] Don't pollute default query dict (#17593)
* [crunchyroll] Prefer hardsubless formats and formats in locale language
* [vrv] Make format ids deterministic
* [vimeo] Fix ondemand playlist extraction (#14591)
+ [pornhub] Extract upload date (#17574)
+ [porntube] Extract channel meta fields
+ [vimeo] Extract channel meta fields
+ [youtube] Extract channel meta fields (#9676, #12939)
* [porntube] Fix extraction (#17541)
* [asiancrush] Fix extraction (#15630)
+ [twitch:clips] Extend URL regular expression (#17559)
+ [vzaar] Add support for HLS
* [tube8] Fix metadata extraction (#17520)
* [eporner] Extract JSON-LD (#17519)
version 2018.09.10
Core
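The 2018.12.03 entry says `random_birthday` was fixed to generate existing dates only. A minimal sketch of one way to guarantee that (an assumption, not necessarily the committed code) is to draw a random day offset from a fixed start date, so invalid combinations such as 31 February cannot occur. The 1950 to 1995 range and the form field names below are illustrative.

```python
import datetime
import random


def random_birthday(year_field, month_field, day_field):
    """Return a dict mapping the given form field names to a random, valid date."""
    # Illustrative range; any start_date <= end_date works.
    start_date = datetime.date(1950, 1, 1)
    end_date = datetime.date(1995, 12, 31)
    # Choosing a day offset (instead of drawing year, month and day separately)
    # means every result is a real calendar date, e.g. never 31 February.
    offset = random.randint(0, (end_date - start_date).days)
    random_date = start_date + datetime.timedelta(days=offset)
    return {
        year_field: str(random_date.year),
        month_field: str(random_date.month),
        day_field: str(random_date.day),
    }


# Hypothetical form field names, as a site's age-gate form might use them.
fields = random_birthday('birth_year', 'birth_month', 'birth_day')
print(fields)
```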

@@ -511,6 +511,8 @@ The basic usage is not to set any template arguments when downloading a single f
- `timestamp` (numeric): UNIX timestamp of the moment the video became available
- `upload_date` (string): Video upload date (YYYYMMDD)
- `uploader_id` (string): Nickname or id of the video uploader
- `channel` (string): Full name of the channel the video is uploaded on
- `channel_id` (string): Id of the channel
- `location` (string): Physical location where the video was filmed
- `duration` (numeric): Length of the video in seconds
- `view_count` (numeric): How many users have watched the video on the platform
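The new `channel` and `channel_id` fields can be used in an output template like any other field. youtube-dl output templates are Python %-style named placeholders filled from the video's info dict, with `NA` substituted for missing fields. The sketch below mimics only that substitution step (the real code also sanitizes each value for use in file names), and `render_template` is a hypothetical name.

```python
import collections


def render_template(outtmpl, info):
    """Fill %-style named placeholders from info, using 'NA' for missing fields."""
    # defaultdict's default factory supplies 'NA' whenever a key is absent.
    template_dict = collections.defaultdict(lambda: 'NA', info)
    return outtmpl % template_dict


outtmpl = '%(channel)s [%(channel_id)s]/%(title)s.%(ext)s'
info = {'title': 'Demo', 'ext': 'mp4', 'channel': 'Example Channel', 'channel_id': 'UC123'}
print(render_template(outtmpl, info))  # Example Channel [UC123]/Demo.mp4
print(render_template(outtmpl, {'title': 'Demo', 'ext': 'mp4'}))  # NA [NA]/Demo.mp4
```

On the command line the same template would be passed with `-o`, for example `youtube-dl -o '%(channel)s/%(title)s.%(ext)s' URL`.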
@@ -1166,7 +1168,28 @@ title = self._search_regex(
### Use safe conversion functions
-Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples
##### Safely extract optional description from parsed JSON
```python
description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
```
##### Safely extract more optional metadata
```python
video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```
# EMBEDDING YOUTUBE-DL
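The text above recommends `url_or_none` for safe URL processing. As a hedged illustration of what such a check does, here is a simplified stand-in (not the library's source) that returns a stripped URL only when it starts with a scheme followed by `//` or is protocol-relative, and `None` for anything else, including non-string input.

```python
import re


def url_or_none(url):
    """Simplified stand-in modeled on youtube_dl.utils.url_or_none."""
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    # Accept scheme-qualified ("https://...") and protocol-relative ("//...") URLs.
    return url if re.match(r'^(?:[a-zA-Z][\da-zA-Z.+-]*:)?//', url) else None


for candidate in ['https://example.com/v.mp4', '  //cdn.example.com/v.m3u8 ', 'javascript:void(0)', None, 42]:
    print(repr(candidate), '->', repr(url_or_none(candidate)))
```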

@@ -33,7 +33,7 @@
- **AdobeTVShow**
- **AdobeTVVideo**
- **AdultSwim**
-- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **afreecatv**: afreecatv.com
- **AirMozilla**
- **AliExpressLive**
@@ -84,8 +84,6 @@
- **awaan:season**
- **awaan:video**
- **AZMedien**: AZ Medien videos
-- **AZMedienPlaylist**: AZ Medien playlists
-- **AZMedienShowPlaylist**: AZ Medien show playlists
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
@@ -98,6 +96,7 @@
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **BBVTV**
- **Beatport**
- **Beeg**
- **BehindKink**
@@ -164,6 +163,8 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
- **CiscoLiveSearch**
- **CiscoLiveSession**
- **CJSW**
- **cliphunter**
- **Clippit**
@@ -177,6 +178,7 @@
- **Clyp**
- **cmt.com**
- **CNBC**
- **CNBCVideo**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
@@ -250,7 +252,9 @@
- **EchoMsk**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
- **ehftv**
- **eHow**
- **EinsUndEinsTV**
- **Einthusan**
- **eitb.tv**
- **EllenTube**
@@ -268,6 +272,7 @@
- **EsriVideo**
- **Europa**
- **EveryonesMixtape**
- **EWETV**
- **ExpoTV**
- **Expressen**
- **ExtremeTube**
@@ -327,6 +332,7 @@
- **Gfycat**
- **GiantBomb**
- **Giga**
- **GlattvisionTV**
- **Glide**: Glide mobile video messages (glide.me)
- **Globo**
- **GloboArticle**
@@ -356,7 +362,7 @@
- **HitRecord**
- **HornBunny**
- **HotNewHipHop**
-- **HotStar**
- **hotstar**
- **hotstar:playlist**
- **Howcast**
- **HowStuffWorks**
@@ -370,7 +376,8 @@
- **imdb**: Internet Movie Database trailers
- **imdb:list**: Internet Movie Database lists
- **Imgur**
-- **ImgurAlbum**
- **imgur:album**
- **imgur:gallery**
- **Ina**
- **Inc**
- **IndavideoEmbed**
@@ -429,6 +436,8 @@
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
- **Lecturio**
- **LecturioCourse**
- **LEGO**
- **Lemonde**
- **Lenta**
@@ -441,6 +450,8 @@
- **limelight:channel**
- **limelight:channel_list**
- **LineTV**
- **linkedin:learning**
- **linkedin:learning:course**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
@@ -494,6 +505,7 @@
- **Mixer:vod**
- **MLB**
- **Mnet**
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
@@ -525,6 +537,7 @@
- **Myvi**
- **MyVidster**
- **MyviEmbed**
- **MyVisionTV**
- **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
@@ -550,6 +563,7 @@
- **netease:program**: 网易云音乐 - 电台节目
- **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐
- **NetPlus**
- **Netzkino**
- **Newgrounds**
- **NewgroundsPlaylist**
@@ -626,6 +640,7 @@
- **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
- **OsnatelTV**
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
@@ -686,6 +701,7 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuantumTV**
- **Quickline**
- **QuicklineLive**
- **R7**
@@ -753,6 +769,7 @@
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
- **SAKTV**
- **Sapo**: SAPO Vídeos
- **savefrom.net**
- **SBS**: sbs.com.au
@@ -808,7 +825,7 @@
- **Spiegeltv**
- **sport.francetvinfo.fr**
- **Sport5**
-- **SportBoxEmbed**
- **SportBox**
- **SportDeutschland**
- **SpringboardPlatform**
- **Sprout**
@@ -839,6 +856,8 @@
- **TastyTrade**
- **TBS**
- **TDSLifeway**
- **Teachable**
- **TeachableCourse**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
@@ -871,6 +890,8 @@
- **ThisAmericanLife**
- **ThisAV**
- **ThisOldHouse**
- **TikTok**
- **TikTokUser**
- **tinypic**: tinypic.com videos
- **TMZ**
- **TMZArticle**
@@ -899,7 +920,6 @@
- **TV2**
- **tv2.hu**
- **TV2Article**
- **TV3**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
@@ -921,6 +941,7 @@
- **TVPlayer**
- **TVPlayHome**
- **Tweakers**
- **TwitCasting**
- **twitch:chapter**
- **twitch:clips**
- **twitch:profile**
@@ -945,8 +966,6 @@
- **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- **Upskill**
- **UpskillCourse**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USANetwork**
@@ -965,6 +984,7 @@
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **vhx:embed**
- **Viafree**
- **vice**
- **vice:article**
@@ -1035,12 +1055,14 @@
- **vrv**
- **vrv:series**
- **VShare**
- **VTXTV**
- **vube**: Vube.com
- **VuClip**
- **VVVVID**
- **VyboryMos**
- **Vzaar**
- **Walla**
- **WalyTV**
- **washingtonpost**
- **washingtonpost:article**
- **wat.tv**
@@ -1066,6 +1088,7 @@
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal
- **WSJArticle**
- **WWE**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
@@ -1125,3 +1148,4 @@
- **ZDF**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn
- **Zype**

@@ -124,6 +124,8 @@ setup(
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
'License :: Public Domain',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
@@ -132,6 +134,13 @@ setup(
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: Implementation',
'Programming Language :: Python :: Implementation :: CPython',
'Programming Language :: Python :: Implementation :: IronPython',
'Programming Language :: Python :: Implementation :: Jython',
'Programming Language :: Python :: Implementation :: PyPy',
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},

@@ -7,6 +7,7 @@ import json
import os.path
import re
import types
import ssl
import sys
import youtube_dl.extractor
@@ -244,3 +245,12 @@ def expect_warnings(ydl, warnings_re):
real_warning(w)
ydl.report_warning = _report_warning
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]

@@ -9,11 +9,30 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict, expect_value
from youtube_dl.compat import compat_etree_fromstring
from test.helper import FakeYDL, expect_dict, expect_value, http_server_port
from youtube_dl.compat import compat_etree_fromstring, compat_http_server
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
import threading
TEAPOT_RESPONSE_STATUS = 418
TEAPOT_RESPONSE_BODY = "<h1>418 I'm a teapot</h1>"
class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
def do_GET(self):
if self.path == '/teapot':
self.send_response(TEAPOT_RESPONSE_STATUS)
self.send_header('Content-Type', 'text/html; charset=utf-8')
self.end_headers()
self.wfile.write(TEAPOT_RESPONSE_BODY.encode())
else:
assert False
class TestIE(InfoExtractor):
@@ -743,6 +762,25 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
for i in range(len(entries)):
expect_dict(self, entries[i], expected_entries[i])
def test_response_with_expected_status_returns_content(self):
# Checks for mitigations against the effects of
# <https://bugs.python.org/issue15002> that affect Python 3.4.1+, which
# manifest as `_download_webpage`, `_download_xml`, `_download_json`,
# or the underlying `_download_webpage_handle` returning no content
# when a response matches `expected_status`.
httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), InfoExtractorTestRequestHandler)
port = http_server_port(httpd)
server_thread = threading.Thread(target=httpd.serve_forever)
server_thread.daemon = True
server_thread.start()
(content, urlh) = self.ie._download_webpage_handle(
'http://127.0.0.1:%d/teapot' % port, None,
expected_status=TEAPOT_RESPONSE_STATUS)
self.assertEqual(content, TEAPOT_RESPONSE_BODY)
if __name__ == '__main__':
unittest.main()
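The new `test_response_with_expected_status_returns_content` test starts a throwaway HTTP server on port 0 and expects the body of a 418 response to come back. Below is a minimal standard-library sketch of the same pattern. `fetch_allowing_status` and `fetch_teapot` are hypothetical helpers, not youtube-dl's `_download_webpage_handle`. They read the body directly from the `HTTPError` that `urllib` raises for the 418 status. On CPython, `httpd.server_address[1]` gives the bound port; the `http_server_port` helper in the diff reads `getsockname()` from the socket instead so that it also works on Jython.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TEAPOT_STATUS = 418
TEAPOT_BODY = "<h1>418 I'm a teapot</h1>"


class TeapotHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # silence per-request logging

    def do_GET(self):
        body = TEAPOT_BODY.encode('utf-8')
        self.send_response(TEAPOT_STATUS)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def fetch_allowing_status(url, expected_status):
    """Return (status, body); a response with `expected_status` is not an error."""
    # An empty ProxyHandler ignores *_proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read().decode('utf-8')
    except urllib.error.HTTPError as err:
        if err.code != expected_status:
            raise
        # HTTPError is also a file-like response: read the body explicitly
        # while the error object is still open.
        try:
            return err.code, err.read().decode('utf-8')
        finally:
            err.close()


def fetch_teapot():
    # Binding to port 0 lets the OS choose a free port; read it back afterwards.
    httpd = HTTPServer(('127.0.0.1', 0), TeapotHandler)
    port = httpd.server_address[1]
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    try:
        return fetch_allowing_status('http://127.0.0.1:%d/teapot' % port, TEAPOT_STATUS)
    finally:
        httpd.shutdown()
        httpd.server_close()
```

Any status other than the expected one is re-raised, so this helper only relaxes error handling for the single status the caller opts into.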

@@ -0,0 +1,34 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import os
import re
import sys
import tempfile
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.utils import YoutubeDLCookieJar
class TestYoutubeDLCookieJar(unittest.TestCase):
def test_keep_session_cookies(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/session_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
tf = tempfile.NamedTemporaryFile(delete=False)
try:
cookiejar.save(filename=tf.name, ignore_discard=True, ignore_expires=True)
temp = tf.read().decode('utf-8')
self.assertTrue(re.search(
r'www\.foobar\.foobar\s+FALSE\s+/\s+TRUE\s+0\s+YoutubeDLExpiresEmpty\s+YoutubeDLExpiresEmptyValue', temp))
self.assertTrue(re.search(
r'www\.foobar\.foobar\s+FALSE\s+/\s+TRUE\s+0\s+YoutubeDLExpires0\s+YoutubeDLExpires0Value', temp))
finally:
tf.close()
os.remove(tf.name)
if __name__ == '__main__':
unittest.main()

@@ -39,7 +39,7 @@ class TestCompat(unittest.TestCase):
def test_compat_expanduser(self):
old_home = os.environ.get('HOME')
test_str = 'C:\Documents and Settings\тест\Application Data'
test_str = r'C:\Documents and Settings\тест\Application Data'
compat_setenv('HOME', test_str)
self.assertEqual(compat_expanduser('~'), test_str)
compat_setenv('HOME', old_home or '')

@@ -9,26 +9,16 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import try_rm
from test.helper import http_server_port, try_rm
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
from youtube_dl.utils import encodeFilename
import ssl
import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]
TEST_SIZE = 10 * 1024

@@ -8,6 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import http_server_port
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server, compat_urllib_request
import ssl
@@ -16,15 +17,6 @@ import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass

@@ -14,4 +14,4 @@ from youtube_dl.postprocessor import MetadataFromTitlePP
class TestMetadataFromTitle(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s')
self.assertEqual(pp._titleregex, '(?P<title>.+)\ \-\ (?P<artist>.+)')
self.assertEqual(pp._titleregex, r'(?P<title>.+)\ \-\ (?P<artist>.+)')
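The `test_compat_expanduser` and `test_format_to_regex` fixes switch a Windows path and a regex literal to raw strings. Python leaves unrecognised escapes such as `\D` or `\-` in the string unchanged, so the values do not change. Since Python 3.6, however, compiling such literals emits a warning: a DeprecationWarning, and a SyntaxWarning from 3.12 onward. The sketch below checks a literal by escalating that warning to an error. `has_invalid_escape` is a hypothetical helper.

```python
import warnings


def has_invalid_escape(literal_source):
    """Return True if compiling `literal_source` hits an invalid string escape."""
    with warnings.catch_warnings():
        # Escalate warnings so the invalid escape surfaces as an exception;
        # CPython reports it as a SyntaxError when the warning is an error.
        warnings.simplefilter('error')
        try:
            compile(literal_source, '<literal>', 'eval')
        except (SyntaxError, DeprecationWarning, SyntaxWarning):
            return True
    return False
```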

@@ -0,0 +1,6 @@
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
www.foobar.foobar FALSE / TRUE 0 YoutubeDLExpires0 YoutubeDLExpires0Value
www.foobar.foobar FALSE / TRUE 0 YoutubeDLExpiresEmpty YoutubeDLExpiresEmptyValue
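The fixture above stores both cookies with `expires` set to 0. Python's `http.cookiejar.MozillaCookieJar` only treats an *empty* `expires` field as a session cookie. It also drops a cookie that has `expires` 0 as already expired unless `ignore_expires=True` is passed, which is why `YoutubeDL` now calls `load`/`save` with both ignore flags. The body of `YoutubeDLCookieJar` is not part of this diff. What follows is a minimal sketch of one way a subclass could normalise `expires=0` to a session cookie and write it back as 0. It is not youtube-dl's actual implementation, and `SessionAwareCookieJar` and `roundtrip` are hypothetical names. The fields are tab-separated, as the cookies.txt format requires.

```python
import http.cookiejar
import os
import tempfile


class SessionAwareCookieJar(http.cookiejar.MozillaCookieJar):
    """Sketch: a cookies.txt jar that also treats expires=0 as a session cookie."""

    def load(self, filename=None, ignore_discard=False, ignore_expires=False):
        super().load(filename, ignore_discard, ignore_expires)
        for cookie in self:
            # MozillaCookieJar only maps an empty expires field to a session
            # cookie; normalise expires=0 to the same representation.
            if cookie.expires == 0:
                cookie.expires = None
                cookie.discard = True

    def save(self, filename=None, ignore_discard=False, ignore_expires=False):
        # Write session cookies as expires=0 instead of an empty field, then
        # restore the in-memory session representation.
        session_cookies = [c for c in self if c.expires is None]
        for cookie in session_cookies:
            cookie.expires = 0
        try:
            super().save(filename, ignore_discard, ignore_expires)
        finally:
            for cookie in session_cookies:
                cookie.expires = None


SAMPLE = (
    '# Netscape HTTP Cookie File\n'
    'www.foobar.foobar\tFALSE\t/\tTRUE\t0\tYoutubeDLExpires0\tYoutubeDLExpires0Value\n'
    'www.foobar.foobar\tFALSE\t/\tTRUE\t\tYoutubeDLExpiresEmpty\tYoutubeDLExpiresEmptyValue\n'
)


def roundtrip(text):
    """Load `text` as a cookies.txt file, save it back, return (jar, saved text)."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(text)
        jar = SessionAwareCookieJar(path)
        # Both flags are needed: otherwise session cookies are discarded and
        # expires=0 cookies are dropped as already expired.
        jar.load(ignore_discard=True, ignore_expires=True)
        jar.save(ignore_discard=True, ignore_expires=True)
        with open(path) as f:
            return jar, f.read()
    finally:
        os.remove(path)
```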

@@ -88,6 +88,7 @@ from .utils import (
version_tuple,
write_json_file,
write_string,
YoutubeDLCookieJar,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
)
@@ -558,7 +559,7 @@ class YoutubeDL(object):
self.restore_console_title()
if self.params.get('cookiefile') is not None:
self.cookiejar.save()
self.cookiejar.save(ignore_discard=True, ignore_expires=True)
def trouble(self, message=None, tb=None):
"""Determine action to take when a download problem appears.
@@ -2297,10 +2298,9 @@ class YoutubeDL(object):
self.cookiejar = compat_cookiejar.CookieJar()
else:
opts_cookiefile = expand_path(opts_cookiefile)
self.cookiejar = compat_cookiejar.MozillaCookieJar(
opts_cookiefile)
self.cookiejar = YoutubeDLCookieJar(opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK):
self.cookiejar.load()
self.cookiejar.load(ignore_discard=True, ignore_expires=True)
cookie_processor = YoutubeDLCookieProcessor(self.cookiejar)
if opts_proxy is not None:

@@ -1325,8 +1325,8 @@ class AdobePassIE(InfoExtractor):
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
def _download_webpage_handle(self, *args, **kwargs):
headers = kwargs.get('headers', {})
headers.update(self.geo_verification_headers())
headers = self.geo_verification_headers()
headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
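In `AdobePassIE._download_webpage_handle`, the old code applied `geo_verification_headers()` on top of the caller's `headers` and mutated the caller's dict in place. The new code starts from the geo headers and lets caller-supplied headers win. Plain dicts are enough to show the change in precedence. `old_merge` and `new_merge` are hypothetical names. `Ytdl-request-proxy` is assumed here to stand in for what `geo_verification_headers()` returns when a geo verification proxy is configured.

```python
def old_merge(geo_headers, kwargs):
    # Before: geo headers override the caller's, and the caller's dict is mutated.
    headers = kwargs.get('headers', {})
    headers.update(geo_headers)
    return headers


def new_merge(geo_headers, kwargs):
    # After: start from (a copy of) the geo headers; caller-supplied headers win.
    headers = dict(geo_headers)
    headers.update(kwargs.get('headers', {}))
    return headers
```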

@@ -22,18 +22,19 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
(?:history|aetv|mylifetime|lifetimemovieclub)\.com|
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/
(?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
specials/(?P<special_display_id>[^/]+)/full-special
specials/(?P<special_display_id>[^/]+)/full-special|
collections/[^/]+/(?P<collection_display_id>[^/]+)
)
'''
_TESTS = [{
@@ -80,6 +81,9 @@ class AENetworksIE(AENetworksBaseIE):
}, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
@@ -90,9 +94,9 @@ class AENetworksIE(AENetworksBaseIE):
}
def _real_extract(self, url):
domain, show_path, movie_display_id, special_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id
webpage = self._download_webpage(url, display_id)
domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id or collection_display_id
webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
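The relaxed `AENetworksIE._VALID_URL` accepts `historyvault.com` and a new `collections/<collection>/<display_id>` path. The sketch below exercises the new pattern from the diff directly. `aenetworks_display_id` is a hypothetical helper that mirrors the `display_id` fallback chain in `_real_extract`.

```python
import re

# The new AENetworksIE._VALID_URL from the diff above (verbose mode).
AENETWORKS_VALID_URL = r'''(?x)
    https?://
        (?:www\.)?
        (?P<domain>
            (?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
            fyi\.tv
        )/
        (?:
            shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
            movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
            specials/(?P<special_display_id>[^/]+)/full-special|
            collections/[^/]+/(?P<collection_display_id>[^/]+)
        )
'''


def aenetworks_display_id(url):
    """Return (domain, display_id) for a supported URL, else None."""
    mobj = re.match(AENETWORKS_VALID_URL, url)
    if not mobj:
        return None
    domain, show_path, movie_id, special_id, collection_id = mobj.groups()
    return domain, show_path or movie_id or special_id or collection_id
```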

@@ -43,10 +43,6 @@ class AmericasTestKitchenIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
@@ -58,7 +54,18 @@ class AmericasTestKitchenIE(InfoExtractor):
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
external_id = ep_data.get('external_id') or ep_meta['external_id']
zype_id = ep_meta.get('zype_id')
if zype_id:
embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
ie_key = 'Zype'
else:
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
external_id = ep_data.get('external_id') or ep_meta['external_id']
embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
ie_key = 'Kaltura'
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -72,8 +79,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, external_id),
'ie_key': 'Kaltura',
'url': embed_url,
'ie_key': ie_key,
'title': title,
'description': description,
'thumbnail': thumbnail,

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
merge_dicts,
mimetype2ext,
url_or_none,
)
@@ -12,59 +13,83 @@ from ..utils import (
class AparatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.aparat.com/v/wP8On',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
'title': 'تیم گلکسی 11 - زومیت',
'age_limit': 0,
'description': 'md5:096bdabcdcc4569f2b8a5e903a3b3028',
'duration': 231,
'timestamp': 1387394859,
'upload_date': '20131218',
'view_count': int,
},
# 'skip': 'Extremely unreliable',
}
}, {
# multiple formats
'url': 'https://www.aparat.com/v/8dflw/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
# Provides more metadata
webpage = self._download_webpage(url, video_id, fatal=False)
title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
if not webpage:
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
file_list = self._parse_json(
options = self._parse_json(
self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
'file list'),
r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
webpage, 'options', group='value'),
video_id)
player = options['plugins']['sabaPlayerPlugin']
formats = []
for item in file_list[0]:
file_url = url_or_none(item.get('file'))
if not file_url:
continue
ext = mimetype2ext(item.get('type'))
label = item.get('label')
formats.append({
'url': file_url,
'ext': ext,
'format_id': label or ext,
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', label or '', 'height', default=None)),
})
self._sort_formats(formats)
for sources in player['multiSRC']:
for item in sources:
if not isinstance(item, dict):
continue
file_url = url_or_none(item.get('src'))
if not file_url:
continue
item_type = item.get('type')
if item_type == 'application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
ext = mimetype2ext(item.get('type'))
label = item.get('label')
formats.append({
'url': file_url,
'ext': ext,
'format_id': 'http-%s' % (label or ext),
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', label or '', 'height',
default=None)),
})
self._sort_formats(
formats, field_preference=('height', 'width', 'tbr', 'format_id'))
thumbnail = self._search_regex(
r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
info = self._search_json_ld(webpage, video_id, default={})
return {
if not info.get('title'):
info['title'] = player['title']
return merge_dicts(info, {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'age_limit': self._family_friendly_search(webpage),
'thumbnail': url_or_none(options.get('poster')),
'duration': int_or_none(player.get('duration')),
'formats': formats,
}
})
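The reworked Aparat extractor walks `player['multiSRC']`, a list of source lists. It sends `application/vnd.apple.mpegurl` entries to `_extract_m3u8_formats` and turns the rest into progressive formats with an `http-` prefixed `format_id` and a height parsed from labels like `720p`. Below is a stand-alone sketch of that classification. `classify_sources` is a hypothetical name, the sample items in the test are invented, and youtube-dl's `mimetype2ext()` and `url_or_none()` are approximated by taking the MIME subtype and requiring a non-empty `src`.

```python
import re


def classify_sources(multi_src):
    """Split a multiSRC-style list of source lists into HLS URLs and progressive formats."""
    hls_urls, progressive = [], []
    for sources in multi_src:
        for item in sources:
            if not isinstance(item, dict) or not item.get('src'):
                continue
            if item.get('type') == 'application/vnd.apple.mpegurl':
                hls_urls.append(item['src'])
                continue
            label = item.get('label')
            # Rough stand-in for mimetype2ext(): use the MIME subtype.
            ext = (item.get('type') or '').split('/')[-1] or None
            height = re.search(r'(\d+)[pP]', label or '')
            progressive.append({
                'url': item['src'],
                'ext': ext,
                'format_id': 'http-%s' % (label or ext),
                'height': int(height.group(1)) if height else None,
            })
    return hls_urls, progressive
```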

@@ -173,13 +173,18 @@ class ARDMediathekIE(InfoExtractor):
title = self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms\.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>'],
r'<h4 class="headline">(.*?)</h4>',
r'<title[^>]*>(.*?)</title>'],
webpage, 'title')
description = self._html_search_meta(
'dcterms.abstract', webpage, 'description', default=None)
if description is None:
description = self._html_search_meta(
'description', webpage, 'meta description')
'description', webpage, 'meta description', default=None)
if description is None:
description = self._html_search_regex(
r'<p\s+class="teasertext">(.+?)</p>',
webpage, 'teaser text', default=None)
# Thumbnail is sometimes not present.
# It is in the mobile version, but that seems to use a different URL

@@ -8,7 +8,6 @@ from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
urlencode_postdata,
)
@@ -34,19 +33,40 @@ class AsianCrushIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
data=urlencode_postdata({
'postid': video_id,
'action': 'get_channel_kaltura_vars',
}))
webpage = self._download_webpage(url, video_id)
entry_id = data['entry_id']
entry_id, partner_id, title = [None] * 3
vars = self._parse_json(
self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False)
if vars:
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
if not entry_id:
entry_id = self._search_regex(
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
return self.url_result(
'kaltura:%s:%s' % (data['partner_id'], entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id,
video_title=data.get('vid_label'))
'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=title)
class AsianCrushPlaylistIE(InfoExtractor):

@@ -28,8 +28,10 @@ class ATVAtIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="([^"]+)"',
webpage, 'player data')), display_id)['config']['initial_video']
[r'flashPlayerOptions\s*=\s*(["\'])(?P<json>(?:(?!\1).)+)\1',
r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="(?P<json>[^"]+)"'],
webpage, 'player data', group='json')),
display_id)['config']['initial_video']
video_id = video_data['id']
video_title = video_data['title']

@@ -1,213 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
get_element_by_class,
get_element_by_id,
strip_or_none,
urljoin,
)
class AZMedienBaseIE(InfoExtractor):
def _kaltura_video(self, partner_id, entry_id):
return self.url_result(
'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
video_id=entry_id)
class AZMedienIE(AZMedienBaseIE):
class AZMedienIE(InfoExtractor):
IE_DESC = 'AZ Medien videos'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
(?P<host>
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
[0-9]+-show-[^/\#]+
(?:
/[0-9]+-episode-[^/\#]+
(?:
/[0-9]+-segment-(?:[^/\#]+\#)?|
\#
)|
\#
[^/]+/
(?P<id>
[^/]+-(?P<article_id>\d+)
)
(?P<id>[^\#]+)
(?:
\#video=
(?P<kaltura_id>
[_0-9a-z]+
)
)?
'''
_TESTS = [{
# URL with 'segment'
'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
'url': 'https://www.telezueri.ch/sonntalk/bundesrats-vakanzen-eu-rahmenabkommen-133214569',
'info_dict': {
'id': '1_2444peh4',
'id': '1_anruz3wy',
'ext': 'mp4',
'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
'uploader_id': 'TeleZ?ri',
'upload_date': '20161218',
'timestamp': 1482084490,
'title': 'Bundesrats-Vakanzen / EU-Rahmenabkommen',
'uploader_id': 'TVOnline',
'upload_date': '20180930',
'timestamp': 1538328802,
},
'params': {
'skip_download': True,
},
}, {
# URL with 'segment' and fragment:
'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
'only_matching': True
}, {
# URL with 'episode' and fragment:
'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
'only_matching': True
}, {
# URL with 'show' and fragment:
'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'<script[^>]+src=["\'](?:https?:)?//(?:[^/]+\.)?kaltura\.com(?:/[^/]+)*/(?:p|partner_id)/([0-9]+)',
webpage, 'kaltura partner id')
entry_id = self._html_search_regex(
r'<a[^>]+data-id=(["\'])(?P<id>(?:(?!\1).)+)\1[^>]+data-slug=["\']%s'
% re.escape(video_id), webpage, 'kaltura entry id', group='id')
return self._kaltura_video(partner_id, entry_id)
class AZMedienPlaylistIE(AZMedienBaseIE):
IE_DESC = 'AZ Medien playlists'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
(?P<id>[0-9]+-
(?:
show|
topic|
themen
)-[^/\#]+
(?:
/[0-9]+-episode-[^/\#]+
)?
)$
'''
_TESTS = [{
# URL with 'episode'
'url': 'http://www.telebaern.tv/118-show-news/13735-episode-donnerstag-15-dezember-2016',
'info_dict': {
'id': '118-show-news/13735-episode-donnerstag-15-dezember-2016',
'title': 'News - Donnerstag, 15. Dezember 2016',
},
'playlist_count': 9,
}, {
# URL with 'themen'
'url': 'http://www.telem1.ch/258-themen-tele-m1-classics',
'info_dict': {
'id': '258-themen-tele-m1-classics',
'title': 'Tele M1 Classics',
},
'playlist_mincount': 15,
}, {
# URL with 'topic', contains nested playlists
'url': 'http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen',
'only_matching': True,
}, {
# URL with 'show' only
'url': 'http://www.telezueri.ch/86-show-talktaeglich',
'only_matching': True
}]
_PARTNER_ID = '1719221'
def _real_extract(self, url):
show_id = self._match_id(url)
webpage = self._download_webpage(url, show_id)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id')
entries = []
if not entry_id:
api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
payload = {
'query': '''query VideoContext($articleId: ID!) {
article: node(id: $articleId) {
... on Article {
mainAssetRelation {
asset {
... on VideoAsset {
kalturaId
}
}
}
}
}
}''',
'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
}
json_data = self._download_json(
api_url, video_id, headers={
'Content-Type': 'application/json',
},
data=json.dumps(payload).encode())
entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id', default=None)
if partner_id:
entries = [
self._kaltura_video(partner_id, m.group('id'))
for m in re.finditer(
r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage)]
if not entries:
entries = [
self.url_result(m.group('url'), ie=AZMedienIE.ie_key())
for m in re.finditer(
r'<a[^>]+data-real=(["\'])(?P<url>http.+?)\1', webpage)]
if not entries:
entries = [
# May contain nested playlists (e.g. [1]) thus no explicit
# ie_key
# 1. http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen)
self.url_result(urljoin(url, m.group('url')))
for m in re.finditer(
r'<a[^>]+name=[^>]+href=(["\'])(?P<url>/.+?)\1', webpage)]
title = self._search_regex(
r'episodeShareTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
webpage, 'title',
default=strip_or_none(get_element_by_id(
'video-title', webpage)), group='title')
return self.playlist_result(entries, show_id, title)
class AZMedienShowPlaylistIE(AZMedienBaseIE):
IE_DESC = 'AZ Medien show playlists'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
(?:
all-episodes|
alle-episoden
)/
(?P<id>[^/?#&]+)
'''
_TEST = {
'url': 'http://www.telezueri.ch/all-episodes/astrotalk',
'info_dict': {
'id': 'astrotalk',
'title': 'TeleZüri: AstroTalk - alle episoden',
'description': 'md5:4c0f7e7d741d906004266e295ceb4a26',
},
'playlist_mincount': 13,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
episodes = get_element_by_class('search-mobile-box', webpage)
entries = [self.url_result(
urljoin(url, m.group('url'))) for m in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1', episodes)]
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return self.playlist_result(entries, playlist_id, title, description)
return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id)
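The rewritten `AZMedienIE` takes the Kaltura entry ID from a `#video=` fragment when one is present. Otherwise it asks the site's GraphQL endpoint for the article's `kalturaId`. The sketch below reuses the new `_VALID_URL` and the query text from the diff to build that request without sending it. `build_gql_request` is a hypothetical name.

```python
import json
import re

# The new AZMedienIE._VALID_URL from the diff above (verbose mode).
AZMEDIEN_VALID_URL = r'''(?x)
    https?://
        (?:www\.)?
        (?P<host>
            telezueri\.ch|
            telebaern\.tv|
            telem1\.ch
        )/
        [^/]+/
        (?P<id>
            [^/]+-(?P<article_id>\d+)
        )
        (?:
            \#video=
            (?P<kaltura_id>
                [_0-9a-z]+
            )
        )?
'''

QUERY = '''query VideoContext($articleId: ID!) {
    article: node(id: $articleId) {
        ... on Article {
            mainAssetRelation {
                asset {
                    ... on VideoAsset {
                        kalturaId
                    }
                }
            }
        }
    }
}'''


def build_gql_request(url):
    """Return {'kaltura_id': ...} or {'api_url': ..., 'body': ...} for `url`."""
    mobj = re.match(AZMEDIEN_VALID_URL, url)
    if mobj is None:
        raise ValueError('unsupported URL: %s' % url)
    if mobj.group('kaltura_id'):
        # The fragment already names the Kaltura entry; no API call is needed.
        return {'kaltura_id': mobj.group('kaltura_id')}
    host = mobj.group('host')
    payload = {
        'query': QUERY,
        'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
    }
    return {
        'api_url': 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0]),
        'body': json.dumps(payload).encode(),
    }
```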

@@ -37,7 +37,7 @@ class BitChuteIE(InfoExtractor):
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.57 Safari/537.36',
})
title = self._search_regex(
title = self._html_search_regex(
(r'<[^>]+\bid=["\']video-title[^>]+>([^<]+)', r'<title>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'description', webpage, 'title',

@@ -1,8 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import base64
import json
import re
import struct
from .common import InfoExtractor
from .adobepass import AdobePassIE
@@ -310,6 +312,10 @@ class BrightcoveLegacyIE(InfoExtractor):
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _brightcove_new_url_result(self, publisher_id, video_id):
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
@@ -323,6 +329,28 @@ class BrightcoveLegacyIE(InfoExtractor):
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
publisher_id = query.get('publisherId')
if publisher_id and publisher_id[0].isdigit():
publisher_id = publisher_id[0]
if not publisher_id:
player_key = query.get('playerKey')
if player_key and ',' in player_key[0]:
player_key = player_key[0]
else:
player_id = query.get('playerID')
if player_id and player_id[0].isdigit():
player_page = self._download_webpage(
'http://link.brightcove.com/services/player/bcpid' + player_id[0],
video_id, headers=headers, fatal=False)
if player_page:
player_key = self._search_regex(
r'<param\s+name="playerKey"\s+value="([\w~,-]+)"',
player_page, 'player key', fatal=False)
if player_key:
enc_pub_id = player_key.split(',')[1].replace('~', '=')
publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
if publisher_id:
return self._brightcove_new_url_result(publisher_id, video_id)
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
@@ -444,8 +472,12 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
return ad_info
if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % video_id)
if not info.get('url') and not info.get('formats'):
uploader_id = info.get('uploader_id')
if uploader_id:
info.update(self._brightcove_new_url_result(uploader_id, video_id))
else:
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
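When the legacy Brightcove page reports an error, the patch tries to recover the publisher ID from the `playerKey` and redirect to the new `players.brightcove.net` player. The second comma-separated field of the key is URL-safe base64, with `~` standing in for `=` padding, of an unsigned 64-bit big-endian integer. The patch decodes it with `struct.unpack('>Q', ...)`. The round trip below demonstrates the encoding. `make_player_key` and the first and last fields it produces are made up for the demonstration.

```python
import base64
import struct


def publisher_id_from_player_key(player_key):
    """Decode the numeric publisher ID embedded in a legacy Brightcove playerKey."""
    # Second field: URL-safe base64 with '=' padding written as '~',
    # decoding to an unsigned 64-bit big-endian integer.
    enc_pub_id = player_key.split(',')[1].replace('~', '=')
    return struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]


def make_player_key(publisher_id, first='AQ~~', last='hypothetical-token'):
    """Build a playerKey-shaped string for testing; the other fields are invented."""
    enc = base64.urlsafe_b64encode(struct.pack('>Q', publisher_id)).decode('ascii')
    return ','.join([first, enc.replace('=', '~'), last])
```

Eight bytes encode to eleven base64 characters plus one padding character, so the middle field always ends in a single `~`.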

@@ -0,0 +1,142 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
clean_html,
float_or_none,
int_or_none,
try_get,
urlencode_postdata,
)
class CiscoLiveBaseIE(InfoExtractor):
# These appear to be constant across all Cisco Live presentations
# and are not tied to any user session or event
RAINFOCUS_API_URL = 'https://events.rainfocus.com/api/%s'
RAINFOCUS_API_PROFILE_ID = 'Na3vqYdAlJFSxhYTYQGuMbpafMqftalz'
RAINFOCUS_WIDGET_ID = 'n6l4Lo05R8fiy3RpUBm447dZN8uNWoye'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5647924234001/SyK2FdqjM_default/index.html?videoId=%s'
HEADERS = {
'Origin': 'https://ciscolive.cisco.com',
'rfApiProfileId': RAINFOCUS_API_PROFILE_ID,
'rfWidgetId': RAINFOCUS_WIDGET_ID,
}
def _call_api(self, ep, rf_id, query, referrer, note=None):
headers = self.HEADERS.copy()
headers['Referer'] = referrer
return self._download_json(
self.RAINFOCUS_API_URL % ep, rf_id, note=note,
data=urlencode_postdata(query), headers=headers)
def _parse_rf_item(self, rf_item):
event_name = rf_item.get('eventName')
title = rf_item['title']
description = clean_html(rf_item.get('abstract'))
presenter_name = try_get(rf_item, lambda x: x['participants'][0]['fullName'])
bc_id = rf_item['videos'][0]['url']
bc_url = self.BRIGHTCOVE_URL_TEMPLATE % bc_id
duration = float_or_none(try_get(rf_item, lambda x: x['times'][0]['length']))
location = try_get(rf_item, lambda x: x['times'][0]['room'])
if duration:
duration = duration * 60
return {
'_type': 'url_transparent',
'url': bc_url,
'ie_key': 'BrightcoveNew',
'title': title,
'description': description,
'duration': duration,
'creator': presenter_name,
'location': location,
'series': event_name,
}
class CiscoLiveSessionIE(CiscoLiveBaseIE):
_VALID_URL = r'https?://ciscolive\.cisco\.com/on-demand-library/\??[^#]*#/session/(?P<id>[^/?&]+)'
_TEST = {
'url': 'https://ciscolive.cisco.com/on-demand-library/?#/session/1423353499155001FoSs',
'md5': 'c98acf395ed9c9f766941c70f5352e22',
'info_dict': {
'id': '5803694304001',
'ext': 'mp4',
'title': '13 Smart Automations to Monitor Your Cisco IOS Network',
'description': 'md5:ec4a436019e09a918dec17714803f7cc',
'timestamp': 1530305395,
'upload_date': '20180629',
'uploader_id': '5647924234001',
'location': '16B Mezz.',
},
}
def _real_extract(self, url):
rf_id = self._match_id(url)
rf_result = self._call_api('session', rf_id, {'id': rf_id}, url)
return self._parse_rf_item(rf_result['items'][0])
class CiscoLiveSearchIE(CiscoLiveBaseIE):
_VALID_URL = r'https?://ciscolive\.cisco\.com/on-demand-library/'
_TESTS = [{
'url': 'https://ciscolive.cisco.com/on-demand-library/?search.event=ciscoliveus2018&search.technicallevel=scpsSkillLevel_aintroductory&search.focus=scpsSessionFocus_designAndDeployment#/',
'info_dict': {
'title': 'Search query',
},
'playlist_count': 5,
}, {
'url': 'https://ciscolive.cisco.com/on-demand-library/?search.technology=scpsTechnology_applicationDevelopment&search.technology=scpsTechnology_ipv6&search.focus=scpsSessionFocus_troubleshootingTroubleshooting#/',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if CiscoLiveSessionIE.suitable(url) else super(CiscoLiveSearchIE, cls).suitable(url)
@staticmethod
def _check_bc_id_exists(rf_item):
return int_or_none(try_get(rf_item, lambda x: x['videos'][0]['url'])) is not None
def _entries(self, query, url):
query['size'] = 50
query['from'] = 0
for page_num in itertools.count(1):
results = self._call_api(
'search', None, query, url,
'Downloading search JSON page %d' % page_num)
sl = try_get(results, lambda x: x['sectionList'][0], dict)
if sl:
results = sl
items = results.get('items')
if not items or not isinstance(items, list):
break
for item in items:
if not isinstance(item, dict):
continue
if not self._check_bc_id_exists(item):
continue
yield self._parse_rf_item(item)
size = int_or_none(results.get('size'))
if size is not None:
query['size'] = size
total = int_or_none(results.get('total'))
if total is not None and query['from'] + query['size'] > total:
break
query['from'] += query['size']
def _real_extract(self, url):
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
query['type'] = 'session'
return self.playlist_result(
self._entries(query, url), playlist_title='Search query')
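`CiscoLiveSearchIE._entries` above pages through the RainFocus search endpoint with `from`/`size` offsets. It adopts whatever page size the server reports and stops once `from + size` exceeds the reported `total`. The following is a minimal standalone sketch of that loop. The `fetch_page(query)` callable is a hypothetical stand-in for `_call_api`, and the `sectionList` unwrapping is left out.

```python
def paginate(fetch_page, query, page_size=50):
    """Yield dict items from an offset-paged API (sketch of CiscoLiveSearchIE._entries).

    fetch_page is a hypothetical callable that returns a dict with optional
    'items', 'size' and 'total' keys, mirroring the RainFocus response shape.
    """
    query = dict(query, size=page_size)
    query['from'] = 0
    while True:
        results = fetch_page(query)
        items = results.get('items')
        if not items or not isinstance(items, list):
            break
        for item in items:
            if isinstance(item, dict):
                yield item
        # The server may report a different page size than the one requested.
        size = results.get('size')
        if isinstance(size, int):
            query['size'] = size
        total = results.get('total')
        if isinstance(total, int) and query['from'] + query['size'] > total:
            break
        query['from'] += query['size']
```

An empty page also ends the loop, so an exact multiple of the page size costs one extra request.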

@@ -1,19 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
_translation_table = {
'a': 'h', 'd': 'e', 'e': 'v', 'f': 'o', 'g': 'f', 'i': 'd', 'l': 'n',
'm': 'a', 'n': 'm', 'p': 'u', 'q': 't', 'r': 's', 'v': 'p', 'x': 'r',
'y': 'l', 'z': 'i',
'$': ':', '&': '.', '(': '=', '^': '&', '=': '/',
}
def _decode(s):
return ''.join(_translation_table.get(c, c) for c in s)
from ..utils import (
int_or_none,
url_or_none,
)
class CliphunterIE(InfoExtractor):
@@ -60,14 +51,14 @@ class CliphunterIE(InfoExtractor):
formats = []
for format_id, f in gexo_files.items():
video_url = f.get('url')
video_url = url_or_none(f.get('url'))
if not video_url:
continue
fmt = f.get('fmt')
height = f.get('h')
format_id = '%s_%sp' % (fmt, height) if fmt and height else format_id
formats.append({
'url': _decode(video_url),
'url': video_url,
'format_id': format_id,
'width': int_or_none(f.get('w')),
'height': int_or_none(height),

@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
@@ -34,3 +35,32 @@ class CNBCIE(InfoExtractor):
{'force_smil_url': True}),
'id': video_id,
}
class CNBCVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cnbc\.com/video/(?:[^/]+/)+(?P<id>[^./?#&]+)'
_TEST = {
'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html',
'info_dict': {
'id': '7000031301',
'ext': 'mp4',
'title': "Trump: I don't necessarily agree with raising rates",
'description': 'md5:878d8f0b4ebb5bb1dda3514b91b49de3',
'timestamp': 1531958400,
'upload_date': '20180719',
'uploader': 'NBCU-CNBC',
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'content_id["\']\s*:\s*["\'](\d+)', webpage, display_id,
'video id')
return self.url_result(
'http://video.cnbc.com/gallery/?video=%s' % video_id,
CNBCIE.ie_key())

@@ -69,6 +69,7 @@ from ..utils import (
update_url_query,
urljoin,
url_basename,
url_or_none,
xpath_element,
xpath_text,
xpath_with_ns,
@@ -211,6 +212,11 @@ class InfoExtractor(object):
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
uploader_url: Full URL to a personal webpage of the video uploader.
channel: Full name of the channel the video is uploaded on.
Note that channel fields may or may not repeat uploader
fields. This depends on a particular extractor.
channel_id: Id of the channel.
channel_url: Full URL to a channel webpage.
location: Physical location where the video was filmed.
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
@@ -600,6 +606,11 @@ class InfoExtractor(object):
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
# Retain reference to error to prevent file object from
# being closed before it can be read. Works around the
# effects of <https://bugs.python.org/issue15002>
# introduced in Python 3.4.1.
err.fp._error = err
return err.fp
if errnote is False:
@@ -1208,10 +1219,10 @@ class InfoExtractor(object):
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
info.update({
'url': e.get('contentUrl'),
'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl') or e.get('thumbnailURL'),
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
@@ -1701,9 +1712,9 @@ class InfoExtractor(object):
# However, this is not always respected, for example, [2]
# contains EXT-X-STREAM-INF tag which references AUDIO
# rendition group but does not have CODECS and despite
# referencing audio group an audio group, it represents
# a complete (with audio and video) format. So, for such cases
# we will ignore references to rendition groups and treat them
# referencing an audio group it represents a complete
# (with audio and video) format. So, for such cases we will
# ignore references to rendition groups and treat them
# as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none':
audio_group = groups.get(audio_group_id)

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
import json
import xml.etree.ElementTree as etree
import zlib
from hashlib import sha1
@@ -45,7 +46,7 @@ class CrunchyrollBaseIE(InfoExtractor):
data['req'] = 'RpcApi' + method
data = compat_urllib_parse_urlencode(data).encode('utf-8')
return self._download_xml(
'http://www.crunchyroll.com/xml/',
'https://www.crunchyroll.com/xml/',
video_id, note, fatal=False, data=data, headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
@@ -398,7 +399,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'Downloading subtitles for ' + sub_name, data={
'subtitle_script_id': sub_id,
})
if sub_doc is None:
if not isinstance(sub_doc, etree.Element):
continue
sid = sub_doc.get('id')
iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
@@ -445,6 +446,10 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
webpage, 'vilos media', default='{}'), video_id)
media_metadata = media.get('metadata') or {}
language = self._search_regex(
r'(?:vilos\.config\.player\.language|LOCALE)\s*=\s*(["\'])(?P<lang>(?:(?!\1).)+)\1',
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
@@ -466,9 +471,22 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
formats = []
for stream in media.get('streams', []):
formats.extend(self._extract_vrv_formats(
audio_lang = stream.get('audio_lang')
hardsub_lang = stream.get('hardsub_lang')
vrv_formats = self._extract_vrv_formats(
stream.get('url'), video_id, stream.get('format'),
stream.get('audio_lang'), stream.get('hardsub_lang')))
audio_lang, hardsub_lang)
for f in vrv_formats:
if not hardsub_lang:
f['preference'] = 1
language_preference = 0
if audio_lang == language:
language_preference += 1
if hardsub_lang == language:
language_preference += 1
if language_preference:
f['language_preference'] = language_preference
formats.extend(vrv_formats)
if not formats:
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
@@ -498,7 +516,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'video_quality': stream_quality,
'current_page': url,
})
if streamdata is not None:
if isinstance(streamdata, etree.Element):
stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info is not None:
stream_infos.append(stream_info)
@@ -509,7 +527,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
if stream_info is not None:
if isinstance(stream_info, etree.Element):
stream_infos.append(stream_info)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
@@ -557,7 +575,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats, ('height', 'width', 'tbr', 'fps'))
self._sort_formats(formats, ('preference', 'language_preference', 'height', 'width', 'tbr', 'fps'))
metadata = self._call_rpc_api(
'VideoPlayer_GetMediaMetadata', video_id,
@@ -581,10 +599,22 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
series = self._html_search_regex(
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title') or media_metadata.get('title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number') or media_metadata.get('episode_number'))
season = episode = episode_number = duration = thumbnail = None
if isinstance(metadata, etree.Element):
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
duration = float_or_none(media_metadata.get('duration'), 1000)
thumbnail = xpath_text(metadata, 'episode_image_url')
if not episode:
episode = media_metadata.get('title')
if not episode_number:
episode_number = int_or_none(media_metadata.get('episode_number'))
if not thumbnail:
thumbnail = media_metadata.get('thumbnail', {}).get('url')
season_number = int_or_none(self._search_regex(
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
@@ -594,8 +624,8 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'id': video_id,
'title': video_title,
'description': video_description,
'duration': float_or_none(media_metadata.get('duration'), 1000),
'thumbnail': xpath_text(metadata, 'episode_image_url') or media_metadata.get('thumbnail', {}).get('url'),
'duration': duration,
'thumbnail': thumbnail,
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series,
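The new Crunchyroll stream loop adds two ranking fields that are sorted ahead of resolution. `preference` is set to 1 for streams without burned-in subtitles. `language_preference` counts how many of `audio_lang` and `hardsub_lang` match the page locale scraped from `vilos.config.player.language`/`LOCALE`. The sketch below reproduces that scoring on plain dicts. The locale codes are made-up sample values, and treating missing fields as -1 when sorting is an assumption of this sketch.

```python
def rank_streams(streams, page_language,
                 fields=('preference', 'language_preference', 'height')):
    """Annotate stream dicts and sort them worst-first (sketch of the Crunchyroll loop)."""
    formats = []
    for stream in streams:
        f = dict(stream)
        audio_lang = f.get('audio_lang')
        hardsub_lang = f.get('hardsub_lang')
        # Streams without burned-in subtitles are preferred.
        if not hardsub_lang:
            f['preference'] = 1
        # One point for each language that matches the page locale.
        language_preference = 0
        if audio_lang == page_language:
            language_preference += 1
        if hardsub_lang == page_language:
            language_preference += 1
        if language_preference:
            f['language_preference'] = language_preference
        formats.append(f)
    # Assumption of this sketch: missing fields sort as -1 (lowest).
    formats.sort(key=lambda f: tuple(
        -1 if f.get(k) is None else f[k] for k in fields))
    return formats
```

Because `preference` comes first in the sort fields, a soft-subbed 480p stream in the viewer's language outranks a hardsubbed 1080p one.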

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
@@ -66,9 +67,12 @@ class CWTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
data = self._download_json(
'http://images.cwtv.com/feed/mobileapp/video-meta/apiversion_8/guid_' + video_id,
video_id)['video']
video_id)
if data.get('result') != 'ok':
raise ExtractorError(data['msg'], expected=True)
video_data = data['video']
title = video_data['title']
mpx_url = video_data.get('mpx_url') or 'http://link.theplatform.com/s/cwtv/media/guid/2703454149/%s?formats=M3U' % video_id

@@ -49,6 +49,9 @@ class DailyMailIE(InfoExtractor):
'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id)
video_sources = self._download_json(sources_url, video_id)
body = video_sources.get('body')
if body:
video_sources = body
formats = []
for rendition in video_sources['renditions']:

@@ -22,7 +22,10 @@ from ..utils import (
parse_iso8601,
sanitized_Request,
str_to_int,
try_get,
unescapeHTML,
update_url_query,
url_or_none,
urlencode_postdata,
)
@@ -171,10 +174,25 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
metadata = player['metadata']
player = self._parse_json(player_v5, video_id, fatal=False) or {}
metadata = try_get(player, lambda x: x['metadata'], dict)
if not metadata:
metadata_url = url_or_none(try_get(
player, lambda x: x['context']['metadata_template_url1']))
if metadata_url:
metadata_url = metadata_url.replace(':videoId', video_id)
else:
metadata_url = update_url_query(
'https://www.dailymotion.com/player/metadata/video/%s'
% video_id, {
'embedder': url,
'integration': 'inline',
'GK_PV5_NEON': '1',
})
metadata = self._download_json(
metadata_url, video_id, 'Downloading metadata JSON')
if metadata.get('error', {}).get('type') == 'password_protected':
if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
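When `__PLAYER_CONFIG__` has no inline `metadata`, the updated Dailymotion code looks for a metadata URL in two places. It first tries `context.metadata_template_url1`, substituting `:videoId`. Failing that, it builds `https://www.dailymotion.com/player/metadata/video/<id>` with `embedder`, `integration` and `GK_PV5_NEON` query parameters. The sketch below shows that URL selection using only the standard library. `urlencode` stands in for youtube-dl's `update_url_query`, and the scheme check is a rough approximation of `url_or_none`.

```python
from urllib.parse import urlencode


def metadata_url_for(player, video_id, page_url):
    """Pick the metadata JSON URL the way the Dailymotion change does (sketch)."""
    context = player.get('context') or {}
    template = context.get('metadata_template_url1')
    # Rough stand-in for url_or_none: accept only absolute http(s) URLs.
    if isinstance(template, str) and template.startswith(('http://', 'https://')):
        return template.replace(':videoId', video_id)
    return 'https://www.dailymotion.com/player/metadata/video/%s?%s' % (
        video_id, urlencode({
            'embedder': page_url,
            'integration': 'inline',
            'GK_PV5_NEON': '1',
        }))
```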

@@ -9,6 +9,7 @@ from ..utils import (
encode_base_n,
ExtractorError,
int_or_none,
merge_dicts,
parse_duration,
str_to_int,
url_or_none,
@@ -25,10 +26,16 @@ class EpornerIE(InfoExtractor):
'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
'ext': 'mp4',
'title': 'Infamous Tiffany Teen Strip Tease Video',
'description': 'md5:764f39abf932daafa37485eb46efa152',
'timestamp': 1232520922,
'upload_date': '20090121',
'duration': 1838,
'view_count': int,
'age_limit': 18,
},
'params': {
'proxy': '127.0.0.1:8118'
}
}, {
# New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
@@ -104,12 +111,15 @@ class EpornerIE(InfoExtractor):
})
self._sort_formats(formats)
duration = parse_duration(self._html_search_meta('duration', webpage))
json_ld = self._search_json_ld(webpage, display_id, default={})
duration = parse_duration(self._html_search_meta(
'duration', webpage, default=None))
view_count = str_to_int(self._search_regex(
r'id="cinemaviews">\s*([0-9,]+)\s*<small>views',
webpage, 'view count', fatal=False))
return {
return merge_dicts(json_ld, {
'id': video_id,
'display_id': display_id,
'title': title,
@@ -117,4 +127,4 @@ class EpornerIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
'age_limit': 18,
}
})
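The Eporner change now returns `merge_dicts(json_ld, {...})`. As a result, values from the page's JSON-LD are meant to take priority, and the hand-scraped fields fill in only what JSON-LD lacks. The sketch below is a simplified "first non-`None` value wins" merge that illustrates this ordering. It is not youtube-dl's own `merge_dicts`, which may treat some edge cases, such as empty strings, differently.

```python
def merge_first_wins(*dicts):
    """Merge dicts left to right, keeping the first non-None value per key (sketch)."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if value is None:
                continue
            merged.setdefault(key, value)
    return merged
```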

@@ -88,11 +88,7 @@ from .awaan import (
AWAANLiveIE,
AWAANSeasonIE,
)
from .azmedien import (
AZMedienIE,
AZMedienPlaylistIE,
AZMedienShowPlaylistIE,
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
@@ -198,6 +194,10 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .ciscolive import (
CiscoLiveSessionIE,
CiscoLiveSearchIE,
)
from .cjsw import CJSWIE
from .cliphunter import CliphunterIE
from .clippit import ClippitIE
@@ -209,7 +209,10 @@ from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
from .cmt import CMTIE
from .cnbc import CNBCIE
from .cnbc import (
CNBCIE,
CNBCVideoIE,
)
from .cnn import (
CNNIE,
CNNBlogsIE,
@@ -480,6 +483,7 @@ from .imdb import (
from .imgur import (
ImgurIE,
ImgurAlbumIE,
ImgurGalleryIE,
)
from .ina import InaIE
from .inc import IncIE
@@ -540,6 +544,7 @@ from .la7 import LA7IE
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
EHFTVIE,
ITTFIE,
)
from .lci import LCIIE
@@ -549,6 +554,10 @@ from .lcp import (
)
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lecturio import (
LecturioIE,
LecturioCourseIE,
)
from .leeco import (
LeIE,
LePlaylistIE,
@@ -569,6 +578,10 @@ from .limelight import (
LimelightChannelListIE,
)
from .line import LineTVIE
from .linkedin import (
LinkedInLearningIE,
LinkedInLearningCourseIE,
)
from .litv import LiTVIE
from .liveleak import (
LiveLeakIE,
@@ -1043,7 +1056,7 @@ from .spike import (
)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
from .sportbox import SportBoxIE
from .sportdeutschland import SportDeutschlandIE
from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE
@@ -1078,6 +1091,10 @@ from .tass import TassIE
from .tastytrade import TastyTradeIE
from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE
from .teachable import (
TeachableIE,
TeachableCourseIE,
)
from .teachertube import (
TeacherTubeIE,
TeacherTubeUserIE,
@@ -1116,6 +1133,10 @@ from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .threeqsdn import ThreeQSDNIE
from .tiktok import (
TikTokIE,
TikTokUserIE,
)
from .tinypic import TinyPicIE
from .tmz import (
TMZIE,
@@ -1153,7 +1174,6 @@ from .tv2 import (
TV2ArticleIE,
)
from .tv2hu import TV2HuIE
from .tv3 import TV3IE
from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE
from .tva import TVAIE
@@ -1190,6 +1210,7 @@ from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
@@ -1223,10 +1244,6 @@ from .uplynk import (
UplynkIE,
UplynkPreplayIE,
)
from .upskill import (
UpskillIE,
UpskillCourseIE,
)
from .urort import UrortIE
from .urplay import URPlayIE
from .usanetwork import USANetworkIE
@@ -1295,6 +1312,7 @@ from .vimeo import (
VimeoReviewIE,
VimeoUserIE,
VimeoWatchLaterIE,
VHXEmbedIE,
)
from .vimple import VimpleIE
from .vine import (
@@ -1382,6 +1400,7 @@ from .wsj import (
WSJIE,
WSJArticleIE,
)
from .wwe import WWEIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
from .xfileshare import XFileShareIE
@@ -1455,10 +1474,23 @@ from .youtube import (
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import (
BBVTVIE,
EinsUndEinsTVIE,
EWETVIE,
GlattvisionTVIE,
MNetTVIE,
MyVisionTVIE,
NetPlusIE,
OsnatelTVIE,
QuantumTVIE,
QuicklineIE,
QuicklineLiveIE,
SAKTVIE,
VTXTVIE,
WalyTVIE,
ZattooIE,
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE
from .zype import ZypeIE

@@ -57,7 +57,7 @@ class FacebookIE(InfoExtractor):
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary'
_TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',

@@ -3,15 +3,45 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_b64decode,
compat_str,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
str_or_none,
str_to_int,
try_get,
unified_timestamp,
url_or_none,
)
class FourTubeBaseIE(InfoExtractor):
_TKN_HOST = 'tkn.kodicdn.com'
def _extract_formats(self, url, video_id, media_id, sources):
token_url = 'https://%s/%s/desktop/%s' % (
self._TKN_HOST, media_id, '+'.join(sources))
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
return formats
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
@@ -68,21 +98,7 @@ class FourTubeBaseIE(InfoExtractor):
media_id = params[0]
sources = ['%s' % p for p in params[2]]
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
media_id, '+'.join(sources))
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
formats = self._extract_formats(url, video_id, media_id, sources)
return {
'id': video_id,
@@ -164,6 +180,7 @@ class FuxIE(FourTubeBaseIE):
class PornTubeIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
_TKN_HOST = 'tkn.porntube.com'
_TESTS = [{
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
'info_dict': {
@@ -171,13 +188,32 @@ class PornTubeIE(FourTubeBaseIE):
'ext': 'mp4',
'title': 'Teen couple doing anal',
'uploader': 'Alexy',
'uploader_id': 'Alexy',
'uploader_id': '91488',
'upload_date': '20150606',
'timestamp': 1433595647,
'duration': 5052,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.porntube.com/videos/squirting-teen-ballerina-ecg_1331406',
'info_dict': {
'id': '1331406',
'ext': 'mp4',
'title': 'Squirting Teen Ballerina on ECG',
'uploader': 'Exploited College Girls',
'uploader_id': '665',
'channel': 'Exploited College Girls',
'channel_id': '665',
'upload_date': '20130920',
'timestamp': 1379685485,
'duration': 851,
'view_count': int,
'like_count': int,
'age_limit': 18,
},
'params': {
@@ -191,6 +227,55 @@ class PornTubeIE(FourTubeBaseIE):
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
video = self._parse_json(
self._search_regex(
r'INITIALSTATE\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'data', group='value'), video_id,
transform_source=lambda x: compat_urllib_parse_unquote(
compat_b64decode(x).decode('utf-8')))['page']['video']
title = video['title']
media_id = video['mediaId']
sources = [compat_str(e['height'])
for e in video['encodings'] if e.get('height')]
formats = self._extract_formats(url, video_id, media_id, sources)
thumbnail = url_or_none(video.get('masterThumb'))
uploader = try_get(video, lambda x: x['user']['username'], compat_str)
uploader_id = str_or_none(try_get(
video, lambda x: x['user']['id'], int))
channel = try_get(video, lambda x: x['channel']['name'], compat_str)
channel_id = str_or_none(try_get(
video, lambda x: x['channel']['id'], int))
like_count = int_or_none(video.get('likes'))
dislike_count = int_or_none(video.get('dislikes'))
view_count = int_or_none(video.get('playsQty'))
duration = int_or_none(video.get('durationInSeconds'))
timestamp = unified_timestamp(video.get('publishedAt'))
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'uploader': uploader or channel,
'uploader_id': uploader_id or channel_id,
'channel': channel,
'channel_id': channel_id,
'timestamp': timestamp,
'like_count': like_count,
'dislike_count': dislike_count,
'view_count': view_count,
'duration': duration,
'age_limit': 18,
}
class PornerBrosIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
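`PornTubeIE._real_extract` reads the quoted `INITIALSTATE` string from the page. It base64-decodes that string, percent-decodes the result, and only then parses it as JSON to reach `['page']['video']`. The following is a self-contained sketch of that `transform_source` chain. The `encode_initial_state` helper and the sample payload are hypothetical and exist only to build test input.

```python
import base64
import json
import re
from urllib.parse import quote, unquote

# Same pattern as in the diff: a quoted value after INITIALSTATE =
_INITIAL_STATE_RE = re.compile(r'INITIALSTATE\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1')


def decode_initial_state(blob):
    """base64 -> UTF-8 text -> percent-decoding -> JSON."""
    return json.loads(unquote(base64.b64decode(blob).decode('utf-8')))


def extract_video(webpage):
    m = _INITIAL_STATE_RE.search(webpage)
    if m is None:
        raise ValueError('INITIALSTATE not found')
    return decode_initial_state(m.group('value'))['page']['video']


def encode_initial_state(state):
    """Hypothetical inverse, used only to build sample pages."""
    return base64.b64encode(quote(json.dumps(state)).encode('utf-8')).decode('ascii')
```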

@@ -1,43 +1,33 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
smuggle_url,
update_url_query,
)
class FoxSportsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'http://www.foxsports.com/tennessee/video/432609859715',
'md5': 'b49050e955bebe32c301972e4012ac17',
'info_dict': {
'id': 'bwduI3X_TgUB',
'id': '432609859715',
'ext': 'mp4',
'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
'description': 'Courtney Lee talks about Memphis being focused.',
'upload_date': '20150423',
'timestamp': 1429761109,
# TODO: fix timestamp
'upload_date': '19700101', # '20150423',
# 'timestamp': 1429761109,
'uploader': 'NEWA-FNG-FOXSPORTS',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._html_search_regex(
r"""class="[^"]*(?:fs-player|platformPlayer-wrapper)[^"]*".+?data-player-config='([^']+)'""",
webpage, 'data player config'),
video_id)
return self.url_result(smuggle_url(update_url_query(
config['releaseURL'], {
'mbr': 'true',
'switch': 'http',
}), {'force_smil_url': True}))
return self.url_result(
'https://feed.theplatform.com/f/BKQ29B/foxsports-all?byId=' + video_id, 'ThePlatformFeed')

@@ -14,7 +14,7 @@ from ..utils import (
class GameSpotIE(OnceIE):
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/(?:video|article)s/(?:[^/]+/\d+-|embed/)(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/(?:video|article|review)s/(?:[^/]+/\d+-|embed/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
@@ -41,6 +41,9 @@ class GameSpotIE(OnceIE):
}, {
'url': 'https://www.gamespot.com/articles/the-last-of-us-2-receives-new-ps4-trailer/1100-6454469/',
'only_matching': True,
}, {
'url': 'https://www.gamespot.com/reviews/gears-of-war-review/1900-6161188/',
'only_matching': True,
}]
def _real_extract(self, url):

@@ -47,7 +47,7 @@ from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
from .tvc import TVCIE
from .sportbox import SportBoxEmbedIE
from .sportbox import SportBoxIE
from .smotri import SmotriIE
from .myvi import MyviIE
from .condenast import CondeNastIE
@@ -109,11 +109,13 @@ from .vice import ViceIE
from .xfileshare import XFileShareIE
from .cloudflarestream import CloudflareStreamIE
from .peertube import PeerTubeIE
from .teachable import TeachableIE
from .indavideo import IndavideoEmbedIE
from .apa import APAIE
from .foxnews import FoxNewsIE
from .viqeo import ViqeoIE
from .expressen import ExpressenIE
from .zype import ZypeIE
class GenericIE(InfoExtractor):
@@ -2070,6 +2072,20 @@ class GenericIE(InfoExtractor):
},
'playlist_count': 6,
},
{
# Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
'info_dict': {
'id': '5b400b834b32992a310622b9',
'ext': 'mp4',
'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g',
},
'add_ie': [ZypeIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
# videojs embed
'url': 'https://video.sibnet.ru/shell.php?videoid=3422904',
@@ -2636,9 +2652,9 @@ class GenericIE(InfoExtractor):
return self.url_result(tvc_url, 'TVC')
# Look for embedded SportBox player
sportbox_urls = SportBoxEmbedIE._extract_urls(webpage)
sportbox_urls = SportBoxIE._extract_urls(webpage)
if sportbox_urls:
return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie='SportBoxEmbed')
return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie=SportBoxIE.ie_key())
# Look for embedded XHamster player
xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
@@ -3023,7 +3039,7 @@ class GenericIE(InfoExtractor):
wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
# Look for Mediaset embeds
mediaset_urls = MediasetIE._extract_urls(webpage)
mediaset_urls = MediasetIE._extract_urls(self, webpage)
if mediaset_urls:
return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())
@@ -3097,6 +3113,10 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls:
return self.playlist_from_matches(
@@ -3129,6 +3149,11 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
expressen_urls, video_id, video_title, ie=ExpressenIE.ie_key())
zype_urls = ZypeIE._extract_urls(webpage)
if zype_urls:
return self.playlist_from_matches(
zype_urls, video_id, video_title, ie=ZypeIE.ie_key())
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:

@@ -53,7 +53,7 @@ class GfycatIE(InfoExtractor):
video_id = self._match_id(url)
gfy = self._download_json(
'http://gfycat.com/cajax/get/%s' % video_id,
'https://api.gfycat.com/v1/gfycats/%s' % video_id,
video_id, 'Downloading video info')
if 'error' in gfy:
raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True)

@@ -1,49 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import hashlib
import hmac
import time
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
try_get,
)
class HotStarBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['IN']
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
def _download_json(self, *args, **kwargs):
response = super(HotStarBaseIE, self)._download_json(*args, **kwargs)
if response['resultCode'] != 'OK':
if kwargs.get('fatal'):
raise ExtractorError(
response['errorDescription'], expected=True)
return None
return response['resultObj']
def _download_content_info(self, content_id):
return self._download_json(
'https://account.hotstar.com/AVS/besc', content_id, query={
'action': 'GetAggregatedContentDetails',
'appVersion': '5.0.40',
'channel': 'PCTV',
'contentId': content_id,
})['contentInfo'][0]
def _call_api(self, path, video_id, query_name='contentId'):
st = int(time.time())
exp = st + 6000
auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
response = self._download_json(
'https://api.hotstar.com/' + path,
video_id, headers={
'hotstarauth': auth,
'x-country-code': 'IN',
'x-platform-code': 'JIO',
}, query={
query_name: video_id,
'tas': 10000,
})
if response['statusCode'] != 'OK':
raise ExtractorError(
response['body']['message'], expected=True)
return response['body']['results']
class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
# contentData
'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
'info_dict': {
'id': '1000076273',
'ext': 'mp4',
'title': 'On Air With AIB',
'title': 'Can You Not Spread Rumours?',
'description': 'md5:c957d8868e9bc793ccb813691cc4c434',
'timestamp': 1447227000,
'timestamp': 1447248600,
'upload_date': '20151111',
'duration': 381,
},
@@ -51,6 +58,10 @@ class HotStarIE(HotStarBaseIE):
# m3u8 download
'skip_download': True,
}
}, {
# contentDetail
'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'only_matching': True,
@@ -58,47 +69,52 @@ class HotStarIE(HotStarBaseIE):
'url': 'http://www.hotstar.com/1000000515',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_content_info(video_id)
webpage = self._download_webpage(url, video_id)
app_state = self._parse_json(self._search_regex(
r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
webpage, 'app state'), video_id)
video_data = {}
getters = list(
lambda x, k=k: x['initialState']['content%s' % k]['content']
for k in ('Data', 'Detail')
)
for v in app_state.values():
content = try_get(v, getters, dict)
if content and content.get('contentId') == video_id:
video_data = content
break
title = video_data['episodeTitle']
title = video_data['title']
if video_data.get('encrypted') == 'Y':
if video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True)
formats = []
for f in ('JIO',):
format_data = self._download_json(
'http://getcdn.hotstar.com/AVS/besc',
video_id, 'Downloading %s JSON metadata' % f,
fatal=False, query={
'action': 'GetCDN',
'asJson': 'Y',
'channel': f,
'id': video_id,
'type': 'VOD',
})
if format_data:
format_url = format_data.get('src')
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
# produce broken files
continue
else:
formats.append({
'url': format_url,
'width': int_or_none(format_data.get('width')),
'height': int_or_none(format_data.get('height')),
})
format_data = self._call_api('h/v1/play', video_id)['item']
format_url = format_data['playbackUrl']
ext = determine_ext(format_url)
if ext == 'm3u8':
try:
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls'))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
self.raise_geo_restricted(countries=['IN'])
raise
elif ext == 'f4m':
# produce broken files
pass
else:
formats.append({
'url': format_url,
'width': int_or_none(format_data.get('width')),
'height': int_or_none(format_data.get('height')),
})
self._sort_formats(formats)
return {
@@ -106,57 +122,43 @@ class HotStarIE(HotStarBaseIE):
'title': title,
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')),
'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')),
'formats': formats,
'channel': video_data.get('channelName'),
'channel_id': video_data.get('channelId'),
'series': video_data.get('showName'),
'season': video_data.get('seasonName'),
'season_number': int_or_none(video_data.get('seasonNo')),
'season_id': video_data.get('seasonId'),
'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'),
'episode_number': int_or_none(video_data.get('episodeNo')),
}
class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com/tv/[^/]+/(?P<content_id>\d+))/(?P<type>[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/tv/[^/]+/s-\w+/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{
'url': 'http://www.hotstar.com/tv/pratidaan/14982/episodes/14812/9993',
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': {
'id': '14812',
'id': '3_2_26',
},
'playlist_mincount': 75,
'playlist_mincount': 20,
}, {
'url': 'http://www.hotstar.com/tv/pratidaan/14982/popular-clips/9998/9998',
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True,
}]
_ITEM_TYPES = {
'episodes': 'EPISODE',
'popular-clips': 'CLIPS',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
base_url = mobj.group('url')
content_id = mobj.group('content_id')
playlist_type = mobj.group('type')
playlist_id = self._match_id(url)
content_info = self._download_content_info(content_id)
playlist_id = compat_str(content_info['categoryId'])
collection = self._download_json(
'https://search.hotstar.com/AVS/besc', playlist_id, query={
'action': 'SearchContents',
'appVersion': '5.0.40',
'channel': 'PCTV',
'moreFilters': 'series:%s;' % playlist_id,
'query': '*',
'searchOrder': 'last_broadcast_date desc,year desc,title asc',
'type': self._ITEM_TYPES.get(playlist_type, 'EPISODE'),
})
collection = self._call_api('o/v1/tray/find', playlist_id, 'uqId')
entries = [
self.url_result(
'%s/_/%s' % (base_url, video['contentId']),
'https://www.hotstar.com/%s' % video['contentId'],
ie=HotStarIE.ie_key(), video_id=video['contentId'])
for video in collection['response']['docs']
for video in collection['assets']['items']
if video.get('contentId')]
return self.playlist_result(entries, playlist_id)
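The rewritten `HotStarIE._real_extract` no longer calls the AVS `GetAggregatedContentDetails` endpoint. Instead, it reads the `window.APP_STATE` JSON embedded in the watch page. For each top-level entry, it takes the first of `initialState.contentData.content` and `initialState.contentDetail.content` that is a dict, and keeps that entry only if its `contentId` equals the ID taken from the URL. Below is a stdlib-only sketch of that lookup. `first_content` and `find_content` are hypothetical helpers standing in for `try_get`, and the sample page is a hand-made fragment, not a captured Hotstar response.

```python
import json
import re

APP_STATE_RE = r'<script>window\.APP_STATE\s*=\s*({.+?})</script>'


def first_content(state):
    """Mimic try_get(v, getters, dict): the first getter yielding a dict wins."""
    for key in ('Data', 'Detail'):
        try:
            content = state['initialState']['content%s' % key]['content']
        except (KeyError, TypeError):
            continue
        if isinstance(content, dict):
            return content
    return None


def find_content(webpage, video_id):
    app_state = json.loads(re.search(APP_STATE_RE, webpage).group(1))
    for v in app_state.values():
        content = first_content(v)
        if content and content.get('contentId') == video_id:
            return content
    # The extractor also falls back to an empty dict, so its later
    # video_data['title'] lookup raises KeyError when nothing matched.
    return {}


page = ('<script>window.APP_STATE={"/x": {"initialState": {"contentData": '
        '{"content": {"contentId": "1000076273", '
        '"title": "Can You Not Spread Rumours?"}}}}}</script>')
print(find_content(page, '1000076273')['title'])
```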

@@ -12,7 +12,7 @@ from ..utils import (
class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z0-9]+)?$'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv',
@@ -20,28 +20,9 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The magic of the Internet',
},
}, {
'url': 'https://imgur.com/A61SaA1',
'info_dict': {
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The magic of the Internet',
},
}, {
'url': 'https://imgur.com/gallery/YcAQlkx',
'info_dict': {
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
}
}, {
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
}, {
'url': 'https://i.imgur.com/crGpqCV.mp4',
@@ -50,8 +31,8 @@ class ImgurIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
gifv_url = 'https://i.imgur.com/{id}.gifv'.format(id=video_id)
webpage = self._download_webpage(gifv_url, video_id)
webpage = self._download_webpage(
'https://i.imgur.com/{id}.gifv'.format(id=video_id), video_id)
width = int_or_none(self._og_search_property(
'video:width', webpage, default=None))
@@ -72,7 +53,6 @@ class ImgurIE(InfoExtractor):
'format_id': m.group('type').partition('/')[2],
'url': self._proto_relative_url(m.group('src')),
'ext': mimetype2ext(m.group('type')),
'acodec': 'none',
'width': width,
'height': height,
'http_headers': {
@@ -107,44 +87,64 @@ class ImgurIE(InfoExtractor):
return {
'id': video_id,
'formats': formats,
'description': self._og_search_description(webpage, default=None),
'title': self._og_search_title(webpage),
}
class ImgurAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:a|gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{5})(?:[/?#&]+)?$'
class ImgurGalleryIE(InfoExtractor):
IE_NAME = 'imgur:gallery'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://imgur.com/gallery/Q95ko',
'info_dict': {
'id': 'Q95ko',
'title': 'Adding faces make every GIF better',
},
'playlist_count': 25,
}, {
'url': 'http://imgur.com/a/j6Orj',
'url': 'http://imgur.com/topic/Aww/ll5Vk',
'only_matching': True,
}, {
'url': 'http://imgur.com/topic/Aww/ll5Vk',
'url': 'https://imgur.com/gallery/YcAQlkx',
'info_dict': {
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
}
}, {
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
}]
def _real_extract(self, url):
album_id = self._match_id(url)
gallery_id = self._match_id(url)
album_images = self._download_json(
'http://imgur.com/gallery/%s/album_images/hit.json?all=true' % album_id,
album_id, fatal=False)
data = self._download_json(
'https://imgur.com/gallery/%s.json' % gallery_id,
gallery_id)['data']['image']
if album_images:
data = album_images.get('data')
if data and isinstance(data, dict):
images = data.get('images')
if images and isinstance(images, list):
entries = [
self.url_result('http://imgur.com/%s' % image['hash'])
for image in images if image.get('hash')]
return self.playlist_result(entries, album_id)
if data.get('is_album'):
entries = [
self.url_result('http://imgur.com/%s' % image['hash'], ImgurIE.ie_key(), image['hash'])
for image in data['album_images']['images'] if image.get('hash')]
return self.playlist_result(entries, gallery_id, data.get('title'), data.get('description'))
# Fallback to single video
return self.url_result('http://imgur.com/%s' % album_id, ImgurIE.ie_key())
return self.url_result('http://imgur.com/%s' % gallery_id, ImgurIE.ie_key(), gallery_id)
class ImgurAlbumIE(ImgurGalleryIE):
IE_NAME = 'imgur:album'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://imgur.com/a/j6Orj',
'info_dict': {
'id': 'j6Orj',
'title': 'A Literary Analysis of "Star Wars: The Force Awakens"',
},
'playlist_count': 12,
}]
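The new Imgur patterns divide URLs among three extractors. The negative lookahead in `ImgurIE._VALID_URL` rejects any path that starts with `a/`, `gallery/`, `t/<name>/`, `topic/<name>/` or `r/<name>/`, and those prefixes are exactly the ones claimed by `ImgurGalleryIE` and `ImgurAlbumIE`. The sketch checks this with plain `re.match`, which is how `suitable()` applies `_VALID_URL`. The `route` helper is hypothetical.

```python
import re

# _VALID_URL patterns copied from the new ImgurIE, ImgurGalleryIE and ImgurAlbumIE.
PATTERNS = [
    ('ImgurIE', r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'),
    ('ImgurGalleryIE', r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'),
    ('ImgurAlbumIE', r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'),
]


def route(url):
    """List every (extractor, id) pair whose pattern matches url."""
    hits = []
    for name, pattern in PATTERNS:
        m = re.match(pattern, url)
        if m:
            hits.append((name, m.group('id')))
    return hits


for u in ('https://i.imgur.com/A61SaA1.gifv',
          'http://imgur.com/gallery/Q95ko',
          'http://imgur.com/topic/Aww/ll5Vk',
          'http://imgur.com/a/j6Orj'):
    print(u, route(u))
```

Each test URL matches exactly one pattern, so the order in which the three extractors are tried should not affect which one handles a URL.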

@@ -12,7 +12,7 @@ from ..utils import (
class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://(?:play|prima)\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_VALID_URL = r'https?://(?:play|prima|www)\.iprima\.cz/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_GEO_BYPASS = False
_TESTS = [{
@@ -41,6 +41,9 @@ class IPrimaIE(InfoExtractor):
# iframe prima.iprima.cz
'url': 'https://prima.iprima.cz/porady/jak-se-stavi-sen/rodina-rathousova-praha',
'only_matching': True,
}, {
'url': 'http://www.iprima.cz/filmy/desne-rande',
'only_matching': True,
}]
def _real_extract(self, url):

@@ -15,7 +15,7 @@ from ..utils import (
class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
@@ -65,7 +65,11 @@ class IviIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
}
},
{
'url': 'https://www.ivi.tv/watch/33560/',
'only_matching': True,
},
]
# Sorted by quality

@@ -26,8 +26,15 @@ class JamendoBaseIE(InfoExtractor):
class JamendoIE(JamendoBaseIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'''(?x)
https?://
(?:
licensing\.jamendo\.com/[^/]+|
(?:www\.)?jamendo\.com
)
/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
'''
_TESTS = [{
'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
'md5': '6e9e82ed6db98678f171c25a8ed09ffd',
'info_dict': {
@@ -40,14 +47,19 @@ class JamendoIE(JamendoBaseIE):
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg'
}
}
}, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
track_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(
'https://www.jamendo.com/track/%s/%s' % (track_id, display_id),
display_id)
title, artist, track = self._extract_meta(webpage)
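`JamendoIE` now also accepts `licensing.jamendo.com/<lang>/track/...` URLs. However, it always downloads the equivalent `www.jamendo.com` track page, built from the captured `id` and `display_id`. The sketch below isolates that normalization. `canonical_track_url` is a hypothetical helper; the pattern is copied verbatim. In verbose mode, `#` inside a character class such as `[^/?#&]` is a literal, not a comment.

```python
import re

# Verbose pattern copied from the new JamendoIE._VALID_URL.
JAMENDO_RE = re.compile(r'''(?x)
    https?://
    (?:
        licensing\.jamendo\.com/[^/]+|
        (?:www\.)?jamendo\.com
    )
    /track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
''')


def canonical_track_url(url):
    """Map a www or licensing track URL onto the www page the extractor fetches."""
    m = JAMENDO_RE.match(url)
    if not m:
        return None
    return 'https://www.jamendo.com/track/%s/%s' % (m.group('id'), m.group('display_id'))


print(canonical_track_url('https://licensing.jamendo.com/en/track/1496667/energetic-rock'))
```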

@@ -61,7 +61,7 @@ class JojIE(InfoExtractor):
bitrates = self._parse_json(
self._search_regex(
r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates',
r'(?s)(?:src|bitrates)\s*=\s*({.+?});', webpage, 'bitrates',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)

@@ -192,6 +192,8 @@ class KalturaIE(InfoExtractor):
'entryId': video_id,
'service': 'baseentry',
'ks': '{1:result:ks}',
'responseProfile:fields': 'createdAt,dataUrl,duration,name,plays,thumbnailUrl,userId',
'responseProfile:type': 1,
},
{
'action': 'getbyentryid',

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
@@ -32,7 +33,8 @@ class Laola1TvEmbedIE(InfoExtractor):
def _extract_token_url(self, stream_access_url, video_id, data):
return self._download_json(
stream_access_url, video_id, headers={
self._proto_relative_url(stream_access_url, 'https:'), video_id,
headers={
'Content-Type': 'application/json',
}, data=json.dumps(data).encode())['data']['stream-access'][0]
@@ -119,9 +121,59 @@ class Laola1TvEmbedIE(InfoExtractor):
}
class Laola1TvIE(Laola1TvEmbedIE):
class Laola1TvBaseIE(Laola1TvEmbedIE):
def _extract_video(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
conf = self._parse_json(self._search_regex(
r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
display_id,
transform_source=lambda s: js_to_json(re.sub(r'shareurl:.+,', '', s)))
video_id = conf['videoid']
config = self._download_json(conf['configUrl'], video_id, query={
'videoid': video_id,
'partnerid': conf['partnerid'],
'language': conf.get('language', ''),
'portal': conf.get('portalid', ''),
})
error = config.get('error')
if error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
video_data = config['video']
title = video_data['title']
is_live = video_data.get('isLivestream') and video_data.get('isLive')
meta = video_data.get('metaInformation')
sports = meta.get('sports')
categories = sports.split(',') if sports else []
token_url = self._extract_token_url(
video_data['streamAccess'], video_id,
video_data['abo']['required'])
formats = self._extract_formats(token_url, video_id)
return {
'id': video_id,
'display_id': display_id,
'title': self._live_title(title) if is_live else title,
'description': video_data.get('description'),
'thumbnail': video_data.get('image'),
'categories': categories,
'formats': formats,
'is_live': is_live,
}
class Laola1TvIE(Laola1TvBaseIE):
IE_NAME = 'laola1tv'
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
@@ -169,52 +221,30 @@ class Laola1TvIE(Laola1TvEmbedIE):
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._extract_video(url)
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
class EHFTVIE(Laola1TvBaseIE):
IE_NAME = 'ehftv'
_VALID_URL = r'https?://(?:www\.)?ehftv\.com/[a-z]+(?:-[a-z]+)?/[^/]+/(?P<id>[^/?#&]+)'
conf = self._parse_json(self._search_regex(
r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
display_id, js_to_json)
_TESTS = [{
'url': 'https://www.ehftv.com/int/video/paris-saint-germain-handball-pge-vive-kielce/1166761',
'info_dict': {
'id': '1166761',
'display_id': 'paris-saint-germain-handball-pge-vive-kielce',
'ext': 'mp4',
'title': 'Paris Saint-Germain Handball - PGE Vive Kielce',
'is_live': False,
'categories': ['Handball'],
},
'params': {
'skip_download': True,
},
}]
video_id = conf['videoid']
config = self._download_json(conf['configUrl'], video_id, query={
'videoid': video_id,
'partnerid': conf['partnerid'],
'language': conf.get('language', ''),
'portal': conf.get('portalid', ''),
})
error = config.get('error')
if error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
video_data = config['video']
title = video_data['title']
is_live = video_data.get('isLivestream') and video_data.get('isLive')
meta = video_data.get('metaInformation')
sports = meta.get('sports')
categories = sports.split(',') if sports else []
token_url = self._extract_token_url(
video_data['streamAccess'], video_id,
video_data['abo']['required'])
formats = self._extract_formats(token_url, video_id)
return {
'id': video_id,
'display_id': display_id,
'title': self._live_title(title) if is_live else title,
'description': video_data.get('description'),
'thumbnail': video_data.get('image'),
'categories': categories,
'formats': formats,
'is_live': is_live,
}
def _real_extract(self, url):
return self._extract_video(url)
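The refactor moves the shared page parsing into `Laola1TvBaseIE._extract_video`, which lets `Laola1TvIE` and the new `EHFTVIE` reuse it. It also strips a `shareurl:` entry from the `conf` object before `js_to_json` runs. The diff does not state why the entry is removed. The sketch shows only what that substitution does. `strip_shareurl` is a hypothetical name, and the sample `conf` text is invented.

```python
import re


def strip_shareurl(conf_js):
    """Apply the pre-processing Laola1TvBaseIE performs before js_to_json."""
    # '.' does not match a newline, so this removes everything from 'shareurl:'
    # up to the LAST comma on the same line.
    return re.sub(r'shareurl:.+,', '', conf_js)


multi_line = ('videoid: 1166761,\n'
              'shareurl: "https://www.ehftv.com/int/video/paris-saint-germain-handball-pge-vive-kielce/1166761",\n'
              'partnerid: 22,')
print(strip_shareurl(multi_line))
```

Because `.+` is greedy, a `conf` object serialized on a single line would lose every key between `shareurl:` and the last comma. For example, `'{shareurl: "x", partnerid: 2, portalid: 3}'` becomes `'{ portalid: 3}'`, so the substitution relies on `shareurl` sitting on its own line.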
class ITTFIE(InfoExtractor):

@@ -0,0 +1,186 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
extract_attributes,
ExtractorError,
float_or_none,
int_or_none,
str_or_none,
url_or_none,
urlencode_postdata,
urljoin,
)
class LecturioBaseIE(InfoExtractor):
_LOGIN_URL = 'https://app.lecturio.com/en/login'
_NETRC_MACHINE = 'lecturio'
def _real_initialize(self):
self._login()
def _login(self):
username, password = self._get_login_info()
if username is None:
return
# Sets some cookies
_, urlh = self._download_webpage_handle(
self._LOGIN_URL, None, 'Downloading login popup')
def is_logged(url_handle):
return self._LOGIN_URL not in compat_str(url_handle.geturl())
# Already logged in
if is_logged(urlh):
return
login_form = {
'signin[email]': username,
'signin[password]': password,
'signin[remember]': 'on',
}
response, urlh = self._download_webpage_handle(
self._LOGIN_URL, None, 'Logging in',
data=urlencode_postdata(login_form))
# Logged in successfully
if is_logged(urlh):
return
errors = self._html_search_regex(
r'(?s)<ul[^>]+class=["\']error_list[^>]+>(.+?)</ul>', response,
'errors', default=None)
if errors:
raise ExtractorError('Unable to login: %s' % errors, expected=True)
raise ExtractorError('Unable to log in')
class LecturioIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture'
_TEST = {
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
'info_dict': {
'id': '39634',
'ext': 'mp4',
'title': 'Important Concepts and Terms Introduction to Microbiology',
},
'skip': 'Requires lecturio account credentials',
}
_CC_LANGS = {
'German': 'de',
'English': 'en',
'Spanish': 'es',
'French': 'fr',
'Polish': 'pl',
'Russian': 'ru',
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(
'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
display_id)
lecture_id = self._search_regex(
r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
api_url = self._search_regex(
r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'api url', group='url')
video = self._download_json(api_url, display_id)
title = video['title'].strip()
formats = []
for format_ in video['content']['media']:
if not isinstance(format_, dict):
continue
file_ = format_.get('file')
if not file_:
continue
ext = determine_ext(file_)
if ext == 'smil':
# smil contains only broken RTMP formats anyway
continue
file_url = url_or_none(file_)
if not file_url:
continue
label = str_or_none(format_.get('label'))
filesize = int_or_none(format_.get('fileSize'))
formats.append({
'url': file_url,
'format_id': label,
'filesize': float_or_none(filesize, invscale=1000)
})
self._sort_formats(formats)
subtitles = {}
automatic_captions = {}
cc = self._parse_json(
self._search_regex(
r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
default='{}'), display_id, fatal=False)
for cc_label, cc_url in cc.items():
cc_url = url_or_none(cc_url)
if not cc_url:
continue
sub_dict = automatic_captions if 'auto-translated' in cc_label else subtitles
lang = self._search_regex(
r'/([a-z]{2})_', cc_url, 'lang', default=cc_label.split()[0])
sub_dict.setdefault(self._CC_LANGS.get(lang, lang), []).append({
'url': cc_url,
})
return {
'id': lecture_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
'automatic_captions': automatic_captions,
}
class LecturioCourseIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
_TEST = {
'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
'info_dict': {
'id': 'microbiology-introduction',
'title': 'Microbiology: Introduction',
},
'playlist_count': 45,
'skip': 'Requires lecturio account credentials',
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(
r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
webpage):
params = extract_attributes(mobj.group(0))
lecture_url = urljoin(url, params.get('data-url'))
lecture_id = params.get('data-id')
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
title = self._search_regex(
r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
'title', default=None)
return self.playlist_result(entries, display_id, title)
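`LecturioIE` sorts each `subtitleUrls` entry into `subtitles` or `automatic_captions`, depending on whether the label contains `auto-translated`. It takes the language from a `/xx_` fragment in the URL when one is present, and otherwise from the first word of the label, mapped through `_CC_LANGS`. The sketch below reproduces that logic. The labels and URLs are hypothetical, and a plain truthiness check stands in for `url_or_none`.

```python
import re

# Label-to-code table copied from LecturioIE._CC_LANGS.
CC_LANGS = {
    'German': 'de', 'English': 'en', 'Spanish': 'es',
    'French': 'fr', 'Polish': 'pl', 'Russian': 'ru',
}


def split_captions(cc):
    """Sort {label: url} into (subtitles, automatic_captions) like LecturioIE."""
    subtitles, automatic_captions = {}, {}
    for cc_label, cc_url in cc.items():
        if not cc_url:  # the extractor uses url_or_none() here
            continue
        target = automatic_captions if 'auto-translated' in cc_label else subtitles
        # Prefer a two-letter code in the URL path, else the label's first word.
        m = re.search(r'/([a-z]{2})_', cc_url)
        lang = m.group(1) if m else cc_label.split()[0]
        target.setdefault(CC_LANGS.get(lang, lang), []).append({'url': cc_url})
    return subtitles, automatic_captions


subs, auto = split_captions({
    'English': 'https://example.com/cc/en_39634.vtt',
    'German (auto-translated)': 'https://example.com/cc/39634.vtt',
})
print(subs, auto)
```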

@@ -16,16 +16,15 @@ from ..utils import (
class LibraryOfCongressIE(InfoExtractor):
IE_NAME = 'loc'
IE_DESC = 'Library of Congress'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9a-z_.]+)'
_TESTS = [{
# embedded via <div class="media-player"
'url': 'http://loc.gov/item/90716351/',
'md5': '353917ff7f0255aa6d4b80a034833de8',
'md5': '6ec0ae8f07f86731b1b2ff70f046210a',
'info_dict': {
'id': '90716351',
'ext': 'mp4',
'title': "Pa's trip to Mars",
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 0,
'view_count': int,
},
@@ -57,6 +56,12 @@ class LibraryOfCongressIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.loc.gov/item/ihas.200197114/',
'only_matching': True,
}, {
'url': 'https://www.loc.gov/item/afc1981005_afs20503/',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -67,12 +72,13 @@ class LibraryOfCongressIE(InfoExtractor):
(r'id=(["\'])media-player-(?P<id>.+?)\1',
r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1',
r'data-tab="share-media-(?P<id>[0-9A-F]{32})"'),
webpage, 'media id', group='id')
data = self._download_json(
'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
video_id)['mediaObject']
media_id)['mediaObject']
derivative = data['derivatives'][0]
media_url = derivative['derivativeUrl']
@@ -89,25 +95,29 @@ class LibraryOfCongressIE(InfoExtractor):
if ext not in ('mp4', 'mp3'):
media_url += '.mp4' if is_video else '.mp3'
if 'vod/mp4:' in media_url:
formats = [{
'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
formats = []
if '/vod/mp4:' in media_url:
formats.append({
'url': media_url.replace('/vod/mp4:', '/hls-vod/media/') + '.m3u8',
'format_id': 'hls',
'ext': 'mp4',
'protocol': 'm3u8_native',
'quality': 1,
}]
elif 'vod/mp3:' in media_url:
formats = [{
'url': media_url.replace('vod/mp3:', ''),
'vcodec': 'none',
}]
})
http_format = {
'url': re.sub(r'(://[^/]+/)(?:[^/]+/)*(?:mp4|mp3):', r'\1', media_url),
'format_id': 'http',
'quality': 1,
}
if not is_video:
http_format['vcodec'] = 'none'
formats.append(http_format)
download_urls = set()
for m in re.finditer(
r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
format_id = m.group('id').lower()
if format_id == 'gif':
if format_id in ('gif', 'jpeg'):
continue
download_url = m.group('url')
if download_url in download_urls:
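The reworked Library of Congress code derives up to two formats from `derivativeUrl`. First, if the URL contains `/vod/mp4:`, it builds an HLS URL by swapping that segment for `/hls-vod/media/` and appending `.m3u8`. Second, it always builds a plain HTTP URL by keeping the scheme and host and dropping the path segments up to and including the `mp4:`/`mp3:` marker. The sketch keeps only the `url`, `format_id` and `vcodec` keys. The `media.example.gov` host and file paths are hypothetical.

```python
import re


def loc_formats(media_url, is_video=True):
    """Rebuild the HLS and HTTP format URLs the new code derives from derivativeUrl."""
    formats = []
    if '/vod/mp4:' in media_url:
        formats.append({
            'url': media_url.replace('/vod/mp4:', '/hls-vod/media/') + '.m3u8',
            'format_id': 'hls',
        })
    http_format = {
        # Keep "://host/", drop the segments before the mp4:/mp3: marker and the marker.
        'url': re.sub(r'(://[^/]+/)(?:[^/]+/)*(?:mp4|mp3):', r'\1', media_url),
        'format_id': 'http',
    }
    if not is_video:
        http_format['vcodec'] = 'none'
    formats.append(http_format)
    return formats


for f in loc_formats('https://media.example.gov/vod/mp4:video/90716351.mp4'):
    print(f['format_id'], f['url'])
```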

@@ -0,0 +1,175 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
urlencode_postdata,
)
class LinkedInLearningBaseIE(InfoExtractor):
_NETRC_MACHINE = 'linkedin'
def _call_api(self, course_slug, fields, video_slug=None, resolution=None):
query = {
'courseSlug': course_slug,
'fields': fields,
'q': 'slugs',
}
sub = ''
if video_slug:
query.update({
'videoSlug': video_slug,
'resolution': '_%s' % resolution,
})
sub = ' %dp' % resolution
api_url = 'https://www.linkedin.com/learning-api/detailedCourses'
return self._download_json(
api_url, video_slug, 'Downloading%s JSON metadata' % sub, headers={
'Csrf-Token': self._get_cookies(api_url)['JSESSIONID'].value,
}, query=query)['elements'][0]
def _get_video_id(self, urn, course_slug, video_slug):
if urn:
mobj = re.search(r'urn:li:lyndaCourse:\d+,(\d+)', urn)
if mobj:
return mobj.group(1)
return '%s/%s' % (course_slug, video_slug)
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
login_page = self._download_webpage(
'https://www.linkedin.com/uas/login?trk=learning',
None, 'Downloading login page')
action_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page, 'post url',
default='https://www.linkedin.com/uas/login-submit', group='url')
data = self._hidden_inputs(login_page)
data.update({
'session_key': email,
'session_password': password,
})
login_submit_page = self._download_webpage(
action_url, None, 'Logging in',
data=urlencode_postdata(data))
error = self._search_regex(
r'<span[^>]+class="error"[^>]*>\s*(.+?)\s*</span>',
login_submit_page, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
class LinkedInLearningIE(LinkedInLearningBaseIE):
IE_NAME = 'linkedin:learning'
_VALID_URL = r'https?://(?:www\.)?linkedin\.com/learning/(?P<course_slug>[^/]+)/(?P<id>[^/?#]+)'
_TEST = {
'url': 'https://www.linkedin.com/learning/programming-foundations-fundamentals/welcome?autoplay=true',
'md5': 'a1d74422ff0d5e66a792deb996693167',
'info_dict': {
'id': '90426',
'ext': 'mp4',
'title': 'Welcome',
'timestamp': 1430396150.82,
'upload_date': '20150430',
},
}
def _real_extract(self, url):
course_slug, video_slug = re.match(self._VALID_URL, url).groups()
video_data = None
formats = []
for width, height in ((640, 360), (960, 540), (1280, 720)):
video_data = self._call_api(
course_slug, 'selectedVideo', video_slug, height)['selectedVideo']
video_url_data = video_data.get('url') or {}
progressive_url = video_url_data.get('progressiveUrl')
if progressive_url:
formats.append({
'format_id': 'progressive-%dp' % height,
'url': progressive_url,
'height': height,
'width': width,
'source_preference': 1,
})
title = video_data['title']
audio_url = video_data.get('audio', {}).get('progressiveUrl')
if audio_url:
formats.append({
'abr': 64,
'ext': 'm4a',
'format_id': 'audio',
'url': audio_url,
'vcodec': 'none',
})
streaming_url = video_url_data.get('streamingUrl')
if streaming_url:
formats.extend(self._extract_m3u8_formats(
streaming_url, video_slug, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('width', 'height', 'source_preference', 'tbr', 'abr'))
return {
'id': self._get_video_id(video_data.get('urn'), course_slug, video_slug),
'title': title,
'formats': formats,
'thumbnail': video_data.get('defaultThumbnail'),
'timestamp': float_or_none(video_data.get('publishedOn'), 1000),
'duration': int_or_none(video_data.get('durationInSeconds')),
}
class LinkedInLearningCourseIE(LinkedInLearningBaseIE):
IE_NAME = 'linkedin:learning:course'
_VALID_URL = r'https?://(?:www\.)?linkedin\.com/learning/(?P<id>[^/?#]+)'
_TEST = {
'url': 'https://www.linkedin.com/learning/programming-foundations-fundamentals',
'info_dict': {
'id': 'programming-foundations-fundamentals',
'title': 'Programming Foundations: Fundamentals',
'description': 'md5:76e580b017694eb89dc8e8923fff5c86',
},
'playlist_mincount': 61,
}
@classmethod
def suitable(cls, url):
return False if LinkedInLearningIE.suitable(url) else super(LinkedInLearningCourseIE, cls).suitable(url)
def _real_extract(self, url):
course_slug = self._match_id(url)
course_data = self._call_api(course_slug, 'chapters,description,title')
entries = []
for chapter in course_data.get('chapters', []):
chapter_title = chapter.get('title')
for video in chapter.get('videos', []):
video_slug = video.get('slug')
if not video_slug:
continue
entries.append({
'_type': 'url_transparent',
'id': self._get_video_id(video.get('urn'), course_slug, video_slug),
'title': video.get('title'),
'url': 'https://www.linkedin.com/learning/%s/%s' % (course_slug, video_slug),
'chapter': chapter_title,
'ie_key': LinkedInLearningIE.ie_key(),
})
return self.playlist_result(
entries, course_slug,
course_data.get('title'),
course_data.get('description'))
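`LinkedInLearningBaseIE._get_video_id` prefers the numeric ID embedded in a lynda course URN and falls back to `<course_slug>/<video_slug>`. The URN below is a hypothetical string shaped to fit the regex. Only the trailing `90426` comes from the extractor's test.

```python
import re


def get_video_id(urn, course_slug, video_slug):
    """Same logic as LinkedInLearningBaseIE._get_video_id."""
    if urn:
        mobj = re.search(r'urn:li:lyndaCourse:\d+,(\d+)', urn)
        if mobj:
            return mobj.group(1)
    # No usable URN: fall back to a slug-based ID.
    return '%s/%s' % (course_slug, video_slug)


print(get_video_id('urn:li:lyndaVideo:(urn:li:lyndaCourse:12345,90426)',
                   'programming-foundations-fundamentals', 'welcome'))
print(get_video_id(None, 'programming-foundations-fundamentals', 'welcome'))
```

The same helper feeds both `LinkedInLearningIE` and the `url_transparent` entries of `LinkedInLearningCourseIE`, so a video gets the same ID whether it is extracted on its own or as part of a course playlist.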

@@ -15,7 +15,7 @@ from ..utils import (
class LyndaBaseIE(InfoExtractor):
_SIGNIN_URL = 'https://www.lynda.com/signin'
_SIGNIN_URL = 'https://www.lynda.com/signin/lynda'
_PASSWORD_URL = 'https://www.lynda.com/signin/password'
_USER_URL = 'https://www.lynda.com/signin/user'
_ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'

@@ -4,6 +4,11 @@ from __future__ import unicode_literals
import re
from .theplatform import ThePlatformBaseIE
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
@@ -76,12 +81,33 @@ class MediasetIE(ThePlatformBaseIE):
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>https?://(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml\?.*?\bid=\d+.*?)\1',
webpage)]
def _extract_urls(ie, webpage):
def _qs(url):
return compat_parse_qs(compat_urllib_parse_urlparse(url).query)
def _program_guid(qs):
return qs.get('programGuid', [None])[0]
entries = []
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml.*?)\1',
webpage):
embed_url = mobj.group('url')
embed_qs = _qs(embed_url)
program_guid = _program_guid(embed_qs)
if program_guid:
entries.append(embed_url)
continue
video_id = embed_qs.get('id', [None])[0]
if not video_id:
continue
urlh = ie._request_webpage(
embed_url, video_id, note='Following embed URL redirect')
embed_url = compat_str(urlh.geturl())
program_guid = _program_guid(_qs(embed_url))
if program_guid:
entries.append(embed_url)
return entries
def _real_extract(self, url):
guid = self._match_id(url)
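The new `MediasetIE._extract_urls` also accepts protocol-relative iframe sources. It keeps an embed only once a `programGuid` query parameter is known. When the `src` carries only an `id`, the extractor follows the redirect with `ie._request_webpage` and re-reads the query string. The sketch below covers only the direct case, in which `programGuid` is already in the `src`. The `id` and GUID values are made up.

```python
import re
from urllib.parse import parse_qs, urlparse

# Iframe pattern copied from the new MediasetIE._extract_urls.
IFRAME_RE = (r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?video\.mediaset\.it'
             r'/player/playerIFrame(?:Twitter)?\.shtml.*?)\1')


def program_guid(url):
    """Equivalent of _program_guid(_qs(url)): the first programGuid value or None."""
    return parse_qs(urlparse(url).query).get('programGuid', [None])[0]


def embeds_with_guid(webpage):
    # Direct case only: iframes whose src already carries programGuid.
    return [m.group('url') for m in re.finditer(IFRAME_RE, webpage)
            if program_guid(m.group('url'))]


page = ('<iframe width="640" src="//video.mediaset.it/player/playerIFrame.shtml'
        '?id=665924&programGuid=FAFU000000000001"></iframe>')
print(embeds_with_guid(page))
```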

@@ -161,11 +161,17 @@ class MixcloudIE(InfoExtractor):
stream_info = info_json['streamInfo']
formats = []
def decrypt_url(f_url):
for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
decrypted_url = self._decrypt_xor_cipher(k, f_url)
if re.search(r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
return decrypted_url
for url_key in ('url', 'hlsUrl', 'dashUrl'):
format_url = stream_info.get(url_key)
if not format_url:
continue
decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url))
decrypted = decrypt_url(compat_b64decode(format_url))
if not decrypted:
continue
if url_key == 'hlsUrl':
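The new Mixcloud `decrypt_url` tries two XOR keys in turn: the `key` variable defined earlier in `_real_extract` (outside this hunk), then the hardcoded fallback string. It accepts the first plaintext that matches a strict URL regex. The sketch reimplements this with a hypothetical `xor_cipher` helper. It assumes that `_decrypt_xor_cipher` is plain repeating-key XOR over the base64-decoded bytes. The sample stream URL and the `stalekey` page key are invented.

```python
import base64
import itertools
import re

URL_RE = r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$'
FALLBACK_KEY = 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'


def xor_cipher(key, data):
    """Repeating-key XOR; data is bytes (ints when iterated), key is a str."""
    return ''.join(chr(b ^ ord(k)) for b, k in zip(data, itertools.cycle(key)))


def decrypt_url(page_key, b64_url):
    data = base64.b64decode(b64_url)
    # Try the page key first, then the fallback, and accept the first
    # plaintext that looks like a plain URL.
    for k in (page_key, FALLBACK_KEY):
        candidate = xor_cipher(k, data)
        if re.search(URL_RE, candidate):
            return candidate
    return None


url = 'https://stream.example.com/c/m4a/64/a/b/c.m4a'
enc = base64.b64encode(xor_cipher(FALLBACK_KEY, url.encode()).encode('ascii'))
print(decrypt_url('stalekey', enc))
```

The URL regex doubles as an integrity check: with the wrong key, the first character already decrypts to something other than `h`, so that candidate is rejected.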

@@ -9,10 +9,8 @@ from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote
from ..utils import (
find_xpath_attr,
smuggle_url,
try_get,
unescapeHTML,
update_url_query,
int_or_none,
)
@@ -269,27 +267,14 @@ class CSNNEIE(InfoExtractor):
class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/
(?:video/.+?/(?P<id>\d+)|
([^/]+/)*(?:.*-)?(?P<mpx_id>[^/?]+))
'''
_VALID_URL = r'(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/([^/]+/)*(?:.*-)?(?P<id>[^/?]+)'
_TESTS = [
{
'url': 'http://www.nbcnews.com/video/nbc-news/52753292',
'md5': '47abaac93c6eaf9ad37ee6c4463a5179',
'info_dict': {
'id': '52753292',
'ext': 'flv',
'title': 'Crew emerges after four-month Mars food study',
'description': 'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
},
},
{
'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
'md5': 'af1adfa51312291a017720403826bb64',
'info_dict': {
'id': 'p_tweet_snow_140529',
'id': '269389891880',
'ext': 'mp4',
'title': 'How Twitter Reacted To The Snowden Interview',
'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
@@ -313,7 +298,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
'md5': '73135a2e0ef819107bbb55a5a9b2a802',
'info_dict': {
'id': 'nn_netcast_150204',
'id': '394064451844',
'ext': 'mp4',
'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
@@ -326,7 +311,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
'md5': 'a49e173825e5fcd15c13fc297fced39d',
'info_dict': {
'id': 'x_lon_vwhorn_150922',
'id': '529953347624',
'ext': 'mp4',
'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
'description': 'md5:c8be487b2d80ff0594c005add88d8351',
@@ -339,7 +324,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': {
'id': 'tdy_al_space_160420',
'id': '669831235788',
'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
@@ -352,7 +337,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
'info_dict': {
'id': 'n_hayes_Aimm_140801_272214',
'id': '314487875924',
'ext': 'mp4',
'title': 'The chaotic GOP immigration vote',
'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
@@ -374,60 +359,22 @@ class NBCNewsIE(ThePlatformIE):
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if video_id is not None:
all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = all_info.find('video')
return {
'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': info.find('caption').text,
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}
else:
# "feature" and "nightly-news" pages use theplatform.com
video_id = mobj.group('mpx_id')
video_id = self._match_id(url)
if not video_id.isdigit():
webpage = self._download_webpage(url, video_id)
filter_param = 'byId'
bootstrap_json = self._search_regex(
[r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"',
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);'],
webpage, 'bootstrap json', default=None)
if bootstrap_json:
bootstrap = self._parse_json(
bootstrap_json, video_id, transform_source=unescapeHTML)
data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.+});', webpage,
'bootstrap json'), video_id)
video_id = data['article']['content'][0]['primaryMedia']['video']['mpxMetadata']['id']
info = None
if 'results' in bootstrap:
info = bootstrap['results'][0]['video']
elif 'video' in bootstrap:
info = bootstrap['video']
elif 'msnbcVideoInfo' in bootstrap:
info = bootstrap['msnbcVideoInfo']['meta']
elif 'msnbcThePlatform' in bootstrap:
info = bootstrap['msnbcThePlatform']['videoPlayer']['video']
else:
info = bootstrap
if 'guid' in info:
video_id = info['guid']
filter_param = 'byGuid'
elif 'mpxId' in info:
video_id = info['mpxId']
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {filter_param: video_id}),
'ie_key': 'ThePlatformFeed',
}
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {'byId': video_id}),
'ie_key': 'ThePlatformFeed',
}
class NBCOlympicsIE(InfoExtractor):
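After this change, NBC News pages always resolve to a ThePlatform feed URL that is filtered by mpx id through a `byId` query parameter. youtube-dl builds that URL with its own `update_url_query` utility. The sketch below is a stdlib-only stand-in that merges a parameter into a URL's query string; the name `add_query` is hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_query(url, params):
    """Return url with params merged into its query string (existing keys are overwritten)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query.update(params)
    # doseq=True expands the lists produced by parse_qs; plain str values stay single.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```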


@@ -31,6 +31,8 @@ class NJPWWorldIE(InfoExtractor):
'skip': 'Requires login',
}
_LOGIN_URL = 'https://front.njpwworld.com/auth/login'
def _real_initialize(self):
self._login()
@@ -40,13 +42,17 @@ class NJPWWorldIE(InfoExtractor):
if not username:
return True
# Setup session (will set necessary cookies)
self._request_webpage(
'https://njpwworld.com/', None, note='Setting up session')
webpage, urlh = self._download_webpage_handle(
'https://njpwworld.com/auth/login', None,
self._LOGIN_URL, None,
note='Logging in', errnote='Unable to login',
data=urlencode_postdata({'login_id': username, 'pw': password}),
headers={'Referer': 'https://njpwworld.com/auth'})
headers={'Referer': 'https://front.njpwworld.com/auth'})
# /auth/login will return 302 for successful logins
if urlh.geturl() == 'https://njpwworld.com/auth/login':
if urlh.geturl() == self._LOGIN_URL:
self.report_warning('unable to login')
return False


@@ -35,7 +35,7 @@ class NovaEmbedIE(InfoExtractor):
bitrates = self._parse_json(
self._search_regex(
r'(?s)bitrates\s*=\s*({.+?})\s*;', webpage, 'formats'),
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
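The widened NovaEmbed regex finds the format table whether the page assigns it to `src` or to `bitrates`. A quick standalone check of that alternation follows. `find_formats_blob` is an illustrative helper, and the sample strings are made up:

```python
import re

# Captures the object literal assigned to either `src` or `bitrates`.
FORMATS_RE = re.compile(r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;')


def find_formats_blob(webpage):
    """Return the raw object literal text, or None if neither assignment is present."""
    m = FORMATS_RE.search(webpage)
    return m.group(1) if m else None
```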


@@ -211,13 +211,13 @@ class NRKIE(NRKBaseIE):
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533',
'md5': '2f7f6eeb2aacdd99885f355428715cfa',
'md5': '706f34cdf1322577589e369e522b50ef',
'info_dict': {
'id': '150533',
'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 263,
'duration': 262,
}
}, {
# audio
@@ -248,7 +248,7 @@ class NRKTVIE(NRKBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:tv|radio)\.nrk(?:super)?\.no/
(?:serie/[^/]+|program)/
(?:serie(?:/[^/]+){1,2}|program)/
(?![Ee]pisodes)%s
(?:/\d{2}-\d{2}-\d{4})?
(?:\#del=(?P<part_id>\d+))?
@@ -256,14 +256,14 @@ class NRKTVIE(NRKBaseIE):
_API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
_TESTS = [{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '4e9ca6629f09e588ed240fb11619922a',
'md5': '9a167e54d04671eb6317a37b7bc8a280',
'info_dict': {
'id': 'MUHH48000314AA',
'ext': 'mp4',
'title': '20 spørsmål 23.05.2014',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'duration': 1741,
'series': '20 spørsmål - TV',
'series': '20 spørsmål',
'episode': '23.05.2014',
},
}, {
@@ -301,7 +301,7 @@ class NRKTVIE(NRKBaseIE):
'id': 'MSPO40010515AH',
'ext': 'mp4',
'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 1)',
'description': 'md5:c03aba1e917561eface5214020551b7a',
'description': 'md5:1f97a41f05a9486ee00c56f35f82993d',
'duration': 772,
'series': 'Tour de Ski',
'episode': '06.01.2015',
@@ -314,7 +314,7 @@ class NRKTVIE(NRKBaseIE):
'id': 'MSPO40010515BH',
'ext': 'mp4',
'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 2)',
'description': 'md5:c03aba1e917561eface5214020551b7a',
'description': 'md5:1f97a41f05a9486ee00c56f35f82993d',
'duration': 6175,
'series': 'Tour de Ski',
'episode': '06.01.2015',
@@ -326,7 +326,7 @@ class NRKTVIE(NRKBaseIE):
'info_dict': {
'id': 'MSPO40010515',
'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015',
'description': 'md5:c03aba1e917561eface5214020551b7a',
'description': 'md5:1f97a41f05a9486ee00c56f35f82993d',
},
'expected_warnings': ['Video is geo restricted'],
}, {
@@ -362,6 +362,9 @@ class NRKTVIE(NRKBaseIE):
}, {
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
}, {
'url': 'https://tv.nrk.no/serie/lindmo/2018/MUHU11006318/avspiller',
'only_matching': True,
}]
@@ -403,21 +406,35 @@ class NRKTVSerieBaseIE(InfoExtractor):
def _extract_series(self, webpage, display_id, fatal=True):
config = self._parse_json(
self._search_regex(
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>', webpage, 'config',
default='{}' if not fatal else NO_DEFAULT),
(r'INITIAL_DATA_*\s*=\s*({.+?})\s*;',
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
display_id, fatal=False)
if not config:
return
return try_get(config, lambda x: x['series'], dict)
return try_get(
config,
(lambda x: x['initialState']['series'], lambda x: x['series']),
dict)
def _extract_seasons(self, seasons):
if not isinstance(seasons, list):
return []
entries = []
for season in seasons:
entries.extend(self._extract_episodes(season))
return entries
def _extract_episodes(self, season):
entries = []
if not isinstance(season, dict):
return entries
episodes = season.get('episodes')
if not isinstance(episodes, list):
return entries
for episode in episodes:
return []
return self._extract_entries(season.get('episodes'))
def _extract_entries(self, entry_list):
if not isinstance(entry_list, list):
return []
entries = []
for episode in entry_list:
nrk_id = episode.get('prfId')
if not nrk_id or not isinstance(nrk_id, compat_str):
continue
@@ -462,7 +479,7 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
_ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
_TESTS = [{
# new layout
# new layout, seasons
'url': 'https://tv.nrk.no/serie/backstage',
'info_dict': {
'id': 'backstage',
@@ -471,20 +488,21 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
},
'playlist_mincount': 60,
}, {
# old layout
# new layout, instalments
'url': 'https://tv.nrk.no/serie/groenn-glede',
'info_dict': {
'id': 'groenn-glede',
'title': 'Grønn glede',
'description': 'md5:7576e92ae7f65da6993cf90ee29e4608',
},
'playlist_mincount': 9,
'playlist_mincount': 10,
}, {
'url': 'http://tv.nrksuper.no/serie/labyrint',
# old layout
'url': 'https://tv.nrksuper.no/serie/labyrint',
'info_dict': {
'id': 'labyrint',
'title': 'Labyrint',
'description': 'md5:58afd450974c89e27d5a19212eee7115',
'description': 'md5:318b597330fdac5959247c9b69fdb1ec',
},
'playlist_mincount': 3,
}, {
@@ -517,11 +535,12 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
description = try_get(
series, lambda x: x['titles']['subtitle'], compat_str)
entries = []
for season in series['seasons']:
entries.extend(self._extract_episodes(season))
entries.extend(self._extract_seasons(series.get('seasons')))
entries.extend(self._extract_entries(series.get('instalments')))
entries.extend(self._extract_episodes(series.get('extraMaterial')))
return self.playlist_result(entries, series_id, title, description)
# Old layout (e.g. https://tv.nrk.no/serie/groenn-glede)
# Old layout (e.g. https://tv.nrksuper.no/serie/labyrint)
entries = [
self.url_result(
'https://tv.nrk.no/program/Episodes/{series}/{season}'.format(
@@ -533,6 +552,9 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
'seriestitle', webpage,
'title', default=None) or self._og_search_title(
webpage, fatal=False)
if title:
title = self._search_regex(
r'NRK (?:Super )?TV\s*[-]\s*(.+)', title, 'title', default=title)
description = self._html_search_meta(
'series_description', webpage,
@@ -593,7 +615,7 @@ class NRKPlaylistIE(NRKPlaylistBaseIE):
'title': 'Rivertonprisen til Karin Fossum',
'description': 'Første kvinne på 15 år til å vinne krimlitteraturprisen.',
},
'playlist_count': 5,
'playlist_count': 2,
}]
def _extract_title(self, webpage):
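`_extract_series` now passes `try_get` a tuple of getters. The newer `INITIAL_DATA` layout (`initialState.series`) is tried first, then the older top-level `series` key. Below is a minimal re-creation of that multi-getter lookup. It is written to mirror how the diff uses `try_get`, not copied from youtube-dl:

```python
def try_get(src, getter, expected_type=None):
    """Return the first getter result that succeeds and has expected_type, else None.

    getter may be one callable or a list/tuple of callables tried in order;
    lookup errors simply move on to the next getter.
    """
    getters = getter if isinstance(getter, (list, tuple)) else [getter]
    for get in getters:
        try:
            value = get(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None
```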


@@ -11,20 +11,27 @@ from ..utils import (
class NZZIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153',
'info_dict': {
'id': '9153',
},
'playlist_mincount': 6,
}
}, {
'url': 'https://www.nzz.ch/video/nzz-standpunkte/cvp-auf-der-suche-nach-dem-mass-der-mitte-ld.1368112',
'info_dict': {
'id': '1368112',
},
'playlist_count': 1,
}]
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
entries = []
for player_element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
for player_element in re.findall(
r'(<[^>]+class="kalturaPlayer[^"]*"[^>]*>)', webpage):
player_params = extract_attributes(player_element)
if player_params.get('data-type') not in ('kaltura_singleArticle',):
self.report_warning('Unsupported player type')
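The NZZ pattern was loosened. It used to require an exact `class="kalturaPlayer"`; it now accepts any class attribute that begins with `kalturaPlayer`, so elements carrying extra class names still match. A small check with a made-up player tag:

```python
import re

OLD_PATTERN = r'(<[^>]+class="kalturaPlayer"[^>]*>)'
NEW_PATTERN = r'(<[^>]+class="kalturaPlayer[^"]*"[^>]*>)'


def player_tags(webpage, pattern=NEW_PATTERN):
    """Return the opening tags matched by pattern (default: class starting with kalturaPlayer)."""
    return re.findall(pattern, webpage)
```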


@@ -243,7 +243,18 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
(?:www\.)?
(?:
openload\.(?:co|io|link)|
oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun)
)
)/
(?:f|embed)/
(?P<id>[a-zA-Z0-9-_]+)
'''
_TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -307,10 +318,22 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://oload.download/f/kUEfGclsU9o',
'only_matching': True,
}, {
'url': 'https://oload.cloud/f/4ZDnBXRWiB8',
'only_matching': True,
}, {
# Its title has not got its extension but url has it
'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4',
'only_matching': True,
}, {
'url': 'https://oload.cc/embed/5NEAbI2BDSk',
'only_matching': True,
}, {
'url': 'https://oload.icu/f/-_i4y_F_Hs8',
'only_matching': True,
}, {
'url': 'https://oload.fun/f/gb6G1H4sHXY',
'only_matching': True,
}]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
@@ -322,8 +345,11 @@ class OpenloadIE(InfoExtractor):
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
url_pattern = 'https://openload.co/%%s/%s/' % video_id
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
url_pattern = 'https://%s/%%s/%s/' % (host, video_id)
headers = {
'User-Agent': self._USER_AGENT,
}
@@ -356,7 +382,7 @@ class OpenloadIE(InfoExtractor):
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
'stream URL'))
video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id
video_url = 'https://%s/stream/%s?mime=true' % (host, decoded_id)
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
@@ -367,7 +393,7 @@ class OpenloadIE(InfoExtractor):
entry = entries[0] if entries else {}
subtitles = entry.get('subtitles')
info_dict = {
return {
'id': video_id,
'title': title,
'thumbnail': entry.get('thumbnail') or self._og_search_thumbnail(webpage, default=None),
@@ -376,4 +402,3 @@ class OpenloadIE(InfoExtractor):
'subtitles': subtitles,
'http_headers': headers,
}
return info_dict
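Openload now captures the mirror host from the page URL and reuses it for the follow-up `/stream/` request, instead of hardcoding `openload.co`. The sketch below shows that capture-and-reuse step with the `_VALID_URL` from the diff. `stream_url` is a hypothetical helper, and `decoded_id` stands for the value the extractor decodes from the page:

```python
import re

VALID_URL = r'''(?x)
    https?://
    (?P<host>
        (?:www\.)?
        (?:
            openload\.(?:co|io|link)|
            oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun)
        )
    )/
    (?:f|embed)/
    (?P<id>[a-zA-Z0-9-_]+)
'''


def stream_url(page_url, decoded_id):
    """Build the stream URL on the same host the page URL used, or None if it doesn't match."""
    m = re.match(VALID_URL, page_url)
    if m is None:
        return None
    return 'https://%s/stream/%s?mime=true' % (m.group('host'), decoded_id)
```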


@@ -15,6 +15,7 @@ from ..utils import (
strip_jsonp,
unescapeHTML,
unified_strdate,
url_or_none,
)
@@ -68,26 +69,35 @@ class ORFTVthekIE(InfoExtractor):
webpage, 'playlist', group='json'),
playlist_id, transform_source=unescapeHTML)['playlist']['videos']
def quality_to_int(s):
m = re.search('([0-9]+)', s)
if m is None:
return -1
return int(m.group(1))
entries = []
for sd in data_jsb:
video_id, title = sd.get('id'), sd.get('title')
if not video_id or not title:
continue
video_id = compat_str(video_id)
formats = [{
'preference': -10 if fd['delivery'] == 'hls' else None,
'format_id': '%s-%s-%s' % (
fd['delivery'], fd['quality'], fd['quality_string']),
'url': fd['src'],
'protocol': fd['protocol'],
'quality': quality_to_int(fd['quality']),
} for fd in sd['sources']]
formats = []
for fd in sd['sources']:
src = url_or_none(fd.get('src'))
if not src:
continue
format_id_list = []
for key in ('delivery', 'quality', 'quality_string'):
value = fd.get(key)
if value:
format_id_list.append(value)
format_id = '-'.join(format_id_list)
if determine_ext(fd['src']) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
fd['src'], video_id, 'mp4', m3u8_id=format_id))
elif determine_ext(fd['src']) == 'f4m':
formats.extend(self._extract_f4m_formats(
fd['src'], video_id, f4m_id=format_id))
else:
formats.append({
'format_id': format_id,
'url': src,
'protocol': fd.get('protocol'),
})
# Check for geoblocking.
# There is a property is_geoprotection, but that's always false
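The ORF rewrite builds `format_id` only from the fields that are actually present. It then dispatches on the source extension: HLS for `.m3u8`, HDS for `.f4m`, and a direct format otherwise. The sketch below approximates those two steps. The extension check is a simplified stand-in for youtube-dl's `determine_ext`, and the sample values are invented:

```python
import posixpath
from urllib.parse import urlparse


def build_format_id(source):
    """Join the delivery, quality and quality_string fields that are present and non-empty."""
    return '-'.join(
        source[key] for key in ('delivery', 'quality', 'quality_string')
        if source.get(key))


def classify_source(source):
    """Return (kind, format_id) for one source dict, or None when it has no URL."""
    src = source.get('src')
    if not src:
        return None
    # Simplified extension guess from the URL path; the query string is ignored.
    ext = posixpath.splitext(urlparse(src).path)[1].lstrip('.').lower()
    kind = {'m3u8': 'hls', 'f4m': 'hds'}.get(ext, 'direct')
    return kind, build_format_id(source)
```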


@@ -2,52 +2,63 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import js_to_json
from ..utils import (
clean_html,
determine_ext,
int_or_none,
parse_iso8601,
)
class PatreonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?patreon\.com/creation\?hid=(?P<id>[^&#]+)'
_TESTS = [
{
'url': 'http://www.patreon.com/creation?hid=743933',
'md5': 'e25505eec1053a6e6813b8ed369875cc',
'info_dict': {
'id': '743933',
'ext': 'mp3',
'title': 'Episode 166: David Smalley of Dogma Debate',
'uploader': 'Cognitive Dissonance Podcast',
'thumbnail': 're:^https?://.*$',
},
_VALID_URL = r'https?://(?:www\.)?patreon\.com/(?:creation\?hid=|posts/(?:[\w-]+-)?)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.patreon.com/creation?hid=743933',
'md5': 'e25505eec1053a6e6813b8ed369875cc',
'info_dict': {
'id': '743933',
'ext': 'mp3',
'title': 'Episode 166: David Smalley of Dogma Debate',
'description': 'md5:713b08b772cd6271b9f3906683cfacdf',
'uploader': 'Cognitive Dissonance Podcast',
'thumbnail': 're:^https?://.*$',
'timestamp': 1406473987,
'upload_date': '20140727',
},
{
'url': 'http://www.patreon.com/creation?hid=754133',
'md5': '3eb09345bf44bf60451b8b0b81759d0a',
'info_dict': {
'id': '754133',
'ext': 'mp3',
'title': 'CD 167 Extra',
'uploader': 'Cognitive Dissonance Podcast',
'thumbnail': 're:^https?://.*$',
},
}, {
'url': 'http://www.patreon.com/creation?hid=754133',
'md5': '3eb09345bf44bf60451b8b0b81759d0a',
'info_dict': {
'id': '754133',
'ext': 'mp3',
'title': 'CD 167 Extra',
'uploader': 'Cognitive Dissonance Podcast',
'thumbnail': 're:^https?://.*$',
},
{
'url': 'https://www.patreon.com/creation?hid=1682498',
'info_dict': {
'id': 'SU4fj_aEMVw',
'ext': 'mp4',
'title': 'I\'m on Patreon!',
'uploader': 'TraciJHines',
'thumbnail': 're:^https?://.*$',
'upload_date': '20150211',
'description': 'md5:c5a706b1f687817a3de09db1eb93acd4',
'uploader_id': 'TraciJHines',
},
'params': {
'noplaylist': True,
'skip_download': True,
}
'skip': 'Patron-only content',
}, {
'url': 'https://www.patreon.com/creation?hid=1682498',
'info_dict': {
'id': 'SU4fj_aEMVw',
'ext': 'mp4',
'title': 'I\'m on Patreon!',
'uploader': 'TraciJHines',
'thumbnail': 're:^https?://.*$',
'upload_date': '20150211',
'description': 'md5:c5a706b1f687817a3de09db1eb93acd4',
'uploader_id': 'TraciJHines',
},
'params': {
'noplaylist': True,
'skip_download': True,
}
]
}, {
'url': 'https://www.patreon.com/posts/episode-166-of-743933',
'only_matching': True,
}, {
'url': 'https://www.patreon.com/posts/743933',
'only_matching': True,
}]
# Currently Patreon exposes download URL via hidden CSS, so login is not
# needed. Keeping this commented for when this inevitably changes.
@@ -78,38 +89,48 @@ class PatreonIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage).strip()
attach_fn = self._html_search_regex(
r'<div class="attach"><a target="_blank" href="([^"]+)">',
webpage, 'attachment URL', default=None)
embed = self._html_search_regex(
r'<div[^>]+id="watchCreation"[^>]*>\s*<iframe[^>]+src="([^"]+)"',
webpage, 'embedded URL', default=None)
if attach_fn is not None:
video_url = 'http://www.patreon.com' + attach_fn
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<strong>(.*?)</strong> is creating', webpage, 'uploader')
elif embed is not None:
return self.url_result(embed)
else:
playlist = self._parse_json(self._search_regex(
r'(?s)new\s+jPlayerPlaylist\(\s*\{\s*[^}]*},\s*(\[.*?,?\s*\])',
webpage, 'playlist JSON'),
video_id, transform_source=js_to_json)
data = playlist[0]
video_url = self._proto_relative_url(data['mp3'])
thumbnail = self._proto_relative_url(data.get('cover'))
uploader = data.get('artist')
return {
post = self._download_json(
'https://www.patreon.com/api/posts/' + video_id, video_id)
attributes = post['data']['attributes']
title = attributes['title'].strip()
image = attributes.get('image') or {}
info = {
'id': video_id,
'url': video_url,
'ext': 'mp3',
'title': title,
'uploader': uploader,
'thumbnail': thumbnail,
'description': clean_html(attributes.get('content')),
'thumbnail': image.get('large_url') or image.get('url'),
'timestamp': parse_iso8601(attributes.get('published_at')),
'like_count': int_or_none(attributes.get('like_count')),
'comment_count': int_or_none(attributes.get('comment_count')),
}
def add_file(file_data):
file_url = file_data.get('url')
if file_url:
info.update({
'url': file_url,
'ext': determine_ext(file_data.get('name'), 'mp3'),
})
for i in post.get('included', []):
i_type = i.get('type')
if i_type == 'attachment':
add_file(i.get('attributes') or {})
elif i_type == 'user':
user_attributes = i.get('attributes')
if user_attributes:
info.update({
'uploader': user_attributes.get('full_name'),
'uploader_url': user_attributes.get('url'),
})
if not info.get('url'):
add_file(attributes.get('post_file') or {})
if not info.get('url'):
info.update({
'_type': 'url',
'url': attributes['embed']['url'],
})
return info
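The Patreon extractor now reads `/api/posts/<id>` and picks the media URL in a fixed order:

1. an `attachment` entry in `included`;
2. `attributes.post_file`;
3. `attributes.embed`, which is handed off as a plain URL result.

The sketch below covers only that selection logic, applied to an already-parsed response. `pick_media` and `_file_entry` are hypothetical names, and the sample documents in the test are trimmed, invented examples of the shape the diff reads:

```python
import os.path


def _file_entry(file_data, default_ext='mp3'):
    """Return {'url', 'ext'} for a file dict that has a URL, else None."""
    file_url = file_data.get('url')
    if not file_url:
        return None
    ext = os.path.splitext(file_data.get('name') or '')[1].lstrip('.')
    return {'url': file_url, 'ext': ext or default_ext}


def pick_media(post):
    """Choose the media source from a parsed /api/posts/<id> response."""
    attributes = post['data']['attributes']
    media = None
    # A later attachment with a URL replaces an earlier one, as the repeated
    # info.update() calls in the extractor's loop do.
    for item in post.get('included', []):
        if item.get('type') == 'attachment':
            media = _file_entry(item.get('attributes') or {}) or media
    if media is None:
        media = _file_entry(attributes.get('post_file') or {})
    if media is None:
        # No downloadable file: defer to whatever the embed points at.
        media = {'_type': 'url', 'url': attributes['embed']['url']}
    return media
```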


@@ -2,31 +2,38 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
xpath_text,
try_get,
urljoin,
)
class PhilharmonieDeParisIE(InfoExtractor):
IE_DESC = 'Philharmonie de Paris'
_VALID_URL = r'https?://live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
pad\.philharmoniedeparis\.fr/doc/CIMU/
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://pad.philharmoniedeparis.fr/doc/CIMU/1086697/jazz-a-la-villette-knower',
'md5': 'a0a4b195f544645073631cbec166a2c2',
'info_dict': {
'id': '1086697',
'ext': 'mp4',
'title': 'Jazz à la Villette : Knower',
},
}, {
'url': 'http://live.philharmoniedeparis.fr/concert/1032066.html',
'info_dict': {
'id': '1032066',
'ext': 'flv',
'title': 'md5:d1f5585d87d041d07ce9434804bc8425',
'timestamp': 1428179400,
'upload_date': '20150404',
'duration': 6592.278,
'title': 'md5:0a031b81807b3593cffa3c9a87a167a0',
},
'params': {
# rtmp download
'skip_download': True,
}
'playlist_mincount': 2,
}, {
'url': 'http://live.philharmoniedeparis.fr/Concert/1030324.html',
'only_matching': True,
@@ -34,45 +41,60 @@ class PhilharmonieDeParisIE(InfoExtractor):
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True,
}]
_LIVE_URL = 'https://live.philharmoniedeparis.fr'
def _real_extract(self, url):
video_id = self._match_id(url)
concert = self._download_xml(
'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=%s' % video_id,
video_id).find('./concert')
config = self._download_json(
'%s/otoPlayer/config.ashx' % self._LIVE_URL, video_id, query={
'id': video_id,
'lang': 'fr-FR',
})
formats = []
info_dict = {
'id': video_id,
'title': xpath_text(concert, './titre', 'title', fatal=True),
'formats': formats,
}
fichiers = concert.find('./fichiers')
stream = fichiers.attrib['serveurstream']
for fichier in fichiers.findall('./fichier'):
info_dict['duration'] = float_or_none(fichier.get('timecodefin'))
for quality, (format_id, suffix) in enumerate([('lq', ''), ('hq', '_hd')]):
format_url = fichier.get('url%s' % suffix)
if not format_url:
def extract_entry(source):
if not isinstance(source, dict):
return
title = source.get('title')
if not title:
return
files = source.get('files')
if not isinstance(files, dict):
return
format_urls = set()
formats = []
for format_id in ('mobile', 'desktop'):
format_url = try_get(
files, lambda x: x[format_id]['file'], compat_str)
if not format_url or format_url in format_urls:
continue
formats.append({
'url': stream,
'play_path': format_url,
'ext': 'flv',
'format_id': format_id,
'width': int_or_none(concert.get('largeur%s' % suffix)),
'height': int_or_none(concert.get('hauteur%s' % suffix)),
'quality': quality,
})
self._sort_formats(formats)
format_urls.add(format_url)
m3u8_url = urljoin(self._LIVE_URL, format_url)
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
if not formats:
return
self._sort_formats(formats)
return {
'title': title,
'formats': formats,
}
date, hour = concert.get('date'), concert.get('heure')
if date and hour:
info_dict['timestamp'] = parse_iso8601(
'%s-%s-%sT%s:00' % (date[0:4], date[4:6], date[6:8], hour))
elif date:
info_dict['upload_date'] = date
thumbnail = urljoin(self._LIVE_URL, config.get('image'))
return info_dict
info = extract_entry(config)
if info:
info.update({
'id': video_id,
'thumbnail': thumbnail,
})
return info
entries = []
for num, chapter in enumerate(config['chapters'], start=1):
entry = extract_entry(chapter)
entry['id'] = '%s-%d' % (video_id, num)
entries.append(entry)
return self.playlist_result(entries, video_id, config.get('title'))
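For each entry, the new Philharmonie de Paris code collects the `mobile` and `desktop` manifest paths, skips duplicates, and resolves them against the live site before HLS extraction. The sketch below assumes the `files` mapping has the shape the diff indexes (`files[format_id]['file']`). It uses the standard library's `urljoin`, which behaves somewhat differently from youtube-dl's own `urljoin` helper on invalid input; `manifest_urls` is a hypothetical name:

```python
from urllib.parse import urljoin

LIVE_URL = 'https://live.philharmoniedeparis.fr'


def manifest_urls(files):
    """Return absolute, de-duplicated manifest URLs from a files mapping."""
    seen = set()
    urls = []
    for format_id in ('mobile', 'desktop'):
        entry = files.get(format_id)
        file_url = entry.get('file') if isinstance(entry, dict) else None
        if not isinstance(file_url, str) or file_url in seen:
            continue
        seen.add(file_url)
        # Relative paths resolve against the live site; absolute URLs pass through.
        urls.append(urljoin(LIVE_URL, file_url))
    return urls
```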


@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
@@ -15,7 +16,7 @@ from ..utils import (
class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?'
_TEST = {
'url': 'https://picarto.tv/Setz',
'info_dict': {
@@ -33,20 +34,14 @@ class PicartoIE(InfoExtractor):
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
def _real_extract(self, url):
channel_id = self._match_id(url)
stream_page = self._download_webpage(url, channel_id)
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
if '>This channel does not exist' in stream_page:
raise ExtractorError(
'Channel %s does not exist' % channel_id, expected=True)
metadata = self._download_json(
'https://api.picarto.tv/v1/channel/name/' + channel_id,
channel_id)
player = self._parse_json(
self._search_regex(
r'(?s)playerSettings\[\d+\]\s*=\s*(\{.+?\}\s*\n)', stream_page,
'player settings'),
channel_id, transform_source=js_to_json)
if player.get('online') is False:
if metadata.get('online') is False:
raise ExtractorError('Stream is offline', expected=True)
cdn_data = self._download_json(
@@ -54,20 +49,13 @@ class PicartoIE(InfoExtractor):
data=urlencode_postdata({'loadbalancinginfo': channel_id}),
note='Downloading load balancing info')
def get_event(key):
return try_get(player, lambda x: x['event'][key], compat_str) or ''
token = mobj.group('token') or 'public'
params = {
'token': player.get('token') or '',
'ticket': get_event('ticket'),
'con': int(time.time() * 1000),
'type': get_event('ticket'),
'scope': get_event('scope'),
'token': token,
}
prefered_edge = cdn_data.get('preferedEdge')
default_tech = player.get('defaultTech')
formats = []
for edge in cdn_data['edges']:
@@ -81,8 +69,6 @@ class PicartoIE(InfoExtractor):
preference = 0
if edge_id == prefered_edge:
preference += 1
if tech_type == default_tech:
preference += 1
format_id = []
if edge_id:
format_id.append(edge_id)
@@ -109,7 +95,7 @@ class PicartoIE(InfoExtractor):
continue
self._sort_formats(formats)
mature = player.get('mature')
mature = metadata.get('adult')
if mature is None:
age_limit = None
else:
@@ -117,9 +103,11 @@ class PicartoIE(InfoExtractor):
return {
'id': channel_id,
'title': self._live_title(channel_id),
'title': self._live_title(metadata.get('title') or channel_id),
'is_live': True,
'thumbnail': player.get('vodThumb'),
'thumbnail': try_get(metadata, lambda x: x['thumbnails']['web']),
'channel': channel_id,
'channel_url': 'https://picarto.tv/%s' % channel_id,
'age_limit': age_limit,
'formats': formats,
}
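Picarto URLs may now carry an optional second path component. It is used as the stream token and defaults to `public`. Below is a minimal check of that parsing. `channel_and_token` is a hypothetical helper, and the sketch escapes the dot in `www\.`, which the diff's pattern leaves unescaped:

```python
import re

VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?'


def channel_and_token(url):
    """Return (channel_id, token) with token defaulting to 'public', or None if no match."""
    m = re.match(VALID_URL, url)
    if m is None:
        return None
    return m.group('id'), m.group('token') or 'public'
```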


@@ -4,6 +4,7 @@ import collections
import json
import os
import random
import re
from .common import InfoExtractor
from ..compat import (
@@ -196,7 +197,10 @@ query viewClip {
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
if all(not re.search(p, response) for p in (
r'__INITIAL_STATE__', r'["\']currentUser["\']',
# new layout?
r'>\s*Sign out\s*<')):
BLOCKED = 'Your account has been blocked due to suspicious activity'
if BLOCKED in response:
raise ExtractorError(
@@ -210,18 +214,26 @@ query viewClip {
raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_idx, lang, name, duration, video_id):
captions_post = {
'a': author,
'cn': clip_idx,
'lc': lang,
'm': name,
}
captions = self._download_json(
'%s/player/retrieve-captions' % self._API_BASE, video_id,
'Downloading captions JSON', 'Unable to download captions JSON',
fatal=False, data=json.dumps(captions_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
def _get_subtitles(self, author, clip_idx, clip_id, lang, name, duration, video_id):
captions = None
if clip_id:
captions = self._download_json(
'%s/transcript/api/v1/caption/json/%s/%s'
% (self._API_BASE, clip_id, lang), video_id,
'Downloading captions JSON', 'Unable to download captions JSON',
fatal=False)
if not captions:
captions_post = {
'a': author,
'cn': int(clip_idx),
'lc': lang,
'm': name,
}
captions = self._download_json(
'%s/player/retrieve-captions' % self._API_BASE, video_id,
'Downloading captions JSON', 'Unable to download captions JSON',
fatal=False, data=json.dumps(captions_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
if captions:
return {
lang: [{
@@ -413,7 +425,7 @@ query viewClip {
# TODO: other languages?
subtitles = self.extract_subtitles(
author, clip_idx, 'en', name, duration, display_id)
author, clip_idx, clip.get('clipId'), 'en', name, duration, display_id)
return {
'id': clip_id,
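The Pluralsight login check now treats the response as logged in if any of several regex markers appears. One of them is a `Sign out` link, for what the diff marks as a possible new layout. The diff's `all(not re.search(...))` is simply the negation of the `any(...)` below. A standalone sketch with made-up response snippets (`looks_logged_in` is a hypothetical name):

```python
import re

# Markers whose presence suggests a logged-in page (patterns taken from the diff).
LOGGED_IN_MARKERS = (
    r'__INITIAL_STATE__',
    r'["\']currentUser["\']',
    r'>\s*Sign out\s*<',
)


def looks_logged_in(response):
    """True if at least one logged-in marker matches anywhere in the response."""
    return any(re.search(p, response) for p in LOGGED_IN_MARKERS)
```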


@@ -58,8 +58,6 @@ class PopcornTVIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp'))
print(self._html_search_meta(
'duration', webpage))
duration = int_or_none(self._html_search_meta(
'duration', webpage), invscale=60)
view_count = int_or_none(self._html_search_meta(
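Besides dropping a stray debug `print`, this hunk shows `int_or_none(..., invscale=60)` converting the `duration` meta value to seconds, presumably because the page reports minutes. Below is a simplified re-creation of that conversion. It keeps only the arguments used here and does not reproduce youtube-dl's full `int_or_none` signature:

```python
def int_or_none(v, scale=1, default=None, invscale=1):
    """Parse v as an int and rescale it by invscale / scale; return default on failure."""
    if v is None or v == '':
        return default
    try:
        return int(v) * invscale // scale
    except (ValueError, TypeError):
        return default
```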


@@ -27,7 +27,7 @@ class PornHubIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
(?:[^/]+\.)?pornhub\.com/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/
)
(?P<id>[\da-z]+)
@@ -40,6 +40,7 @@ class PornHubIE(InfoExtractor):
'ext': 'mp4',
'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes',
'upload_date': '20130628',
'duration': 361,
'view_count': int,
'like_count': int,
@@ -57,6 +58,7 @@ class PornHubIE(InfoExtractor):
'ext': 'mp4',
'title': '重庆婷婷女王足交',
'uploader': 'Unknown',
'upload_date': '20150213',
'duration': 1753,
'view_count': int,
'like_count': int,
@@ -119,12 +121,15 @@ class PornHubIE(InfoExtractor):
}, {
'url': 'http://www.pornhub.com/video/show?viewkey=648719015',
'only_matching': True,
}, {
'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.com/embed/[\da-z]+)',
r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.(?:com|net)/embed/[\da-z]+)',
webpage)
def _extract_count(self, pattern, webpage, name):
@@ -132,14 +137,16 @@ class PornHubIE(InfoExtractor):
pattern, webpage, '%s count' % name, fatal=False))
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host') or 'pornhub.com'
video_id = mobj.group('id')
self._set_cookie('pornhub.com', 'age_verified', '1')
self._set_cookie(host, 'age_verified', '1')
def dl_webpage(platform):
self._set_cookie('pornhub.com', 'platform', platform)
self._set_cookie(host, 'platform', platform)
return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
'http://www.%s/view_video.php?viewkey=%s' % (host, video_id),
video_id, 'Downloading %s webpage' % platform)
webpage = dl_webpage('pc')
@@ -237,8 +244,14 @@ class PornHubIE(InfoExtractor):
video_urls.append((video_url, None))
video_urls_set.add(video_url)
upload_date = None
formats = []
for video_url, height in video_urls:
if not upload_date:
upload_date = self._search_regex(
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
if upload_date:
upload_date = upload_date.replace('/', '')
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj:
@@ -278,6 +291,7 @@ class PornHubIE(InfoExtractor):
return {
'id': video_id,
'uploader': video_uploader,
'upload_date': upload_date,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
@@ -294,7 +308,7 @@ class PornHubIE(InfoExtractor):
class PornHubPlaylistBaseIE(InfoExtractor):
def _extract_entries(self, webpage):
def _extract_entries(self, webpage, host):
# Only process container div with main playlist content skipping
# drop-down menu that uses similar pattern for videos (see
# https://github.com/rg3/youtube-dl/issues/11594).
@@ -304,7 +318,7 @@ class PornHubPlaylistBaseIE(InfoExtractor):
return [
self.url_result(
'http://www.pornhub.com/%s' % video_url,
'http://www.%s/%s' % (host, video_url),
PornHubIE.ie_key(), video_title=title)
for video_url, title in orderedSet(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
@@ -312,11 +326,13 @@ class PornHubPlaylistBaseIE(InfoExtractor):
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
entries = self._extract_entries(webpage)
entries = self._extract_entries(webpage, host)
playlist = self._parse_json(
self._search_regex(
@@ -331,7 +347,7 @@ class PornHubPlaylistBaseIE(InfoExtractor):
class PornHubPlaylistIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?pornhub\.com/playlist/(?P<id>\d+)'
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/playlist/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.pornhub.com/playlist/4667351',
'info_dict': {
@@ -346,7 +362,7 @@ class PornHubPlaylistIE(PornHubPlaylistBaseIE):
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?pornhub\.com/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos'
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos'
_TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': {
@@ -387,7 +403,9 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
}]
def _real_extract(self, url):
user_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
user_id = mobj.group('id')
entries = []
for page_num in itertools.count(1):
@@ -399,7 +417,7 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
raise
page_entries = self._extract_entries(webpage)
page_entries = self._extract_entries(webpage, host)
if not page_entries:
break
entries.extend(page_entries)

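The PornHub change above does two things. It captures the domain in a named `host` group, so `.net` URLs are rebuilt against the same domain instead of being forced to `pornhub.com`. It also derives `upload_date` from a `/YYYYMM/DD/` segment in a media URL. The following is a standalone sketch of both ideas. `build_page_url` and `upload_date_from_url` are hypothetical helper names, the regex is reduced to the `view_video.php` form, and the sample media URL is made up.

```python
import re

# Reduced form of the extractor's _VALID_URL: only the view_video.php alternative.
VALID_URL = (r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))'
             r'/view_video\.php\?viewkey=(?P<id>[\da-z]+)')


def build_page_url(url):
    """Rebuild the page URL on whichever domain the input URL used."""
    mobj = re.match(VALID_URL, url)
    # The real regex has an alternative without a host group, hence the fallback.
    host = mobj.group('host') or 'pornhub.com'
    return 'http://www.%s/view_video.php?viewkey=%s' % (host, mobj.group('id'))


def upload_date_from_url(video_url):
    """Turn a '/YYYYMM/DD/' path segment into 'YYYYMMDD', or return None."""
    mobj = re.search(r'/(\d{6}/\d{2})/', video_url)
    return mobj.group(1).replace('/', '') if mobj else None


print(build_page_url('https://www.pornhub.net/view_video.php?viewkey=203640933'))
# http://www.pornhub.net/view_video.php?viewkey=203640933
print(upload_date_from_url('https://cdn.example.com/videos/201306/28/12345/720P_1500K_12345.mp4'))
# 20130628
```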

@@ -274,7 +274,6 @@ class RaiPlayPlaylistIE(InfoExtractor):
('programma', 'nomeProgramma'), webpage, 'title')
description = unescapeHTML(self._html_search_meta(
('description', 'og:description'), webpage, 'description'))
print(description)
entries = []
for mobj in re.finditer(

@@ -8,7 +8,10 @@ from ..compat import compat_HTTPError
from ..utils import (
float_or_none,
parse_iso8601,
str_or_none,
try_get,
unescapeHTML,
url_or_none,
ExtractorError,
)
@@ -17,65 +20,87 @@ class RteBaseIE(InfoExtractor):
def _real_extract(self, url):
item_id = self._match_id(url)
try:
json_string = self._download_json(
'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
item_id)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error_info = self._parse_json(ee.cause.read().decode(), item_id, fatal=False)
if error_info:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error_info['message']),
expected=True)
raise
# NB the string values in the JSON are stored using XML escaping(!)
show = json_string['shows'][0]
title = unescapeHTML(show['title'])
description = unescapeHTML(show.get('description'))
thumbnail = show.get('thumbnail')
duration = float_or_none(show.get('duration'), 1000)
timestamp = parse_iso8601(show.get('published'))
mg = show['media:group'][0]
info_dict = {}
formats = []
if mg.get('url'):
m = re.match(r'(?P<url>rtmpe?://[^/]+)/(?P<app>.+)/(?P<playpath>mp4:.*)', mg['url'])
if m:
m = m.groupdict()
formats.append({
'url': m['url'] + '/' + m['app'],
'app': m['app'],
'play_path': m['playpath'],
'player_url': url,
'ext': 'flv',
'format_id': 'rtmp',
})
ENDPOINTS = (
'https://feeds.rasset.ie/rteavgen/player/playlist?type=iptv&format=json&showId=',
'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=',
)
if mg.get('hls_server') and mg.get('hls_url'):
formats.extend(self._extract_m3u8_formats(
mg['hls_server'] + mg['hls_url'], item_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
for num, ep_url in enumerate(ENDPOINTS, start=1):
try:
data = self._download_json(ep_url + item_id, item_id)
except ExtractorError as ee:
if num < len(ENDPOINTS) or formats:
continue
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error_info = self._parse_json(ee.cause.read().decode(), item_id, fatal=False)
if error_info:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error_info['message']),
expected=True)
raise
if mg.get('hds_server') and mg.get('hds_url'):
formats.extend(self._extract_f4m_formats(
mg['hds_server'] + mg['hds_url'], item_id,
f4m_id='hds', fatal=False))
# NB the string values in the JSON are stored using XML escaping(!)
show = try_get(data, lambda x: x['shows'][0], dict)
if not show:
continue
if not info_dict:
title = unescapeHTML(show['title'])
description = unescapeHTML(show.get('description'))
thumbnail = show.get('thumbnail')
duration = float_or_none(show.get('duration'), 1000)
timestamp = parse_iso8601(show.get('published'))
info_dict = {
'id': item_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
}
mg = try_get(show, lambda x: x['media:group'][0], dict)
if not mg:
continue
if mg.get('url'):
m = re.match(r'(?P<url>rtmpe?://[^/]+)/(?P<app>.+)/(?P<playpath>mp4:.*)', mg['url'])
if m:
m = m.groupdict()
formats.append({
'url': m['url'] + '/' + m['app'],
'app': m['app'],
'play_path': m['playpath'],
'player_url': url,
'ext': 'flv',
'format_id': 'rtmp',
})
if mg.get('hls_server') and mg.get('hls_url'):
formats.extend(self._extract_m3u8_formats(
mg['hls_server'] + mg['hls_url'], item_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
if mg.get('hds_server') and mg.get('hds_url'):
formats.extend(self._extract_f4m_formats(
mg['hds_server'] + mg['hds_url'], item_id,
f4m_id='hds', fatal=False))
mg_rte_server = str_or_none(mg.get('rte:server'))
mg_url = str_or_none(mg.get('url'))
if mg_rte_server and mg_url:
hds_url = url_or_none(mg_rte_server + mg_url)
if hds_url:
formats.extend(self._extract_f4m_formats(
hds_url, item_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return {
'id': item_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
info_dict['formats'] = formats
return info_dict
class RteIE(RteBaseIE):

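The rewritten `RteBaseIE` walks `ENDPOINTS` in order: first the `feeds.rasset.ie` iptv playlist, then the older `rte.ie` web playlist. A failure on any endpoint but the last is skipped. A failure on the last endpoint is also skipped once some formats have been collected. A minimal sketch of that control flow follows. The `fetch` callable, the endpoint strings and the `'formats'` key are stand-ins for illustration, not youtube-dl APIs.

```python
def collect_formats(endpoints, fetch):
    """Query endpoints in order; only the last may raise, and only if nothing was found."""
    formats = []
    for num, endpoint in enumerate(endpoints, start=1):
        try:
            data = fetch(endpoint)
        except IOError:
            # Mirrors `if num < len(ENDPOINTS) or formats: continue` in the diff.
            if num < len(endpoints) or formats:
                continue
            raise
        formats.extend(data.get('formats', []))
    return formats


def fake_fetch(endpoint):
    # Hypothetical fetcher: the first endpoint is "down", the second answers.
    if endpoint == 'iptv':
        raise IOError('HTTP Error 404')
    return {'formats': ['hls-720p']}


print(collect_formats(['iptv', 'web'], fake_fetch))  # ['hls-720p']
```

As in the diff, successful endpoints keep contributing formats, so results from both feeds are accumulated rather than stopping at the first success.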

@@ -103,7 +103,8 @@ class RutubeIE(RutubeBaseIE):
options = self._download_json(
'http://rutube.ru/api/play/options/%s/?format=json' % video_id,
video_id, 'Downloading options JSON')
video_id, 'Downloading options JSON',
headers=self.geo_verification_headers())
formats = []
for format_id, format_url in options['video_balancer'].items():

@@ -65,7 +65,8 @@ class RuutuIE(InfoExtractor):
video_id = self._match_id(url)
video_xml = self._download_xml(
'http://gatling.ruutu.fi/media-xml-cache?id=%s' % video_id, video_id)
'https://gatling.nelonenmedia.fi/media-xml-cache', video_id,
query={'id': video_id})
formats = []
processed_urls = []

@@ -90,6 +90,15 @@ class ScreencastIE(InfoExtractor):
r'src=(.*?)(?:$|&)', video_meta,
'meta tag video URL', default=None)
if video_url is None:
video_url = self._html_search_regex(
r'MediaContentUrl["\']\s*:(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video url', default=None, group='url')
if video_url is None:
video_url = self._html_search_meta(
'og:video', webpage, default=None)
if video_url is None:
raise ExtractorError('Cannot find video')

@@ -5,6 +5,7 @@ from ..compat import compat_b64decode
from ..utils import (
ExtractorError,
int_or_none,
url_or_none,
urlencode_postdata,
)
@@ -86,9 +87,16 @@ class VivoIE(SharedBaseIE):
}
def _extract_video_url(self, webpage, video_id, *args):
def decode_url(encoded_url):
return compat_b64decode(encoded_url).decode('utf-8')
stream_url = url_or_none(decode_url(self._search_regex(
r'data-stream\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'stream url', default=None, group='url')))
if stream_url:
return stream_url
return self._parse_json(
self._search_regex(
r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'stream', group='url'),
video_id,
transform_source=lambda x: compat_b64decode(x).decode('utf-8'))[0]
video_id, transform_source=decode_url)[0]

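The Vivo change first looks for a base64-encoded `data-stream` attribute. If that is missing, it falls back to decoding the argument of `InitializeStream(...)` as a base64 JSON list. One caution: as written in the diff, a missing attribute makes the code pass `None` (the regex default) to the base64 decoder, which raises `TypeError`, so the fallback would not be reached. The sketch below decodes only when the attribute was found. It uses the stdlib `base64.b64decode` in place of `compat_b64decode`, and a simple scheme check in place of `url_or_none`. The sample HTML is made up.

```python
import base64
import json
import re


def decode_url(encoded_url):
    return base64.b64decode(encoded_url).decode('utf-8')


def extract_stream_url(webpage):
    mobj = re.search(r'data-stream\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)
    # Decode only if the attribute exists; b64decode(None) would raise TypeError.
    if mobj:
        stream_url = decode_url(mobj.group('url'))
        # Simplified stand-in for url_or_none.
        if stream_url.startswith(('http://', 'https://')):
            return stream_url
    # Fallback: InitializeStream("<base64 of a JSON list>") -> first element.
    mobj = re.search(r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)
    return json.loads(decode_url(mobj.group('url')))[0] if mobj else None


new_page = '<div data-stream="%s"></div>' % base64.b64encode(
    b'https://cdn.example.com/v.mp4').decode('ascii')
old_page = '<script>InitializeStream("%s");</script>' % base64.b64encode(
    json.dumps(['https://cdn.example.com/old.mp4']).encode('utf-8')).decode('ascii')
print(extract_stream_url(new_page))  # https://cdn.example.com/v.mp4
print(extract_stream_url(old_page))  # https://cdn.example.com/old.mp4
```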

@@ -64,7 +64,7 @@ class SixPlayIE(InfoExtractor):
for asset in clip_data['assets']:
asset_url = asset.get('full_physical_path')
protocol = asset.get('protocol')
if not asset_url or protocol == 'primetime' or asset_url in urls:
if not asset_url or protocol == 'primetime' or asset.get('type') == 'usp_hlsfp_h264' or asset_url in urls:
continue
urls.append(asset_url)
container = asset.get('video_container')
@@ -81,19 +81,17 @@ class SixPlayIE(InfoExtractor):
if not urlh:
continue
asset_url = urlh.geturl()
asset_url = re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url)
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
asset_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
asset_url.replace('.m3u8', '.mpd'),
video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_ism_formats(
re.sub(r'/[^/]+\.m3u8', '/Manifest', asset_url),
video_id, ism_id='mss', fatal=False))
for i in range(3, 0, -1):
asset_url = asset_url = asset_url.replace('_sd1/', '_sd%d/' % i)
m3u8_formats = self._extract_m3u8_formats(
asset_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
formats.extend(self._extract_mpd_formats(
asset_url.replace('.m3u8', '.mpd'),
video_id, mpd_id='dash', fatal=False))
if m3u8_formats:
break
else:
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', 'm3u8_native',

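The SixPlay change rewrites asset URLs in two steps. First, `re.sub` normalizes `.../<name>.ism/<anything>.m3u8` to `.../<name>.ism/<name>.m3u8`. Second, a loop tries `_sd3/`, `_sd2/` and `_sd1/` variants, best first. Because that loop rebinds `asset_url` (the doubled `asset_url = asset_url =` is redundant but harmless), no `_sd1/` remains after the first pass. The second and third iterations therefore request the same `_sd3/` URL again. The sketch below builds each candidate from the unmodified URL instead. The helper names and sample URL are hypothetical.

```python
import re


def normalize_ism_playlist(asset_url):
    # '/name.ism/whatever.m3u8' -> '/name.ism/name.m3u8'
    return re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url)


def quality_candidates(asset_url):
    """Return the _sd3/, _sd2/, _sd1/ variants, best first, each derived from the original URL."""
    return [asset_url.replace('_sd1/', '_sd%d/' % i) for i in range(3, 0, -1)]


url = normalize_ism_playlist('https://vod.example.com/m6_sd1/clip.ism/master_v2.m3u8')
print(url)  # https://vod.example.com/m6_sd1/clip.ism/clip.m3u8
for candidate in quality_candidates(url):
    print(candidate)
```

A caller would request each candidate in turn and stop at the first one that yields formats, as the `break` in the diff does.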

@@ -44,3 +44,10 @@ class ParamountNetworkIE(MTVServicesInfoExtractor):
_FEED_URL = 'http://www.paramountnetwork.com/feeds/mrss/'
_GEO_COUNTRIES = ['US']
def _extract_mgid(self, webpage):
cs = self._parse_json(self._search_regex(
r'window\.__DATA__\s*=\s*({.+})',
webpage, 'data'), None)['children']
c = next(c for c in cs if c.get('type') == 'VideoPlayer')
return c['props']['media']['video']['config']['uri']

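The new `ParamountNetworkIE._extract_mgid` reads the JSON assigned to `window.__DATA__`, picks the first child whose `type` is `VideoPlayer`, and returns `props.media.video.config.uri`. Below is a standalone sketch of the same lookup using the stdlib `json` module. The page snippet and the `mgid:example:123` value are made up. As in the diff, `next()` without a default raises `StopIteration` if no `VideoPlayer` child exists.

```python
import json
import re


def extract_mgid(webpage):
    # Greedy {.+} stops at the last '}' on the line, so keep the JSON on one line.
    data = json.loads(re.search(r'window\.__DATA__\s*=\s*({.+})', webpage).group(1))
    player = next(c for c in data['children'] if c.get('type') == 'VideoPlayer')
    return player['props']['media']['video']['config']['uri']


sample = {'children': [
    {'type': 'Header'},
    {'type': 'VideoPlayer',
     'props': {'media': {'video': {'config': {'uri': 'mgid:example:123'}}}}},
]}
page = '<script>window.__DATA__ = %s;</script>' % json.dumps(sample)
print(extract_mgid(page))  # mgid:example:123
```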

@@ -8,20 +8,24 @@ from ..utils import (
determine_ext,
int_or_none,
js_to_json,
merge_dicts,
)
class SportBoxEmbedIE(InfoExtractor):
_VALID_URL = r'https?://news\.sportbox\.ru/vdl/player(?:/[^/]+/|\?.*?\bn?id=)(?P<id>\d+)'
class SportBoxIE(InfoExtractor):
_VALID_URL = r'https?://(?:news\.sportbox|matchtv)\.ru/vdl/player(?:/[^/]+/|\?.*?\bn?id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://news.sportbox.ru/vdl/player/ci/211355',
'info_dict': {
'id': '211355',
'id': '109158',
'ext': 'mp4',
'title': '211355',
'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
'description': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 292,
'view_count': int,
'timestamp': 1426237001,
'upload_date': '20150313',
},
'params': {
# m3u8 download
@@ -33,12 +37,18 @@ class SportBoxEmbedIE(InfoExtractor):
}, {
'url': 'https://news.sportbox.ru/vdl/player/media/193095',
'only_matching': True,
}, {
'url': 'https://news.sportbox.ru/vdl/player/media/109158',
'only_matching': True,
}, {
'url': 'https://matchtv.ru/vdl/player/media/109158',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://news\.sportbox\.ru/vdl/player[^"]+)"',
r'<iframe[^>]+src="(https?://(?:news\.sportbox|matchtv)\.ru/vdl/player[^"]+)"',
webpage)
def _real_extract(self, url):
@@ -46,13 +56,14 @@ class SportBoxEmbedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
wjplayer_data = self._parse_json(
sources = self._parse_json(
self._search_regex(
r'(?s)wjplayer\(({.+?})\);', webpage, 'wjplayer settings'),
r'(?s)playerOptions\.sources(?:WithRes)?\s*=\s*(\[.+?\])\s*;\s*\n',
webpage, 'sources'),
video_id, transform_source=js_to_json)
formats = []
for source in wjplayer_data['sources']:
for source in sources:
src = source.get('src')
if not src:
continue
@@ -66,14 +77,23 @@ class SportBoxEmbedIE(InfoExtractor):
})
self._sort_formats(formats)
player = self._parse_json(
self._search_regex(
r'(?s)playerOptions\s*=\s*({.+?})\s*;\s*\n', webpage,
'player options', default='{}'),
video_id, transform_source=js_to_json)
media_id = player['mediaId']
info = self._search_json_ld(webpage, media_id, default={})
view_count = int_or_none(self._search_regex(
r'Просмотров\s*:\s*(\d+)', webpage, 'view count', default=None))
return {
'id': video_id,
'title': video_id,
'thumbnail': wjplayer_data.get('poster'),
'duration': int_or_none(wjplayer_data.get('duration')),
return merge_dicts(info, {
'id': media_id,
'title': self._og_search_title(webpage, default=None) or media_id,
'thumbnail': player.get('poster'),
'duration': int_or_none(player.get('duration')),
'view_count': view_count,
'formats': formats,
}
})

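The reworked `SportBoxIE` returns `merge_dicts(info, {...})`, so JSON-LD metadata found on the page takes precedence and scraped values only fill gaps. The sketch below re-implements that merge for illustration rather than importing it. It follows my understanding of youtube-dl's `merge_dicts`: earlier dicts win, `None` values are skipped, and an empty string may be replaced by a later non-empty string. The sample values are made up.

```python
def merge_dicts(*dicts):
    merged = {}
    for a_dict in dicts:
        for key, value in a_dict.items():
            if value is None:
                continue
            # Keep the first value seen, unless it is an empty string and this one is not.
            if key not in merged or (
                    isinstance(value, str) and value
                    and isinstance(merged[key], str) and not merged[key]):
                merged[key] = value
    return merged


json_ld = {'title': 'From JSON-LD', 'timestamp': 1426237001, 'description': ''}
scraped = {'id': '109158', 'title': '109158', 'description': 'From page', 'duration': None}
print(merge_dicts(json_ld, scraped))
# {'title': 'From JSON-LD', 'timestamp': 1426237001, 'description': 'From page', 'id': '109158'}
```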

@@ -16,7 +16,7 @@ from ..utils import (
class TBSIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com(?P<path>/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+))'
_TESTS = [{
'url': 'http://www.tntdrama.com/shows/the-alienist/clips/monster',
'info_dict': {
@@ -40,12 +40,12 @@ class TBSIE(TurnerBaseIE):
}]
def _real_extract(self, url):
site, display_id = re.match(self._VALID_URL, url).groups()
site, path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'<script[^>]+?data-drupal-selector="drupal-settings-json"[^>]*?>({.+?})</script>',
webpage, 'drupal setting'), display_id)
video_data = drupal_settings['turner_playlist'][0]
video_data = next(v for v in drupal_settings['turner_playlist'] if v.get('url') == path)
media_id = video_data['mediaID']
title = video_data['title']

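The TBS fix adds a `path` group to `_VALID_URL`. It then picks the `turner_playlist` entry whose `url` equals that path, instead of blindly taking element `[0]`. A sketch of the selection with the same regex follows. The playlist entries and `mediaID` values are made up. Unlike the diff, the sketch passes `None` as the default to `next()`, so a missing entry yields `None` rather than `StopIteration`.

```python
import re

VALID_URL = (r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com'
             r'(?P<path>/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+))')


def pick_video(url, turner_playlist):
    site, path, display_id = re.match(VALID_URL, url).groups()
    return next((v for v in turner_playlist if v.get('url') == path), None)


playlist = [
    {'url': '/shows/the-alienist/clips/other', 'mediaID': 'a'},
    {'url': '/shows/the-alienist/clips/monster', 'mediaID': 'b'},
]
print(pick_video('http://www.tntdrama.com/shows/the-alienist/clips/monster', playlist)['mediaID'])  # b
```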

@@ -14,20 +14,38 @@ from ..utils import (
)
class UpskillBaseIE(InfoExtractor):
_LOGIN_URL = 'http://upskillcourses.com/sign_in'
_NETRC_MACHINE = 'upskill'
class TeachableBaseIE(InfoExtractor):
_NETRC_MACHINE = 'teachable'
_URL_PREFIX = 'teachable:'
_SITES = {
# Only notable ones here
'upskillcourses.com': 'upskill',
'academy.gns3.com': 'gns3',
'academyhacker.com': 'academyhacker',
'stackskills.com': 'stackskills',
'market.saleshacker.com': 'saleshacker',
'learnability.org': 'learnability',
'edurila.com': 'edurila',
}
_VALID_URL_SUB_TUPLE = (_URL_PREFIX, '|'.join(re.escape(site) for site in _SITES.keys()))
def _real_initialize(self):
self._login()
self._logged_in = False
def _login(self):
username, password = self._get_login_info()
def _login(self, site):
if self._logged_in:
return
username, password = self._get_login_info(
netrc_machine=self._SITES.get(site, site))
if username is None:
return
login_page, urlh = self._download_webpage_handle(
self._LOGIN_URL, None, 'Downloading login page')
'https://%s/sign_in' % site, None,
'Downloading %s login page' % site)
login_url = compat_str(urlh.geturl())
@@ -46,18 +64,24 @@ class UpskillBaseIE(InfoExtractor):
post_url = urljoin(login_url, post_url)
response = self._download_webpage(
post_url, None, 'Logging in',
post_url, None, 'Logging in to %s' % site,
data=urlencode_postdata(login_form),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': login_url,
})
if '>I accept the new Privacy Policy<' in response:
raise ExtractorError(
'Unable to login: %s asks you to accept new Privacy Policy. '
'Go to https://%s/ and accept.' % (site, site), expected=True)
# Successful login
if any(re.search(p, response) for p in (
r'class=["\']user-signout',
r'<a[^>]+\bhref=["\']/sign_out',
r'>\s*Log out\s*<')):
self._logged_in = True
return
message = get_element_by_class('alert', response)
@@ -68,8 +92,14 @@ class UpskillBaseIE(InfoExtractor):
raise ExtractorError('Unable to log in')
class UpskillIE(UpskillBaseIE):
_VALID_URL = r'https?://(?:www\.)?upskillcourses\.com/courses/[^/]+/lectures/(?P<id>\d+)'
class TeachableIE(TeachableBaseIE):
_VALID_URL = r'''(?x)
(?:
%shttps?://(?P<site_t>[^/]+)|
https?://(?:www\.)?(?P<site>%s)
)
/courses/[^/]+/lectures/(?P<id>\d+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
@@ -77,7 +107,7 @@ class UpskillIE(UpskillBaseIE):
'id': 'uzw6zw58or',
'ext': 'mp4',
'title': 'Welcome to the Course!',
'description': 'md5:8d66c13403783370af62ca97a7357bdd',
'description': 'md5:65edb0affa582974de4625b9cdea1107',
'duration': 138.763,
'timestamp': 1479846621,
'upload_date': '20161122',
@@ -88,10 +118,38 @@ class UpskillIE(UpskillBaseIE):
}, {
'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
'only_matching': True,
}, {
'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
'only_matching': True,
}, {
'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'only_matching': True,
}]
@staticmethod
def _is_teachable(webpage):
return 'teachableTracker.linker:autoLink' in webpage and re.search(
r'<link[^>]+href=["\']https?://process\.fs\.teachablecdn\.com',
webpage)
@staticmethod
def _extract_url(webpage, source_url):
if not TeachableIE._is_teachable(webpage):
print('NOT TEACHABLE')
return
if re.match(r'https?://[^/]+/(?:courses|p)', source_url):
return '%s%s' % (TeachableBaseIE._URL_PREFIX, source_url)
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site') or mobj.group('site_t')
video_id = mobj.group('id')
self._login(site)
prefixed = url.startswith(self._URL_PREFIX)
if prefixed:
url = url[len(self._URL_PREFIX):]
webpage = self._download_webpage(url, video_id)
@@ -113,12 +171,18 @@ class UpskillIE(UpskillBaseIE):
}
class UpskillCourseIE(UpskillBaseIE):
_VALID_URL = r'https?://(?:www\.)?upskillcourses\.com/courses/(?:enrolled/)?(?P<id>[^/?#&]+)'
class TeachableCourseIE(TeachableBaseIE):
_VALID_URL = r'''(?x)
(?:
%shttps?://(?P<site_t>[^/]+)|
https?://(?:www\.)?(?P<site>%s)
)
/(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
'info_dict': {
'id': '119763',
'id': 'essential-web-developer-course',
'title': 'The Essential Web Developer Course (Free)',
},
'playlist_count': 192,
@@ -128,21 +192,37 @@ class UpskillCourseIE(UpskillBaseIE):
}, {
'url': 'http://upskillcourses.com/courses/enrolled/119763',
'only_matching': True,
}, {
'url': 'https://academy.gns3.com/courses/enrolled/423415',
'only_matching': True,
}, {
'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',
'only_matching': True,
}, {
'url': 'teachable:https://filmsimplified.com/p/davinci-resolve-15-crash-course',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if UpskillIE.suitable(url) else super(
UpskillCourseIE, cls).suitable(url)
return False if TeachableIE.suitable(url) else super(
TeachableCourseIE, cls).suitable(url)
def _real_extract(self, url):
course_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site') or mobj.group('site_t')
course_id = mobj.group('id')
self._login(site)
prefixed = url.startswith(self._URL_PREFIX)
if prefixed:
prefix = self._URL_PREFIX
url = url[len(prefix):]
webpage = self._download_webpage(url, course_id)
course_id = self._search_regex(
r'data-course-id=["\'](\d+)', webpage, 'course id',
default=course_id)
url_base = 'https://%s/' % site
entries = []
@@ -162,10 +242,13 @@ class UpskillCourseIE(UpskillBaseIE):
title = self._html_search_regex(
r'<span[^>]+class=["\']lecture-name[^>]+>([^<]+)', li,
'title', default=None)
entry_url = urljoin(url_base, lecture_url)
if prefixed:
entry_url = self._URL_PREFIX + entry_url
entries.append(
self.url_result(
urljoin('http://upskillcourses.com/', lecture_url),
ie=UpskillIE.ie_key(), video_id=lecture_id,
entry_url,
ie=TeachableIE.ie_key(), video_id=lecture_id,
video_title=clean_html(title)))
course_title = self._html_search_regex(

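`TeachableBaseIE` builds its URL regexes from `_SITES`. Known domains match directly. Any other Teachable-hosted site can be addressed by prepending `teachable:`, in which case the domain lands in `site_t`. The netrc machine name is looked up in `_SITES` and falls back to the domain itself. Below is a reduced sketch with two sites. `parse_lecture_url` is a hypothetical helper, and `learn.example.com` is a made-up domain.

```python
import re

URL_PREFIX = 'teachable:'
SITES = {
    'upskillcourses.com': 'upskill',
    'academy.gns3.com': 'gns3',
}
VALID_URL_SUB_TUPLE = (URL_PREFIX, '|'.join(re.escape(site) for site in SITES.keys()))

LECTURE_URL = r'''(?x)
    (?:
        %shttps?://(?P<site_t>[^/]+)|
        https?://(?:www\.)?(?P<site>%s)
    )
    /courses/[^/]+/lectures/(?P<id>\d+)
''' % VALID_URL_SUB_TUPLE


def parse_lecture_url(url):
    """Return (site, lecture_id, netrc_machine) or None if the URL does not match."""
    mobj = re.match(LECTURE_URL, url)
    if not mobj:
        return None
    site = mobj.group('site') or mobj.group('site_t')
    return site, mobj.group('id'), SITES.get(site, site)


print(parse_lecture_url('http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100'))
# ('upskillcourses.com', '1747100', 'upskill')
print(parse_lecture_url('teachable:https://learn.example.com/courses/abc/lectures/42'))
# ('learn.example.com', '42', 'learn.example.com')
```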

@@ -212,8 +212,6 @@ class TEDIE(InfoExtractor):
http_url = None
for format_id, resources in resources_.items():
if not isinstance(resources, dict):
continue
if format_id == 'h264':
for resource in resources:
h264_url = resource.get('file')
@@ -242,6 +240,8 @@ class TEDIE(InfoExtractor):
'tbr': int_or_none(resource.get('bitrate')),
})
elif format_id == 'hls':
if not isinstance(resources, dict):
continue
stream_url = url_or_none(resources.get('stream'))
if not stream_url:
continue

@@ -39,9 +39,17 @@ class ThePlatformBaseIE(OnceIE):
smil_url, video_id, note=note, query={'format': 'SMIL'},
headers=self.geo_verification_headers())
error_element = find_xpath_attr(meta, _x('.//smil:ref'), 'src')
if error_element is not None and error_element.attrib['src'].startswith(
'http://link.theplatform.%s/s/errorFiles/Unavailable.' % self._TP_TLD):
raise ExtractorError(error_element.attrib['abstract'], expected=True)
if error_element is not None:
exception = find_xpath_attr(
error_element, _x('.//smil:param'), 'name', 'exception')
if exception is not None:
if exception.get('value') == 'GeoLocationBlocked':
self.raise_geo_restricted(error_element.attrib['abstract'])
elif error_element.attrib['src'].startswith(
'http://link.theplatform.%s/s/errorFiles/Unavailable.'
% self._TP_TLD):
raise ExtractorError(
error_element.attrib['abstract'], expected=True)
smil_formats = self._parse_smil_formats(
meta, smil_url, video_id, namespace=default_ns,
@@ -335,7 +343,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}, account_id=None):
real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
entry = self._download_json(real_url, video_id)['entries'][0]
main_smil_url = 'http://link.theplatform.com/s/%s/media/guid/%d/%s' % (provider_id, account_id, entry['guid']) if account_id else None
main_smil_url = 'http://link.theplatform.com/s/%s/media/guid/%d/%s' % (provider_id, account_id, entry['guid']) if account_id else entry.get('plmedia$publicUrl')
formats = []
subtitles = {}
@@ -348,7 +356,8 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
if first_video_id is None:
first_video_id = cur_video_id
duration = float_or_none(item.get('plfile$duration'))
for asset_type in item['plfile$assetTypes']:
file_asset_types = item.get('plfile$assetTypes') or compat_parse_qs(compat_urllib_parse_urlparse(smil_url).query)['assetTypes']
for asset_type in file_asset_types:
if asset_type in asset_types:
continue
asset_types.append(asset_type)

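In `ThePlatformFeedIE`, when a feed item has no `plfile$assetTypes`, the asset types are now read from the `assetTypes` query parameter of the SMIL URL. A sketch with the Python 3 `urllib.parse` equivalents of the compat helpers follows. The diff indexes `['assetTypes']` directly, which raises `KeyError` if the parameter is absent. This sketch uses `.get(..., [])` instead, which is a deliberate deviation. The SMIL URLs are made up.

```python
from urllib.parse import parse_qs, urlparse


def file_asset_types(item, smil_url):
    # Prefer the per-file list; otherwise fall back to ?assetTypes=... on the SMIL URL.
    return item.get('plfile$assetTypes') or parse_qs(urlparse(smil_url).query).get('assetTypes', [])


print(file_asset_types({}, 'http://link.example.com/s/abc?assetTypes=HLS&assetTypes=MPEG4&format=SMIL'))
# ['HLS', 'MPEG4']
```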

@@ -0,0 +1,117 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
compat_str,
ExtractorError,
int_or_none,
str_or_none,
try_get,
url_or_none,
)
class TikTokBaseIE(InfoExtractor):
def _extract_aweme(self, data):
video = data['video']
description = str_or_none(try_get(data, lambda x: x['desc']))
width = int_or_none(try_get(data, lambda x: video['width']))
height = int_or_none(try_get(data, lambda x: video['height']))
format_urls = set()
formats = []
for format_id in (
'play_addr_lowbr', 'play_addr', 'play_addr_h264',
'download_addr'):
for format in try_get(
video, lambda x: x[format_id]['url_list'], list) or []:
format_url = url_or_none(format)
if not format_url:
continue
if format_url in format_urls:
continue
format_urls.add(format_url)
formats.append({
'url': format_url,
'ext': 'mp4',
'height': height,
'width': width,
})
self._sort_formats(formats)
thumbnail = url_or_none(try_get(
video, lambda x: x['cover']['url_list'][0], compat_str))
uploader = try_get(data, lambda x: x['author']['nickname'], compat_str)
timestamp = int_or_none(data.get('create_time'))
comment_count = int_or_none(data.get('comment_count')) or int_or_none(
try_get(data, lambda x: x['statistics']['comment_count']))
repost_count = int_or_none(try_get(
data, lambda x: x['statistics']['share_count']))
aweme_id = data['aweme_id']
return {
'id': aweme_id,
'title': uploader or aweme_id,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'comment_count': comment_count,
'repost_count': repost_count,
'formats': formats,
}
class TikTokIE(TikTokBaseIE):
_VALID_URL = r'https?://(?:m\.)?tiktok\.com/v/(?P<id>\d+)'
_TEST = {
'url': 'https://m.tiktok.com/v/6606727368545406213.html',
'md5': 'd584b572e92fcd48888051f238022420',
'info_dict': {
'id': '6606727368545406213',
'ext': 'mp4',
'title': 'Zureeal',
'description': '#bowsette#mario#cosplay#uk#lgbt#gaming#asian#bowsettecosplay',
'thumbnail': r're:^https?://.*~noop.image',
'uploader': 'Zureeal',
'timestamp': 1538248586,
'upload_date': '20180929',
'comment_count': int,
'repost_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._parse_json(self._search_regex(
r'\bdata\s*=\s*({.+?})\s*;', webpage, 'data'), video_id)
return self._extract_aweme(data)
class TikTokUserIE(TikTokBaseIE):
_VALID_URL = r'https?://(?:m\.)?tiktok\.com/h5/share/usr/(?P<id>\d+)'
_TEST = {
'url': 'https://m.tiktok.com/h5/share/usr/188294915489964032.html',
'info_dict': {
'id': '188294915489964032',
},
'playlist_mincount': 24,
}
def _real_extract(self, url):
user_id = self._match_id(url)
data = self._download_json(
'https://m.tiktok.com/h5/share/usr/list/%s/' % user_id, user_id,
query={'_signature': '_'})
entries = []
for aweme in data['aweme_list']:
try:
entry = self._extract_aweme(aweme)
except ExtractorError:
continue
entry['extractor_key'] = TikTokIE.ie_key()
entries.append(entry)
return self.playlist_result(entries, user_id)

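`TikTokBaseIE._extract_aweme` walks several address keys (`play_addr_lowbr`, `play_addr`, `play_addr_h264`, `download_addr`). It keeps a set of URLs already seen, so the same CDN URL listed under two keys yields only one format. Below is a plain-dict sketch of that dedup. `collect_unique_formats` is a hypothetical name, a scheme check stands in for `url_or_none`, and the sample data is made up.

```python
def collect_unique_formats(video, width=None, height=None):
    seen = set()
    formats = []
    for format_id in ('play_addr_lowbr', 'play_addr', 'play_addr_h264', 'download_addr'):
        url_list = (video.get(format_id) or {}).get('url_list')
        if not isinstance(url_list, list):
            continue
        for format_url in url_list:
            # Simplified stand-in for url_or_none.
            if not isinstance(format_url, str) or not format_url.startswith(('http://', 'https://')):
                continue
            if format_url in seen:
                continue
            seen.add(format_url)
            formats.append({'url': format_url, 'ext': 'mp4', 'width': width, 'height': height})
    return formats


video = {
    'play_addr': {'url_list': ['https://v.example.com/a.mp4', 'https://v.example.com/b.mp4']},
    'download_addr': {'url_list': ['https://v.example.com/a.mp4']},
}
print([f['url'] for f in collect_unique_formats(video, 720, 1280)])
# ['https://v.example.com/a.mp4', 'https://v.example.com/b.mp4']
```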

@@ -18,8 +18,9 @@ from ..utils import (
class TNAFlixNetworkBaseIE(InfoExtractor):
# May be overridden in descendants if necessary
_CONFIG_REGEX = [
r'flashvars\.config\s*=\s*escape\("([^"]+)"',
r'<input[^>]+name="config\d?" value="([^"]+)"',
r'flashvars\.config\s*=\s*escape\("(?P<url>[^"]+)"',
r'<input[^>]+name="config\d?" value="(?P<url>[^"]+)"',
r'config\s*=\s*(["\'])(?P<url>(?:https?:)?//(?:(?!\1).)+)\1',
]
_HOST = 'tna'
_VKEY_SUFFIX = ''
@@ -85,7 +86,8 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
cfg_url = self._proto_relative_url(self._html_search_regex(
self._CONFIG_REGEX, webpage, 'flashvars.config', default=None), 'http:')
self._CONFIG_REGEX, webpage, 'flashvars.config', default=None,
group='url'), 'http:')
if not cfg_url:
inputs = self._hidden_inputs(webpage)

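The TNAFlix change adds a third `_CONFIG_REGEX` pattern whose first group is the quote character. It then asks for `group='url'`, which is why the two older patterns were rewritten to name their capture `url` as well. As I understand `_html_search_regex`, it tries a list of patterns in order and returns the requested group of the first match, so every pattern must define that group. Below is a stdlib sketch of that lookup, followed by the protocol-relative fix-up done by `_proto_relative_url`. `search_first` and `find_config_url` are hypothetical helpers, and the HTML snippets are made up.

```python
import re

CONFIG_REGEX = [
    r'flashvars\.config\s*=\s*escape\("(?P<url>[^"]+)"',
    r'<input[^>]+name="config\d?" value="(?P<url>[^"]+)"',
    r'config\s*=\s*(["\'])(?P<url>(?:https?:)?//(?:(?!\1).)+)\1',
]


def search_first(patterns, text, group):
    """Return the named group from the first pattern that matches, else None."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(group)
    return None


def find_config_url(webpage):
    url = search_first(CONFIG_REGEX, webpage, 'url')
    # Protocol-relative '//host/...' becomes 'http://host/...'.
    if url and url.startswith('//'):
        url = 'http:' + url
    return url


print(find_config_url("var config = '//cdn.example.com/cfg.php?vkey=1';"))
# http://cdn.example.com/cfg.php?vkey=1
```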

@@ -45,7 +45,7 @@ class Tube8IE(KeezMoviesIE):
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
description = self._html_search_regex(
r'>Description:</strong>\s*(.+?)\s*<', webpage, 'description', fatal=False)
r'(?s)Description:</dt>\s*<dd>(.+?)</dd>', webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'<span class="username">\s*(.+?)\s*<',
webpage, 'uploader', fatal=False)
@@ -55,19 +55,19 @@ class Tube8IE(KeezMoviesIE):
dislike_count = int_or_none(self._search_regex(
r'rdownVar\s*=\s*"(\d+)"', webpage, 'dislike count', fatal=False))
view_count = str_to_int(self._search_regex(
r'<strong>Views: </strong>([\d,\.]+)\s*</li>',
r'Views:\s*</dt>\s*<dd>([\d,\.]+)',
webpage, 'view count', fatal=False))
comment_count = str_to_int(self._search_regex(
r'<span id="allCommentsCount">(\d+)</span>',
webpage, 'comment count', fatal=False))
category = self._search_regex(
r'Category:\s*</strong>\s*<a[^>]+href=[^>]+>([^<]+)',
r'Category:\s*</dt>\s*<dd>\s*<a[^>]+href=[^>]+>([^<]+)',
webpage, 'category', fatal=False)
categories = [category] if category else None
tags_str = self._search_regex(
r'(?s)Tags:\s*</strong>(.+?)</(?!a)',
r'(?s)Tags:\s*</dt>\s*<dd>(.+?)</(?!a)',
webpage, 'tags', fatal=False)
tags = [t for t in re.findall(
r'<a[^>]+href=[^>]+>([^<]+)', tags_str)] if tags_str else None

@@ -1,34 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class TV3IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tv3\.co\.nz/(?P<id>[^/]+)/tabid/\d+/articleID/\d+/MCat/\d+/Default\.aspx'
_TEST = {
'url': 'http://www.tv3.co.nz/MOTORSPORT-SRS-SsangYong-Hampton-Downs-Round-3/tabid/3692/articleID/121615/MCat/2915/Default.aspx',
'info_dict': {
'id': '4659127992001',
'ext': 'mp4',
'title': 'CRC Motorsport: SRS SsangYong Hampton Downs Round 3 - S2015 Ep3',
'description': 'SsangYong Racing Series returns for Round 3 with drivers from New Zealand and Australia taking to the grid at Hampton Downs raceway.',
'uploader_id': '3812193411001',
'upload_date': '20151213',
'timestamp': 1449975272,
},
'expected_warnings': [
'Failed to download MPD manifest'
],
'params': {
# m3u8 download
'skip_download': True,
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/3812193411001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(r'<param\s*name="@videoPlayer"\s*value="(\d+)"', webpage, 'brightcove id')
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)

@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
import re
class TwitCastingIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?twitcasting\.tv/(?P<uploader_id>[^/]+)/movie/(?P<id>\d+)'
_TEST = {
'url': 'https://twitcasting.tv/ivetesangalo/movie/2357609',
'md5': '745243cad58c4681dc752490f7540d7f',
'info_dict': {
'id': '2357609',
'ext': 'mp4',
'title': 'Recorded Live #2357609',
'uploader_id': 'ivetesangalo',
'description': "Moi! I'm live on TwitCasting from my iPhone.",
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
uploader_id = mobj.group('uploader_id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'(?s)<[^>]+id=["\']movietitle[^>]+>(.+?)</',
webpage, 'title', default=None) or self._html_search_meta(
'twitter:title', webpage, fatal=True)
m3u8_url = self._search_regex(
(r'data-movie-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
r'(["\'])(?P<url>http.+?\.m3u8.*?)\1'),
webpage, 'm3u8 url', group='url')
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader_id': uploader_id,
'formats': formats,
}
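The m3u8 lookup above passes a tuple of patterns, which `_search_regex` tries in order. The first pattern uses a backreference plus a tempered token, `(?:(?!\1).)+`, so the URL may contain the *other* quote character. A rough stand-alone sketch with plain `re`; the sample pages are hypothetical.

```python
import re

# The two patterns from TwitCastingIE, tried in order.
PATTERNS = (
    # Attribute value delimited by whichever quote opened it.
    r'data-movie-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
    # Fallback: any quoted http(s) URL containing ".m3u8".
    r'(["\'])(?P<url>http.+?\.m3u8.*?)\1',
)


def find_m3u8_url(webpage):
    for pattern in PATTERNS:
        mobj = re.search(pattern, webpage)
        if mobj:
            return mobj.group('url')
    return None


print(find_m3u8_url('<video data-movie-url="https://example.com/x.m3u8">'))
print(find_m3u8_url("var src = 'https://example.com/hls/y.m3u8?t=1';"))
```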

@@ -51,7 +51,9 @@ class TwitchBaseIE(InfoExtractor):
expected=True)
def _call_api(self, path, item_id, *args, **kwargs):
kwargs.setdefault('headers', {})['Client-ID'] = self._CLIENT_ID
headers = kwargs.get('headers', {}).copy()
headers['Client-ID'] = self._CLIENT_ID
kwargs['headers'] = headers
response = self._download_json(
'%s/%s' % (self._API_BASE, path), item_id,
*args, **compat_kwargs(kwargs))
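The hunk above replaces `kwargs.setdefault('headers', {})[...] = ...` with copy-then-assign; the same change appears in the Udemy and YouTube hunks. Presumably the point is that `setdefault` writes into the very dict a caller passed in, so the injected header leaks back out. A small illustration of the difference; the function names and the `Client-ID` value are placeholders.

```python
def add_header_mutating(**kwargs):
    # Old approach: writes into whatever headers dict the caller passed.
    kwargs.setdefault('headers', {})['Client-ID'] = 'example-client-id'
    return kwargs


def add_header_copying(**kwargs):
    # New approach: copy first, so the caller's dict is left untouched.
    headers = kwargs.get('headers', {}).copy()
    headers['Client-ID'] = 'example-client-id'
    kwargs['headers'] = headers
    return kwargs


shared = {'Accept': 'application/json'}
add_header_mutating(headers=shared)
print(shared)  # the caller's dict now also contains 'Client-ID'
```

`**kwargs` builds a fresh dict of arguments, but the `headers` value inside it is still the caller's object, which is why only the copying version is safe.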
@@ -559,7 +561,8 @@ class TwitchStreamIE(TwitchBaseIE):
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
TwitchHighlightsIE))
TwitchHighlightsIE,
TwitchClipsIE))
else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url):
@@ -633,7 +636,7 @@ class TwitchStreamIE(TwitchBaseIE):
class TwitchClipsIE(TwitchBaseIE):
IE_NAME = 'twitch:clips'
_VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:[^/]+/)*|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
@@ -653,6 +656,9 @@ class TwitchClipsIE(TwitchBaseIE):
# multiple formats
'url': 'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
'only_matching': True,
}, {
'url': 'https://www.twitch.tv/sergeynixon/clip/StormyThankfulSproutFutureMan',
'only_matching': True,
}]
def _real_extract(self, url):
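The widened `TwitchClipsIE._VALID_URL` accepts both the `clips.twitch.tv` form and the `www.twitch.tv/<channel>/clip/<slug>` form. A quick check against the test URLs from the hunk:

```python
import re

# New TwitchClipsIE pattern from the hunk above.
VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:[^/]+/)*|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'


def clip_id(url):
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


for url in (
    'https://clips.twitch.tv/FaintLightGullWholeWheat',
    'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
    'https://www.twitch.tv/sergeynixon/clip/StormyThankfulSproutFutureMan',
):
    print(clip_id(url))
```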

@@ -122,7 +122,9 @@ class UdemyIE(InfoExtractor):
raise ExtractorError(error_str, expected=True)
def _download_webpage_handle(self, *args, **kwargs):
kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
headers = kwargs.get('headers', {}).copy()
headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
kwargs['headers'] = headers
return super(UdemyIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))

@@ -130,16 +130,16 @@ class ViewsterIE(InfoExtractor):
def concat(suffix, sep='-'):
return (base_format_id + '%s%s' % (sep, suffix)) if base_format_id else suffix
for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
media = self._download_json(
'https://public-api.viewster.com/movies/%s/video' % entry_id,
video_id, 'Downloading %s JSON' % concat(media_type, ' '), fatal=False, query={
'mediaType': media_type,
'language': audio,
'subtitle': subtitle,
})
if not media:
continue
medias = self._download_json(
'https://public-api.viewster.com/movies/%s/videos' % entry_id,
video_id, fatal=False, query={
'mediaTypes': ['application/f4m+xml', 'application/x-mpegURL', 'video/mp4'],
'language': audio,
'subtitle': subtitle,
})
if not medias:
continue
for media in medias:
video_url = media.get('Uri')
if not video_url:
continue

@@ -14,10 +14,13 @@ from ..compat import (
from ..utils import (
determine_ext,
ExtractorError,
js_to_json,
InAdvancePagedList,
int_or_none,
merge_dicts,
NO_DEFAULT,
parse_filesize,
qualities,
RegexNotFoundError,
sanitized_Request,
smuggle_url,
@@ -27,7 +30,6 @@ from ..utils import (
unsmuggle_url,
urlencode_postdata,
unescapeHTML,
parse_filesize,
)
@@ -299,10 +301,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio',
'uploader_id': 'atencio',
'uploader': 'Peter Atencio',
'channel_id': 'keypeele',
'channel_url': r're:https?://(?:www\.)?vimeo\.com/channels/keypeele',
'timestamp': 1380339469,
'upload_date': '20130928',
'duration': 187,
},
'expected_warnings': ['Unable to download JSON metadata'],
},
{
'url': 'http://vimeo.com/76979871',
@@ -355,11 +360,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
'url': 'https://vimeo.com/channels/tributes/6213729',
'info_dict': {
'id': '6213729',
'ext': 'mov',
'ext': 'mp4',
'title': 'Vimeo Tribute: The Shining',
'uploader': 'Casey Donahue',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue',
'uploader_id': 'caseydonahue',
'channel_url': r're:https?://(?:www\.)?vimeo\.com/channels/tributes',
'channel_id': 'tributes',
'timestamp': 1250886430,
'upload_date': '20090821',
'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
@@ -465,6 +472,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
if 'Referer' not in headers:
headers['Referer'] = url
channel_id = self._search_regex(
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
# Extract ID from URL
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
@@ -543,6 +553,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
else:
config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
config = self._search_regex(config_re, webpage, 'info section',
flags=re.DOTALL)
config = json.loads(config)
@@ -563,19 +574,23 @@ class VimeoIE(VimeoBaseInfoExtractor):
if config.get('view') == 4:
config = self._verify_player_video_password(redirect_url, video_id)
vod = config.get('video', {}).get('vod', {})
def is_rented():
if '>You rented this title.<' in webpage:
return True
if config.get('user', {}).get('purchased'):
return True
label = try_get(
config, lambda x: x['video']['vod']['purchase_options'][0]['label_string'], compat_str)
if label and label.startswith('You rented this'):
return True
for purchase_option in vod.get('purchase_options', []):
if purchase_option.get('purchased'):
return True
label = purchase_option.get('label_string')
if label and (label.startswith('You rented this') or label.endswith(' remaining')):
return True
return False
if is_rented():
feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
if is_rented() and vod.get('is_trailer'):
feature_id = vod.get('feature_id')
if feature_id and not data.get('force_feature_id', False):
return self.url_result(smuggle_url(
'https://player.vimeo.com/player/%s' % feature_id,
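The reworked rental check now walks every entry in `vod['purchase_options']` and also treats labels ending in `' remaining'` as rented; the feature redirect additionally requires the current item to be a trailer. A stand-alone sketch of that decision, with the config shape inferred from the hunk and the sample values invented:

```python
def is_rented(webpage, config):
    # Mirrors the updated VimeoIE.is_rented() above.
    if '>You rented this title.<' in webpage:
        return True
    if config.get('user', {}).get('purchased'):
        return True
    vod = config.get('video', {}).get('vod', {})
    for purchase_option in vod.get('purchase_options', []):
        if purchase_option.get('purchased'):
            return True
        label = purchase_option.get('label_string')
        if label and (label.startswith('You rented this') or label.endswith(' remaining')):
            return True
    return False


def feature_id_to_follow(webpage, config):
    # Only redirect from a trailer to the rented feature.
    vod = config.get('video', {}).get('vod', {})
    if is_rented(webpage, config) and vod.get('is_trailer'):
        return vod.get('feature_id')
    return None


# Hypothetical player config.
sample_config = {'video': {'vod': {
    'is_trailer': True,
    'feature_id': '12345',
    'purchase_options': [{'label_string': '2 days remaining'}],
}}}
print(feature_id_to_follow('', sample_config))
```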
@@ -652,6 +667,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license')
channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
info_dict = {
'id': video_id,
'formats': formats,
@@ -662,6 +679,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': like_count,
'comment_count': comment_count,
'license': cc_license,
'channel_id': channel_id,
'channel_url': channel_url,
}
info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
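The new `channel_id`/`channel_url` fields are derived purely from the input URL, so they are only set when the video was opened through a `/channels/<name>/` path. A minimal equivalent using `re.search` in place of `_search_regex(..., default=None)`:

```python
import re


def channel_info(url):
    # Same regex as VimeoIE; both values are None outside /channels/ URLs.
    mobj = re.search(r'vimeo\.com/channels/([^/]+)', url)
    channel_id = mobj.group(1) if mobj else None
    channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
    return channel_id, channel_url


print(channel_info('https://vimeo.com/channels/tributes/6213729'))
print(channel_info('http://vimeo.com/76979871'))
```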
@@ -1046,3 +1065,96 @@ class VimeoLikesIE(InfoExtractor):
'description': description,
'entries': pl,
}
class VHXEmbedIE(InfoExtractor):
IE_NAME = 'vhx:embed'
_VALID_URL = r'https?://embed\.vhx\.tv/videos/(?P<id>\d+)'
def _call_api(self, video_id, access_token, path='', query=None):
return self._download_json(
'https://api.vhx.tv/videos/' + video_id + path, video_id, headers={
'Authorization': 'Bearer ' + access_token,
}, query=query)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
credentials = self._parse_json(self._search_regex(
r'(?s)credentials\s*:\s*({.+?}),', webpage,
'config'), video_id, js_to_json)
access_token = credentials['access_token']
query = {}
for k, v in credentials.items():
if k in ('authorization', 'authUserToken', 'ticket') and v and v != 'undefined':
if k == 'authUserToken':
query['auth_user_token'] = v
else:
query[k] = v
files = self._call_api(video_id, access_token, '/files', query)
formats = []
for f in files:
href = try_get(f, lambda x: x['_links']['source']['href'])
if not href:
continue
method = f.get('method')
if method == 'hls':
formats.extend(self._extract_m3u8_formats(
href, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif method == 'dash':
formats.extend(self._extract_mpd_formats(
href, video_id, mpd_id='dash', fatal=False))
else:
fmt = {
'filesize': int_or_none(try_get(f, lambda x: x['size']['bytes'])),
'format_id': 'http',
'preference': 1,
'url': href,
'vcodec': f.get('codec'),
}
quality = f.get('quality')
if quality:
fmt.update({
'format_id': 'http-' + quality,
'height': int_or_none(self._search_regex(r'(\d+)p', quality, 'height', default=None)),
})
formats.append(fmt)
self._sort_formats(formats)
video_data = self._call_api(video_id, access_token)
title = video_data.get('title') or video_data['name']
subtitles = {}
for subtitle in try_get(video_data, lambda x: x['tracks']['subtitles'], list) or []:
lang = subtitle.get('srclang') or subtitle.get('label')
for _link in subtitle.get('_links', {}).values():
href = _link.get('href')
if not href:
continue
subtitles.setdefault(lang, []).append({
'url': href,
})
q = qualities(['small', 'medium', 'large', 'source'])
thumbnails = []
for thumbnail_id, thumbnail_url in video_data.get('thumbnail', {}).items():
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
'preference': q(thumbnail_id),
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'duration': int_or_none(try_get(video_data, lambda x: x['duration']['seconds'])),
'formats': formats,
'subtitles': subtitles,
'thumbnails': thumbnails,
'timestamp': unified_timestamp(video_data.get('created_at')),
'view_count': int_or_none(video_data.get('plays_count')),
}
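In `VHXEmbedIE`, only three optional credential fields are forwarded to the `/files` API call, empty values and the literal string `'undefined'` (presumably a serialized JavaScript `undefined`) are dropped, and `authUserToken` is renamed to `auth_user_token`. A stand-alone sketch of that mapping; the sample credentials are invented.

```python
def build_vhx_query(credentials):
    """Turn the embed page's credentials object into API query parameters."""
    query = {}
    for k, v in credentials.items():
        if k in ('authorization', 'authUserToken', 'ticket') and v and v != 'undefined':
            # authUserToken is the only key that gets renamed.
            query['auth_user_token' if k == 'authUserToken' else k] = v
    return query


sample = {'access_token': 'abc', 'authorization': 'undefined', 'authUserToken': 'tok', 'ticket': ''}
print(build_vhx_query(sample))
```

`access_token` is not part of the query; the extractor sends it separately as a `Bearer` token in the `Authorization` header.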

@@ -293,8 +293,12 @@ class VKIE(VKBaseIE):
# This video is no longer available, because its author has been blocked.
'url': 'https://vk.com/video-10639516_456240611',
'only_matching': True,
}
]
},
{
# The video is not available in your region.
'url': 'https://vk.com/video-51812607_171445436',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -354,6 +358,9 @@ class VKIE(VKBaseIE):
r'<!>This video is no longer available, because it has been deleted.':
'Video %s is no longer available, because it has been deleted.',
r'<!>The video .+? is not available in your region.':
'Video %s is not available in your region.',
}
for error_re, error_msg in ERRORS.items():

@@ -90,7 +90,13 @@ class VRVIE(VRVBaseIE):
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'):
return []
stream_id = hardsub_lang or audio_lang
assert audio_lang or hardsub_lang
stream_id_list = []
if audio_lang:
stream_id_list.append('audio-%s' % audio_lang)
if hardsub_lang:
stream_id_list.append('hardsub-%s' % hardsub_lang)
stream_id = '-'.join(stream_id_list)
format_id = '%s-%s' % (stream_format, stream_id)
if stream_format == 'hls':
adaptive_formats = self._extract_m3u8_formats(
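Previously the format ID used `hardsub_lang or audio_lang` alone, so, for example, a Japanese-audio stream with English hardsubs and a plain English-audio stream could both end up as `hls-enUS`. The new scheme encodes both languages. A sketch of the ID construction only; the language codes are illustrative.

```python
def vrv_format_id(stream_format, audio_lang, hardsub_lang):
    # Rebuilds the new naming scheme from _extract_vrv_formats.
    assert audio_lang or hardsub_lang
    stream_id_list = []
    if audio_lang:
        stream_id_list.append('audio-%s' % audio_lang)
    if hardsub_lang:
        stream_id_list.append('hardsub-%s' % hardsub_lang)
    return '%s-%s' % (stream_format, '-'.join(stream_id_list))


print(vrv_format_id('hls', 'jaJP', 'enUS'))
print(vrv_format_id('hls', 'enUS', None))
```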

@@ -4,15 +4,19 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
unified_timestamp,
url_or_none,
)
class VzaarIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)'
_TESTS = [{
# HTTP and HLS
'url': 'https://vzaar.com/videos/1152805',
'md5': 'bde5ddfeb104a6c56a93a06b04901dbf',
'info_dict': {
@@ -40,24 +44,48 @@ class VzaarIE(InfoExtractor):
video_id = self._match_id(url)
video_data = self._download_json(
'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
source_url = video_data['sourceUrl']
info = {
title = video_data['videoTitle']
formats = []
source_url = url_or_none(video_data.get('sourceUrl'))
if source_url:
f = {
'url': source_url,
'format_id': 'http',
}
if 'audio' in source_url:
f.update({
'vcodec': 'none',
'ext': 'mp3',
})
else:
f.update({
'width': int_or_none(video_data.get('width')),
'height': int_or_none(video_data.get('height')),
'ext': 'mp4',
'fps': float_or_none(video_data.get('fps')),
})
formats.append(f)
video_guid = video_data.get('guid')
usp = video_data.get('usp')
if isinstance(video_guid, compat_str) and isinstance(usp, dict):
m3u8_url = ('http://fable.vzaar.com/v4/usp/%s/%s.ism/.m3u8?'
% (video_guid, video_id)) + '&'.join(
'%s=%s' % (k, v) for k, v in usp.items())
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_data['videoTitle'],
'url': source_url,
'title': title,
'thumbnail': self._proto_relative_url(video_data.get('poster')),
'duration': float_or_none(video_data.get('videoDuration')),
'timestamp': unified_timestamp(video_data.get('ts')),
'formats': formats,
}
if 'audio' in source_url:
info.update({
'vcodec': 'none',
'ext': 'mp3',
})
else:
info.update({
'width': int_or_none(video_data.get('width')),
'height': int_or_none(video_data.get('height')),
'ext': 'mp4',
})
return info
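The new Vzaar code builds its HLS manifest URL from the video GUID, the numeric ID and the `usp` dict returned by the API. Each key/value pair is appended as-is, without URL-encoding. A sketch of just that string construction; the GUID and `usp` keys below are placeholders, not real API output.

```python
def vzaar_m3u8_url(video_guid, video_id, usp):
    # Same construction as VzaarIE: pairs are joined with '&', not URL-encoded.
    return ('http://fable.vzaar.com/v4/usp/%s/%s.ism/.m3u8?'
            % (video_guid, video_id)) + '&'.join(
        '%s=%s' % (k, v) for k, v in usp.items())


print(vzaar_m3u8_url('abc', '1152805', {'token_expires': 1, 'token_hmac': 'x'}))
```

Pair order follows dict iteration order, which is insertion order on Python 3.7+.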

@@ -12,7 +12,7 @@ from ..utils import (
class WistiaIE(InfoExtractor):
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.net/embed/iframe/)(?P<id>[a-z0-9]+)'
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/iframe/)(?P<id>[a-z0-9]+)'
_API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
_IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
@@ -35,12 +35,15 @@ class WistiaIE(InfoExtractor):
# with hls video
'url': 'wistia:807fafadvk',
'only_matching': True,
}, {
'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
match = re.search(
r'<(?:meta[^>]+?content|iframe[^>]+?src)=(["\'])(?P<url>(?:https?:)?//(?:fast\.)?wistia\.net/embed/iframe/.+?)\1', webpage)
r'<(?:meta[^>]+?content|iframe[^>]+?src)=(["\'])(?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/iframe/.+?)\1', webpage)
if match:
return unescapeHTML(match.group('url'))
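With `wistia\.(?:net|com)`, the embed detector now also recognizes iframes served from `fast.wistia.com`. A sketch of `_extract_url`, using the standard library's `html.unescape` as a stand-in for youtube-dl's `unescapeHTML`; the sample markup is hypothetical.

```python
import re
from html import unescape  # stand-in for youtube_dl.utils.unescapeHTML

EMBED_RE = (r'<(?:meta[^>]+?content|iframe[^>]+?src)=(["\'])'
            r'(?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/iframe/.+?)\1')


def extract_wistia_url(webpage):
    mobj = re.search(EMBED_RE, webpage)
    if mobj:
        # Attribute values may contain entities such as &amp;.
        return unescape(mobj.group('url'))
    return None


page = '<iframe src="//fast.wistia.com/embed/iframe/sh7fpupwlt?videoFoam=true&amp;x=1"></iframe>'
print(extract_wistia_url(page))
```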

youtube_dl/extractor/wwe.py

@@ -0,0 +1,140 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
try_get,
unescapeHTML,
url_or_none,
urljoin,
)
class WWEBaseIE(InfoExtractor):
_SUBTITLE_LANGS = {
'English': 'en',
'Deutsch': 'de',
}
def _extract_entry(self, data, url, video_id=None):
video_id = compat_str(video_id or data['nid'])
title = data['title']
formats = self._extract_m3u8_formats(
data['file'], video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
description = data.get('description')
thumbnail = urljoin(url, data.get('image'))
series = data.get('show_name')
episode = data.get('episode_name')
subtitles = {}
tracks = data.get('tracks')
if isinstance(tracks, list):
for track in tracks:
if not isinstance(track, dict):
continue
if track.get('kind') != 'captions':
continue
track_file = url_or_none(track.get('file'))
if not track_file:
continue
label = track.get('label')
lang = self._SUBTITLE_LANGS.get(label, label) or 'en'
subtitles.setdefault(lang, []).append({
'url': track_file,
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'series': series,
'episode': episode,
'formats': formats,
'subtitles': subtitles,
}
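`_extract_entry` keeps only `captions` tracks and maps their human-readable labels to language codes. Known labels go through `_SUBTITLE_LANGS`, unknown labels are used verbatim, and a missing label falls back to `'en'`. A simplified sketch: a plain truthiness check replaces `url_or_none`, and the track data is invented.

```python
SUBTITLE_LANGS = {'English': 'en', 'Deutsch': 'de'}


def caption_subtitles(tracks):
    subtitles = {}
    for track in tracks:
        if not isinstance(track, dict) or track.get('kind') != 'captions':
            continue
        track_file = track.get('file')  # the extractor validates this with url_or_none
        if not track_file:
            continue
        label = track.get('label')
        # Known label -> code; unknown label -> itself; no label -> 'en'.
        lang = SUBTITLE_LANGS.get(label, label) or 'en'
        subtitles.setdefault(lang, []).append({'url': track_file})
    return subtitles


tracks = [
    {'kind': 'captions', 'file': 'https://example.com/en.vtt', 'label': 'English'},
    {'kind': 'captions', 'file': 'https://example.com/es.vtt', 'label': 'Spanish'},
    {'kind': 'captions', 'file': 'https://example.com/u.vtt'},
    {'kind': 'thumbnails', 'file': 'https://example.com/t.vtt'},
    'junk',
]
print(caption_subtitles(tracks))
```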
class WWEIE(WWEBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?wwe\.com/(?:[^/]+/)*videos/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.wwe.com/videos/daniel-bryan-vs-andrade-cien-almas-smackdown-live-sept-4-2018',
'md5': '92811c6a14bfc206f7a6a9c5d9140184',
'info_dict': {
'id': '40048199',
'ext': 'mp4',
'title': 'Daniel Bryan vs. Andrade "Cien" Almas: SmackDown LIVE, Sept. 4, 2018',
'description': 'md5:2d7424dbc6755c61a0e649d2a8677f67',
'thumbnail': r're:^https?://.*\.jpg$',
}
}, {
'url': 'https://de.wwe.com/videos/gran-metalik-vs-tony-nese-wwe-205-live-sept-4-2018',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
landing = self._parse_json(
self._html_search_regex(
r'(?s)Drupal\.settings\s*,\s*({.+?})\s*\)\s*;',
webpage, 'drupal settings'),
display_id)['WWEVideoLanding']
data = landing['initialVideo']['playlist'][0]
video_id = landing.get('initialVideoId')
info = self._extract_entry(data, url, video_id)
info['display_id'] = display_id
return info
class WWEPlaylistIE(WWEBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?wwe\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.wwe.com/shows/raw/2018-11-12',
'info_dict': {
'id': '2018-11-12',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.wwe.com/article/walk-the-prank-wwe-edition',
'only_matching': True,
}, {
'url': 'https://www.wwe.com/shows/wwenxt/article/matt-riddle-interview',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if WWEIE.suitable(url) else super(WWEPlaylistIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(
r'data-video\s*=\s*(["\'])(?P<data>{.+?})\1', webpage):
video = self._parse_json(
mobj.group('data'), display_id, transform_source=unescapeHTML,
fatal=False)
if not video:
continue
data = try_get(video, lambda x: x['playlist'][0], dict)
if not data:
continue
try:
entry = self._extract_entry(data, url)
except Exception:
continue
entry['extractor_key'] = WWEIE.ie_key()
entries.append(entry)
return self.playlist_result(entries, display_id)
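`WWEPlaylistIE._VALID_URL` is broad enough to match video pages too, which is why its `suitable()` returns `False` whenever `WWEIE.suitable()` accepts the URL. A sketch of that routing with the two patterns from the file:

```python
import re

WWE_VIDEO_RE = r'https?://(?:[^/]+\.)?wwe\.com/(?:[^/]+/)*videos/(?P<id>[^/?#&]+)'
WWE_PLAYLIST_RE = r'https?://(?:[^/]+\.)?wwe\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'


def pick_extractor(url):
    # Same precedence as WWEPlaylistIE.suitable(): video pages win.
    if re.match(WWE_VIDEO_RE, url):
        return 'WWE'
    if re.match(WWE_PLAYLIST_RE, url):
        return 'WWEPlaylist'
    return None


video = 'https://www.wwe.com/videos/daniel-bryan-vs-andrade-cien-almas-smackdown-live-sept-4-2018'
print(pick_extractor(video))
print(pick_extractor('https://www.wwe.com/shows/raw/2018-11-12'))
```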

@@ -45,7 +45,7 @@ class XVideosIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.xvideos.com/video%s/' % video_id, video_id)
'https://www.xvideos.com/video%s/' % video_id, video_id)
mobj = re.search(r'<h1 class="inlineError">(.+?)</h1>', webpage)
if mobj:

@@ -26,7 +26,7 @@ class YourPornIE(InfoExtractor):
self._search_regex(
r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
group='data'),
video_id)[video_id])
video_id)[video_id]).replace('/cdn/', '/cdn2/')
title = (self._search_regex(
r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',

@@ -41,6 +41,7 @@ from ..utils import (
remove_quotes,
remove_start,
smuggle_url,
str_or_none,
str_to_int,
try_get,
unescapeHTML,
@@ -259,7 +260,9 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return True
def _download_webpage_handle(self, *args, **kwargs):
kwargs.setdefault('query', {})['disable_polymer'] = 'true'
query = kwargs.get('query', {}).copy()
query['disable_polymer'] = 'true'
kwargs['query'] = query
return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
@@ -347,6 +350,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?hooktube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/|
(?:www\.)?invidio\.us/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
@@ -490,12 +494,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q',
'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q',
'upload_date': '20121002',
'license': 'Standard YouTube License',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
'categories': ['Science & Technology'],
'tags': ['youtube-dl'],
'duration': 10,
'view_count': int,
'like_count': int,
'dislike_count': int,
'start_time': 1,
@@ -578,6 +585,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'categories': ['Science & Technology'],
'tags': ['youtube-dl'],
'duration': 10,
'view_count': int,
'like_count': int,
'dislike_count': int,
},
@@ -1064,6 +1072,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtube.com/watch?v=MuAGGZNfUkU&list=RDMM',
'only_matching': True,
},
{
'url': 'https://invidio.us/watch?v=BaW_jenozKc',
'only_matching': True,
},
]
def __init__(self, *args, **kwargs):
@@ -1180,7 +1192,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\('),
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('),
jscode, 'Initial JS player signature function name', group='sig')
jsi = JSInterpreter(jscode)
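The added signature-function pattern handles a call wrapped in a parenthesized expression, `d.set(b,(...)(fn(c)))`, which the previous last pattern cannot match. A sketch testing just these two patterns on made-up player snippets; the real tuple contains more patterns, and the `(0,window.encodeURIComponent)` wrapper is an assumed example of the shape, not copied from a player file.

```python
import re

OLD = r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\('
NEW = r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('

# Tried in order, like the pattern tuple passed to _search_regex.
PATTERNS = (OLD, NEW)


def find_sig(js):
    for pattern in PATTERNS:
        mobj = re.search(pattern, js)
        if mobj:
            return mobj.group('sig')
    return None


direct = 'c&&d.set(b,Xy(c))'
wrapped = 'c&&d.set(b,(0,window.encodeURIComponent)(Xy(c)))'
print(find_sig(direct), find_sig(wrapped))
```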
@@ -1529,6 +1542,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def extract_view_count(v_info):
return int_or_none(try_get(v_info, lambda x: x['view_count'][0]))
player_response = {}
# Get video info
embed_webpage = None
if re.search(r'player-age-gate-content">', video_webpage) is not None:
@@ -1571,6 +1586,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if args.get('livestream') == '1' or args.get('live_playback') == 1:
is_live = True
sts = ytplayer_config.get('sts')
if not player_response:
pl_response = str_or_none(args.get('player_response'))
if pl_response:
pl_response = self._parse_json(pl_response, video_id, fatal=False)
if isinstance(pl_response, dict):
player_response = pl_response
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
# We also try looking in get_video_info since it may contain different dashmpd
# URL that points to a DASH manifest with possibly different itag set (some itags
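In the ytplayer config, `args['player_response']` is a JSON document serialized as a string, so it must be decoded before fields like `videoDetails` can be read. A rough stand-alone equivalent of the `str_or_none` + `_parse_json(..., fatal=False)` + `isinstance(..., dict)` chain, using the standard `json` module:

```python
import json


def parse_player_response(args):
    """Return the decoded player_response dict, or {} if absent or malformed."""
    raw = args.get('player_response')
    if not isinstance(raw, str):
        return {}
    try:
        parsed = json.loads(raw)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return {}
    # Anything other than a JSON object is ignored, as in the extractor.
    return parsed if isinstance(parsed, dict) else {}


print(parse_player_response({'player_response': '{"videoDetails": {"title": "t"}}'}))
```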
@@ -1599,6 +1620,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if not video_info_webpage:
continue
get_video_info = compat_parse_qs(video_info_webpage)
if not player_response:
pl_response = get_video_info.get('player_response', [None])[0]
if isinstance(pl_response, dict):
player_response = pl_response
add_dash_mpd(get_video_info)
if view_count is None:
view_count = extract_view_count(get_video_info)
@@ -1644,9 +1669,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'"token" parameter not in video info for unknown reason',
video_id=video_id)
video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {}
# title
if 'title' in video_info:
video_title = video_info['title'][0]
elif 'title' in player_response:
video_title = video_details['title']
else:
self._downloader.report_warning('Unable to extract video title')
video_title = '_'
@@ -1709,6 +1739,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if view_count is None:
view_count = extract_view_count(video_info)
if view_count is None and video_details:
view_count = int_or_none(video_details.get('viewCount'))
# Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
@@ -1889,7 +1921,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
# uploader
video_uploader = try_get(video_info, lambda x: x['author'][0], compat_str)
video_uploader = try_get(
video_info, lambda x: x['author'][0],
compat_str) or str_or_none(video_details.get('author'))
if video_uploader:
video_uploader = compat_urllib_parse_unquote_plus(video_uploader)
else:
@@ -1907,6 +1941,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else:
self._downloader.report_warning('unable to extract uploader nickname')
channel_id = self._html_search_meta(
'channelId', video_webpage, 'channel id')
channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
# thumbnail image
# We try first to get a high quality image:
m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
@@ -1998,12 +2036,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
like_count = _extract_count('like')
dislike_count = _extract_count('dislike')
if view_count is None:
view_count = str_to_int(self._search_regex(
r'<[^>]+class=["\']watch-view-count[^>]+>\s*([\d,\s]+)', video_webpage,
'view count', default=None))
# subtitles
video_subtitles = self.extract_subtitles(video_id, video_webpage)
automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
video_duration = try_get(
video_info, lambda x: int_or_none(x['length_seconds'][0]))
if not video_duration:
video_duration = int_or_none(video_details.get('lengthSeconds'))
if not video_duration:
video_duration = parse_duration(self._html_search_meta(
'duration', video_webpage, 'video duration'))
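Duration (like title, uploader and view count above) now falls back to `videoDetails` from `player_response` when `get_video_info` lacks it. A sketch of the first two steps of that chain; the final `parse_duration` meta-tag step is omitted, and `int_or_none` is a simplified local stand-in for the youtube-dl helper.

```python
def int_or_none(v):
    # Simplified stand-in for youtube_dl.utils.int_or_none.
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def video_duration(video_info, video_details):
    # 1) get_video_info: a dict of lists, as produced by parse_qs.
    try:
        duration = int_or_none(video_info['length_seconds'][0])
    except (KeyError, IndexError, TypeError):
        duration = None
    # 2) player_response['videoDetails'], where numbers arrive as strings.
    if not duration:
        duration = int_or_none(video_details.get('lengthSeconds'))
    return duration


print(video_duration({}, {'lengthSeconds': '212'}))
```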
@@ -2078,6 +2123,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': video_uploader,
'uploader_id': video_uploader_id,
'uploader_url': video_uploader_url,
'channel_id': channel_id,
'channel_url': channel_url,
'upload_date': upload_date,
'license': video_license,
'creator': video_creator or artist,
@@ -2116,7 +2163,11 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
(?:https?://)?
(?:\w+\.)?
(?:
youtube\.com/
(?:
youtube\.com|
invidio\.us
)
/
(?:
(?:course|view_play_list|my_playlists|artist|playlist|watch|embed/(?:videoseries|[0-9A-Za-z_-]{11}))
\? (?:.*?[&;])*? (?:p|a|list)=
@@ -2229,6 +2280,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'description': 'md5:507cdcb5a49ac0da37a920ece610be80',
'categories': ['People & Blogs'],
'tags': list,
'view_count': int,
'like_count': int,
'dislike_count': int,
},
@@ -2267,6 +2319,9 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
# music album playlist
'url': 'OLAK5uy_m4xAFdmMC5rX3Ji3g93pQe3hqLZw_9LhM',
'only_matching': True,
}, {
'url': 'https://invidio.us/playlist?list=PLDIoUOhQQPlXr63I_vwF9GD8sAKh77dWU',
'only_matching': True,
}]
def _real_initialize(self):
@@ -2409,7 +2464,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
IE_DESC = 'YouTube.com channels'
_VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/(?P<id>[0-9A-Za-z_-]+)'
_VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com|(?:www\.)?invidio\.us)/channel/(?P<id>[0-9A-Za-z_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/channel/%s/videos'
_VIDEO_RE = r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?'
IE_NAME = 'youtube:channel'
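The invidio.us support added across these hunks amounts to an extra host alternative in each `_VALID_URL`. A quick check of the updated `YoutubeChannelIE` pattern:

```python
import re

VALID_URL = (r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com|(?:www\.)?invidio\.us)'
             r'/channel/(?P<id>[0-9A-Za-z_-]+)')


def channel_id(url):
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


print(channel_id('https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA'))
```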
@@ -2430,6 +2485,9 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
'id': 'UUs0ifCMCm1icqRbqhUINa0w',
'title': 'Uploads from Deus Ex',
},
}, {
'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
'only_matching': True,
}]
@classmethod

@@ -18,12 +18,12 @@ from ..utils import (
)
class ZattooBaseIE(InfoExtractor):
_NETRC_MACHINE = 'zattoo'
_HOST_URL = 'https://zattoo.com'
class ZattooPlatformBaseIE(InfoExtractor):
_power_guide_hash = None
def _host_url(self):
return 'https://%s' % (self._API_HOST if hasattr(self, '_API_HOST') else self._HOST)
def _login(self):
username, password = self._get_login_info()
if not username or not password:
@@ -33,13 +33,13 @@ class ZattooBaseIE(InfoExtractor):
try:
data = self._download_json(
'%s/zapi/v2/account/login' % self._HOST_URL, None, 'Logging in',
'%s/zapi/v2/account/login' % self._host_url(), None, 'Logging in',
data=urlencode_postdata({
'login': username,
'password': password,
'remember': 'true',
}), headers={
'Referer': '%s/login' % self._HOST_URL,
'Referer': '%s/login' % self._host_url(),
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
})
except ExtractorError as e:
@@ -53,7 +53,7 @@ class ZattooBaseIE(InfoExtractor):
def _real_initialize(self):
webpage = self._download_webpage(
self._HOST_URL, None, 'Downloading app token')
self._host_url(), None, 'Downloading app token')
app_token = self._html_search_regex(
r'appToken\s*=\s*(["\'])(?P<token>(?:(?!\1).)+?)\1',
webpage, 'app token', group='token')
@@ -62,7 +62,7 @@ class ZattooBaseIE(InfoExtractor):
# Will setup appropriate cookies
self._request_webpage(
'%s/zapi/v2/session/hello' % self._HOST_URL, None,
'%s/zapi/v2/session/hello' % self._host_url(), None,
'Opening session', data=urlencode_postdata({
'client_app_token': app_token,
'uuid': compat_str(uuid4()),
@@ -75,7 +75,7 @@ class ZattooBaseIE(InfoExtractor):
def _extract_cid(self, video_id, channel_name):
channel_groups = self._download_json(
'%s/zapi/v2/cached/channels/%s' % (self._HOST_URL,
'%s/zapi/v2/cached/channels/%s' % (self._host_url(),
self._power_guide_hash),
video_id, 'Downloading channel list',
query={'details': False})['channel_groups']
@@ -93,28 +93,30 @@ class ZattooBaseIE(InfoExtractor):
def _extract_cid_and_video_info(self, video_id):
data = self._download_json(
'%s/zapi/program/details' % self._HOST_URL,
'%s/zapi/v2/cached/program/power_details/%s' % (
self._host_url(), self._power_guide_hash),
video_id,
'Downloading video information',
query={
'program_id': video_id,
'complete': True
'program_ids': video_id,
'complete': True,
})
p = data['program']
p = data['programs'][0]
cid = p['cid']
info_dict = {
'id': video_id,
'title': p.get('title') or p['episode_title'],
'description': p.get('description'),
'thumbnail': p.get('image_url'),
'title': p.get('t') or p['et'],
'description': p.get('d'),
'thumbnail': p.get('i_url'),
'creator': p.get('channel_name'),
'episode': p.get('episode_title'),
'episode_number': int_or_none(p.get('episode_number')),
'season_number': int_or_none(p.get('season_number')),
'episode': p.get('et'),
'episode_number': int_or_none(p.get('e_no')),
'season_number': int_or_none(p.get('s_no')),
'release_year': int_or_none(p.get('year')),
'categories': try_get(p, lambda x: x['categories'], list),
'categories': try_get(p, lambda x: x['c'], list),
'tags': try_get(p, lambda x: x['g'], list)
}
return cid, info_dict
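The switch to `/zapi/v2/cached/program/power_details/<hash>` also changes the response shape: programs come back as a list under `programs`, with abbreviated keys (`t`, `et`, `d`, `i_url`, `e_no`, `s_no`, `c`, `g`). A trimmed sketch of the mapping; the key meanings are inferred only from how the hunk uses them, `int_or_none` is a simplified stand-in, and the sample response is invented.

```python
def int_or_none(v):
    # Simplified stand-in for youtube_dl.utils.int_or_none.
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def program_info(video_id, data):
    # One program_ids value -> one entry in the 'programs' list.
    p = data['programs'][0]
    info = {
        'id': video_id,
        'title': p.get('t') or p['et'],
        'description': p.get('d'),
        'thumbnail': p.get('i_url'),
        'episode': p.get('et'),
        'episode_number': int_or_none(p.get('e_no')),
        'season_number': int_or_none(p.get('s_no')),
        'categories': p['c'] if isinstance(p.get('c'), list) else None,
        'tags': p['g'] if isinstance(p.get('g'), list) else None,
    }
    return p['cid'], info


sample = {'programs': [{'cid': 'prosieben', 't': 'Maze Runner', 'e_no': '3', 'c': ['Film']}]}
print(program_info('130671867', sample))
```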
@@ -126,11 +128,11 @@ class ZattooBaseIE(InfoExtractor):
if is_live:
postdata_common.update({'timeshift': 10800})
url = '%s/zapi/watch/live/%s' % (self._HOST_URL, cid)
url = '%s/zapi/watch/live/%s' % (self._host_url(), cid)
elif record_id:
url = '%s/zapi/watch/recording/%s' % (self._HOST_URL, record_id)
url = '%s/zapi/watch/recording/%s' % (self._host_url(), record_id)
else:
url = '%s/zapi/watch/recall/%s/%s' % (self._HOST_URL, cid, video_id)
url = '%s/zapi/watch/recall/%s/%s' % (self._host_url(), cid, video_id)
formats = []
for stream_type in ('dash', 'hls', 'hls5', 'hds'):
@@ -201,13 +203,13 @@ class ZattooBaseIE(InfoExtractor):
return info_dict
class QuicklineBaseIE(ZattooBaseIE):
class QuicklineBaseIE(ZattooPlatformBaseIE):
_NETRC_MACHINE = 'quickline'
_HOST_URL = 'https://mobiltv.quickline.com'
_HOST = 'mobiltv.quickline.com'
class QuicklineIE(QuicklineBaseIE):
_VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)' % re.escape(QuicklineBaseIE._HOST)
_TEST = {
'url': 'https://mobiltv.quickline.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
@@ -220,7 +222,7 @@ class QuicklineIE(QuicklineBaseIE):
class QuicklineLiveIE(QuicklineBaseIE):
_VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?%s/watch/(?P<id>[^/]+)' % re.escape(QuicklineBaseIE._HOST)
_TEST = {
'url': 'https://mobiltv.quickline.com/watch/srf1',
@@ -236,8 +238,18 @@ class QuicklineLiveIE(QuicklineBaseIE):
return self._extract_video(channel_name, video_id, is_live=True)
class ZattooBaseIE(ZattooPlatformBaseIE):
_NETRC_MACHINE = 'zattoo'
_HOST = 'zattoo.com'
def _make_valid_url(tmpl, host):
return tmpl % re.escape(host)
class ZattooIE(ZattooBaseIE):
_VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'
_VALID_URL_TEMPLATE = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'
_VALID_URL = _make_valid_url(_VALID_URL_TEMPLATE, ZattooBaseIE._HOST)
# Since regular videos are only available for 7 days and recorded videos
# are only available for a specific user, we cannot have detailed tests.
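After the refactor, each Zattoo-based service declares only `_HOST` (plus `_API_HOST` when the API lives on a `www.` subdomain). `_host_url()` prefers `_API_HOST`, and `_make_valid_url()` inserts the escaped host into the shared URL template. A minimal sketch of that arrangement; the class names here are simplified stand-ins for the extractor classes.

```python
import re


def make_valid_url(tmpl, host):
    # Escape the host so '.' in the domain matches only a literal dot.
    return tmpl % re.escape(host)


VALID_URL_TEMPLATE = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'


class PlatformBase(object):
    _HOST = None

    def host_url(self):
        # Prefer a dedicated API host when the subclass defines one.
        return 'https://%s' % (self._API_HOST if hasattr(self, '_API_HOST') else self._HOST)


class Zattoo(PlatformBase):
    _HOST = 'zattoo.com'


class NetPlus(PlatformBase):
    _HOST = 'netplus.tv'
    _API_HOST = 'www.%s' % _HOST


netplus_re = make_valid_url(VALID_URL_TEMPLATE, NetPlus._HOST)
print(NetPlus().host_url(), re.match(netplus_re, 'https://www.netplus.tv/watch/abc/123-abc').group('id'))
```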
@@ -269,3 +281,142 @@ class ZattooLiveIE(ZattooBaseIE):
def _real_extract(self, url):
channel_name = video_id = self._match_id(url)
return self._extract_video(channel_name, video_id, is_live=True)
class NetPlusIE(ZattooIE):
_NETRC_MACHINE = 'netplus'
_HOST = 'netplus.tv'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.netplus.tv/watch/abc/123-abc',
'only_matching': True,
}]
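Each subclass repeats `_VALID_URL = _make_valid_url(...)` because a class attribute computed in a class body is evaluated once, using that body's `_HOST`. Overriding `_HOST` in a subclass does not regenerate the inherited pattern. Inside the class body, `'www.%s' % _HOST` also sees the class-local `_HOST`. A minimal demonstration with hypothetical class names.

```python
import re


def _make_valid_url(tmpl, host):
    return tmpl % re.escape(host)


class Base(object):
    _VALID_URL_TEMPLATE = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+'
    _HOST = 'zattoo.com'
    # Evaluated once, when this class body runs.
    _VALID_URL = _make_valid_url(_VALID_URL_TEMPLATE, _HOST)


class HostOnly(Base):
    # Overriding _HOST alone leaves the inherited zattoo.com pattern in place.
    _HOST = 'netplus.tv'


class HostAndPattern(Base):
    _HOST = 'netplus.tv'
    _API_HOST = 'www.%s' % _HOST  # class-local _HOST is visible here
    _VALID_URL = _make_valid_url(Base._VALID_URL_TEMPLATE, _HOST)


url = 'https://www.netplus.tv/watch/abc/123-abc'
print(bool(re.match(HostOnly._VALID_URL, url)))        # False
print(bool(re.match(HostAndPattern._VALID_URL, url)))  # True
print(HostAndPattern._API_HOST)                        # www.netplus.tv
```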
class MNetTVIE(ZattooIE):
_NETRC_MACHINE = 'mnettv'
_HOST = 'tvplus.m-net.de'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://tvplus.m-net.de/watch/abc/123-abc',
'only_matching': True,
}]
class WalyTVIE(ZattooIE):
_NETRC_MACHINE = 'walytv'
_HOST = 'player.waly.tv'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://player.waly.tv/watch/abc/123-abc',
'only_matching': True,
}]
class BBVTVIE(ZattooIE):
_NETRC_MACHINE = 'bbvtv'
_HOST = 'bbv-tv.net'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.bbv-tv.net/watch/abc/123-abc',
'only_matching': True,
}]
class VTXTVIE(ZattooIE):
_NETRC_MACHINE = 'vtxtv'
_HOST = 'vtxtv.ch'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.vtxtv.ch/watch/abc/123-abc',
'only_matching': True,
}]
class MyVisionTVIE(ZattooIE):
_NETRC_MACHINE = 'myvisiontv'
_HOST = 'myvisiontv.ch'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.myvisiontv.ch/watch/abc/123-abc',
'only_matching': True,
}]
class GlattvisionTVIE(ZattooIE):
_NETRC_MACHINE = 'glattvisiontv'
_HOST = 'iptv.glattvision.ch'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://iptv.glattvision.ch/watch/abc/123-abc',
'only_matching': True,
}]
class SAKTVIE(ZattooIE):
_NETRC_MACHINE = 'saktv'
_HOST = 'saktv.ch'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.saktv.ch/watch/abc/123-abc',
'only_matching': True,
}]
class EWETVIE(ZattooIE):
_NETRC_MACHINE = 'ewetv'
_HOST = 'tvonline.ewe.de'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://tvonline.ewe.de/watch/abc/123-abc',
'only_matching': True,
}]
class QuantumTVIE(ZattooIE):
_NETRC_MACHINE = 'quantumtv'
_HOST = 'quantum-tv.com'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.quantum-tv.com/watch/abc/123-abc',
'only_matching': True,
}]
class OsnatelTVIE(ZattooIE):
_NETRC_MACHINE = 'osnateltv'
_HOST = 'tvonline.osnatel.de'
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://tvonline.osnatel.de/watch/abc/123-abc',
'only_matching': True,
}]
class EinsUndEinsTVIE(ZattooIE):
_NETRC_MACHINE = '1und1tv'
_HOST = '1und1.tv'
_API_HOST = 'www.%s' % _HOST
_VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
_TESTS = [{
'url': 'https://www.1und1.tv/watch/abc/123-abc',
'only_matching': True,
}]
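The `only_matching` entries above can be sanity-checked offline. The sketch below re-implements the shared template with plain `re` rather than running youtube-dl's test harness. It reports any host whose pattern misses its own test URL or also claims another host's URL. `failures` is a hypothetical helper. The host/URL pairs are copied from the `_TESTS` entries above.

```python
import re

TEMPLATE = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'

# (host, only_matching URL) pairs from the subclasses above.
CASES = [
    ('netplus.tv', 'https://www.netplus.tv/watch/abc/123-abc'),
    ('tvplus.m-net.de', 'https://tvplus.m-net.de/watch/abc/123-abc'),
    ('player.waly.tv', 'https://player.waly.tv/watch/abc/123-abc'),
    ('bbv-tv.net', 'https://www.bbv-tv.net/watch/abc/123-abc'),
    ('vtxtv.ch', 'https://www.vtxtv.ch/watch/abc/123-abc'),
    ('myvisiontv.ch', 'https://www.myvisiontv.ch/watch/abc/123-abc'),
    ('iptv.glattvision.ch', 'https://iptv.glattvision.ch/watch/abc/123-abc'),
    ('saktv.ch', 'https://www.saktv.ch/watch/abc/123-abc'),
    ('tvonline.ewe.de', 'https://tvonline.ewe.de/watch/abc/123-abc'),
    ('quantum-tv.com', 'https://www.quantum-tv.com/watch/abc/123-abc'),
    ('tvonline.osnatel.de', 'https://tvonline.osnatel.de/watch/abc/123-abc'),
    ('1und1.tv', 'https://www.1und1.tv/watch/abc/123-abc'),
]


def failures(cases):
    # Hosts whose pattern misses its own URL or also matches another host's URL.
    bad = []
    for host, url in cases:
        pattern = re.compile(TEMPLATE % re.escape(host))
        others = [u for h, u in cases if h != host and pattern.match(u)]
        if not pattern.match(url) or others:
            bad.append(host)
    return bad


print(failures(CASES))  # []
```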
