Compare commits

..

273 Commits

Author SHA1 Message Date
Sergey M․
ab896fa894 release 2018.12.03 2018-12-03 00:10:20 +07:00
Sergey M․
1fa59a928e [ChangeLog] Actualize
[ci skip]
2018-12-03 00:06:54 +07:00
Sergey M․
ce18a19be9 [tiktok] Improve extraction and add support for user pages (closes #18135) 2018-12-02 02:42:56 +07:00
Ken Swenson
1ead840d2c [tiktok] Add extractor (closes #18108) 2018-12-02 02:42:56 +07:00
Alexander Seiler
aa374bc78e [utils] Fix random_birthday to generate existing dates only 2018-12-02 00:05:15 +07:00
Sergey M․
3430ff9b07 [pornhub] Use actual URL host for requests (closes #18359) 2018-12-01 16:45:51 +07:00
Hakim Boyles
f012823082 [lynda] Fix authentication (closes #18158) 2018-11-30 01:20:27 +07:00
Jimm Stout
16597c2f94 [gfycat] Update API endpoint (closes #18333) 2018-11-30 01:07:07 +07:00
Sergey M․
adbbdefc81 [hotstar] Add support for alternative app state layout (closes #18320) 2018-11-30 00:48:15 +07:00
Alexander Seiler
053e5b12b2 [azmedien] Fix extraction (closes #18334) 2018-11-30 00:12:18 +07:00
Remita Amine
d9df8f120b [vimeo] extract VHX subtitles 2018-11-28 20:13:36 +01:00
Remita Amine
ca01d17884 [vimeo] Add support for VHX(Vimeo OTT)(#14835) 2018-11-28 19:53:45 +01:00
Alexander Seiler
d19600df07 [joj] Fix extraction (closes #18280) 2018-11-24 22:14:27 +07:00
Sergey M․
641e86e3cf [wistia] Add support for fast.wistia.com (closes #18287) 2018-11-24 21:47:41 +07:00
Sergey M․
6864855eb1 [tests] Fix invalid escape sequences 2018-11-23 00:43:42 +07:00
Sergey M․
d861a9d581 release 2018.11.23 2018-11-23 00:16:45 +07:00
Sergey M․
66173211c4 [ChangeLog] Actualize
[ci skip]
2018-11-23 00:14:43 +07:00
Remita Amine
6f2883a2df [mixcloud] base64 decode before decryption 2018-11-21 23:25:38 +01:00
Remita Amine
560020da30 [mixcloud] fallback to hardcoded decryption key(closes #18016) 2018-11-21 23:21:05 +01:00
Sergey M․
305ce767d5 [travis] Add python 3.8-dev build 2018-11-22 02:34:35 +07:00
Sergey M․
157eef3e63 [setup.py] Add python 3.8 classifier 2018-11-22 02:08:41 +07:00
Sergey M․
bd2d553c7b [travis] Add python 3.7 build 2018-11-22 02:01:39 +07:00
Sergey M․
af60e81e3c [setup.py] Add more relevant classifiers 2018-11-22 02:01:39 +07:00
Remita Amine
a843464a7e [nbc] fix NBCNews article extraction(closes #16194) 2018-11-21 12:10:06 +01:00
Remita Amine
6866f24494 [foxsports] update test 2018-11-21 12:08:46 +01:00
Remita Amine
4e33e0792a [loc] update test 2018-11-21 12:00:50 +01:00
Remita Amine
35328915b5 [foxsports] fix extraction(closes #17543) 2018-11-21 09:46:36 +01:00
Remita Amine
6c882aa899 [loc] relax _VALID_URL regex and improve formats extraction 2018-11-21 09:46:36 +01:00
Sergey M․
183417a50f [ciscolive:search] Add support for pagination 2018-11-21 06:10:43 +07:00
Sergey M․
6a6d7f0641 [ciscolive] Fix issues and improve extraction (closes #17984) 2018-11-21 06:10:39 +07:00
Austin de Coup-Crank
05bd5e9c77 [ciscolive] Add extractor 2018-11-21 06:10:30 +07:00
Alexander Seiler
15ed5a2784 [nzz] Relax kaltura regex 2018-11-21 02:50:40 +07:00
Remita Amine
2e1280ed43 [sixplay] fix format extraction 2018-11-19 18:15:51 +01:00
Remita Amine
8578ea4dcb [bitchute] use _html_search_regex for title extraction 2018-11-18 16:15:27 +01:00
Remita Amine
9b27a78a88 [kaltura] limit requested MediaEntry fields 2018-11-18 16:15:27 +01:00
Sergey M․
964b989dc8 [americastestkitchen] Add support for zype embeds (closes #18225) 2018-11-18 20:45:25 +07:00
Sergey M․
f97c099131 [pornhub] Move test to correct place 2018-11-18 11:14:46 +07:00
Sergey M․
1febf99da1 [pornhub] Add pornhub.net alias 2018-11-18 06:26:08 +07:00
Sergey M․
4167148fa4 [nova:embed] Fix extraction (closes #18222) 2018-11-18 01:11:10 +07:00
Sergey M․
5bb0479269 release 2018.11.18 2018-11-18 00:11:54 +07:00
Sergey M․
02df855e13 [ChangeLog] Actualize
[ci skip]
2018-11-18 00:07:40 +07:00
Sergey M․
006374e3ae [wwe] Fix issues, extract subtitles and add support for playlists (closes #14781, closes #17450) 2018-11-17 23:59:52 +07:00
mttronc
11d19ff503 [wwe] Add extractor 2018-11-17 23:59:46 +07:00
aviperes
a640c4d226 [vk] Detect geo restriction 2018-11-17 20:59:13 +07:00
Sergey M․
d0058c76d5 [openload] Use original host during extraction (closes #18211) 2018-11-17 16:59:20 +07:00
NeroBurner
0919cd4d01 [atvat] Fix extraction (closes #18041) 2018-11-17 00:18:50 +07:00
Sergey M․
2599956c9f [rte] Add support for new API endpoint (closes #18206) 2018-11-17 00:07:59 +07:00
Sergey M․
9b9b3501c5 [tnaflixnetwork:embed] Fix extraction (closes #18205) 2018-11-16 22:55:35 +07:00
Sergey M․
730c0d12a0 [picarto] Extract more metadata (closes #16518) 2018-11-11 16:17:59 +07:00
Patrick Griffis
f17a24a6df [picarto] Use API and add token support
This is just more reliable than trying to extract
it from the page itself.
2018-11-11 16:17:56 +07:00
Sergey M․
83852e57bf [zype] Add extractor (closes #18143) 2018-11-11 00:44:49 +07:00
Sergey M․
96a91b1551 [vivo] Fix extraction (closes #18139) 2018-11-10 23:37:27 +07:00
Sergey M․
cab26223bf [ruutu] Update API endpoint (closes #18138) 2018-11-10 15:26:04 +07:00
Sergey M․
532782ade1 release 2018.11.07 2018-11-07 01:38:25 +07:00
Sergey M․
f81d44aab6 [ChangeLog] Actualize
[ci skip]
2018-11-07 09:58:08 +07:00
Sergey M․
2511eee215 [youtube] Add another JS signature function name regex (closes #18091, closes #18093, closes #18094) 2018-11-07 09:55:59 +07:00
Remita Amine
0df514f07e [facebook] fix tahoe request(closes #17171) 2018-11-06 21:22:27 +01:00
Sergey M․
432cd48410 [cliphinter] Fix extraction (closes #18083) 2018-11-06 23:29:42 +07:00
Sergey M․
c0345b825f [youtube:playlist] Add support for invidio.us (closes #18077) 2018-11-05 19:08:39 +07:00
Sergey M․
2004e2210b [osnateltv] Update host 2018-11-05 17:09:57 +07:00
Sergey M․
16d896b2a7 [zattoo] Arrange API hosts for derived extractors (closes #18035) 2018-11-05 15:52:46 +07:00
Sergey M․
22e07ce502 [README.md] Improve documentation on safe metadata extraction and add more examples 2018-11-05 00:11:36 +07:00
Sergey M․
dbdaaa231a [youtube] Add fallback metadata extraction from videoDetails (closes #18052) 2018-11-03 06:26:16 +07:00
Sergey M․
38c32dbf19 release 2018.11.03 2018-11-03 02:57:48 +07:00
Sergey M․
a085410936 [ChangeLog] Actualize
[ci skip]
2018-11-03 02:56:14 +07:00
Sergey M․
6895ea4d3f [laola1tv:embed] Set correct stream access URL scheme (closes #16341) 2018-11-03 02:45:37 +07:00
Alexander Seiler
faac1c1f70 [ehftv] Add extractor (closes #15408) 2018-11-03 02:45:32 +07:00
Sergey M․
573531dcfb [azmedien] Simplify (closes #17746) 2018-11-03 01:33:42 +07:00
Alexander Seiler
da56fb631f [azmedien] Adopt to major site redesign (closes #17745) 2018-11-03 01:33:36 +07:00
Xiao Di Guan
95e42d7336 [extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447) 2018-11-03 01:18:20 +07:00
Sergey M․
cf0db4d997 [twitcasting] Improve extraction and fix issues (closes #17981) 2018-11-03 00:28:25 +07:00
sichuan-pepper
036f905161 [twitcasting] Add extractor 2018-11-03 00:28:21 +07:00
Sergey M․
4b6aca17cc [orf:tvthek] Improve extraction and remove unused code (closes #17956, closes #18024) 2018-11-02 23:47:23 +07:00
Sebastian Haas
c620694c97 [orf:tvthek] Fix extraction (closes #17737)
use _extract_m3u8_formats and _extract_f4m_formats helper functions
closes #17737
2018-11-02 23:47:17 +07:00
yonaikerlol
061ea3a776 [openload] Add support for oload.fun 2018-11-02 23:08:41 +07:00
Sergey M․
c70ba664f1 [njpwworld] Fix authentication (closes #17427) 2018-11-01 01:40:02 +07:00
Sergey M․
f16679e843 [cnbc:video] Fix _VALID_URL (#17110) 2018-10-30 04:57:51 +07:00
Remita Amine
b14475724b [linkedin:learning:course] use url_transparent type for playlist entries 2018-10-29 21:49:12 +01:00
Remita Amine
aa7e974a2a [linkedin:learning] Add new extractor(closes #13545) 2018-10-29 19:28:29 +01:00
Sergey M․
9aac22c195 [theplatform] Improve error detection (#13222) 2018-10-30 00:22:44 +07:00
Sergey M․
94db1f7f3b [cnbc] Simplify extraction (closes #14280, closes #17110) 2018-10-29 23:55:55 +07:00
gfabiano
ffa7b2bfee [cbnc] Add support for new URL schema (closes #14193) 2018-10-29 23:54:14 +07:00
Sergey M․
2943397e87 [aparat] Improve extraction and extract more metadata (closes #17445, closes #18008) 2018-10-29 23:32:45 +07:00
Ali Irani
9c4a83a1be [aparat] Fix extraction 2018-10-29 23:31:24 +07:00
Sergey M․
9ff558f67f release 2018.10.29 2018-10-29 00:39:29 +07:00
Sergey M․
c2fe21efaa [ChangeLog] Actualize
[ci skip]
2018-10-29 00:38:06 +07:00
Sergey M․
476cf548e1 [sportbox] Improve extraction, add support for matchtv.ru and fix video id (closes #17978) 2018-10-29 00:21:50 +07:00
Sergey M․
bebef10909 [extractor/common] Add validation for JSON-LD URLs 2018-10-29 00:21:45 +07:00
Alexey Trofimov
4c237ab787 [sportbox] Fix extraction 2018-10-29 00:21:40 +07:00
Sergey M․
a1d1c63678 [screencast] Improve extraction (closes #14617, closes #17990) 2018-10-28 23:26:30 +07:00
sichuan-pepper
1fafb32984 [screencast] Fix extraction (closes #14590) 2018-10-28 23:26:30 +07:00
yonaikerlol
c901cc38e5 [openload] Add support for oload.icu 2018-10-28 22:51:29 +07:00
Sergey M․
022218f2f0 [ivi] Add support for ivi.tv 2018-10-28 22:49:10 +07:00
Sergey M․
08c7d3dade [crunchyroll] Improve extraction failsafeness (closes #17991) 2018-10-28 22:12:54 +07:00
Remita Amine
5e733b066a [dailymail] fix format extraction(closes #17976) 2018-10-26 05:41:57 +01:00
Remita Amine
7d9e858132 [viewster] reduce format requests 2018-10-26 05:40:49 +01:00
Remita Amine
b99b0bcfa0 [cwtv] handle api errors(closes #17905) 2018-10-17 06:22:56 +01:00
Sergey M․
baeabf7742 [rutube] Use geo verification headers (closes #17897) 2018-10-16 23:19:44 +07:00
Remita Amine
582797d780 [brightcove] remove unused variable 2018-10-15 20:47:12 +01:00
Remita Amine
160c2773f6 [brightcove:legacy] add another fall back to brightcove:new 2018-10-15 18:41:57 +01:00
Remita Amine
ee5fe42e44 [brightcove:legacy] fall back to brightcove:new(#13912) 2018-10-15 17:54:38 +01:00
Remita Amine
f0ee386851 [tv3] remove extractor(closes #10461)(closes #15339) 2018-10-15 16:26:29 +01:00
Remita Amine
a94e7c195e [ted] fix extraction for http and rtmp formats(closes #5941)(closes #17572)(closes #17894) 2018-10-15 11:52:01 +01:00
yonaikerlol
5d90a8a5f3 [openload] Add support for oload.cc 2018-10-07 20:05:45 +07:00
Remita Amine
19a352854f [patreon] extract post_file url(#17792) 2018-10-05 22:45:04 +01:00
Remita Amine
c9d891f19a [patreon] fix extraction(closes #14502)(closes #10471) 2018-10-05 20:11:01 +01:00
Sergey M․
d96f976b0c release 2018.10.05 2018-10-05 02:31:30 +07:00
Sergey M․
2e7ed29e34 [ChangeLog] Actualize
[ci skip]
2018-10-05 02:29:52 +07:00
Sergey M․
21c1a00dd7 [pluralsight] Improve authentication (closes #17762) 2018-10-05 02:27:14 +07:00
Sergey M․
0082f44a08 [dailymotion] Improve metadata extraction (closes #17706) 2018-10-05 02:03:25 +07:00
Enes
f60b9803a4 [dailymotion] Fix extraction (closes #17699) 2018-10-05 02:03:16 +07:00
Remita Amine
d98cb62e55 [crunchyroll] switch to HTTPS for RpcApi(closes #17749) 2018-10-02 19:43:06 +01:00
Remita Amine
05e7c184da [hotstar] fix extraction in python 2(closes #17696) 2018-10-02 06:07:43 +01:00
Sergey M․
66d106f270 [philharmoniedeparis] Fix extraction and add support for pad.philharmoniedeparis.fr (closes #17705) 2018-10-01 23:29:49 +07:00
Sergey M․
3c7da54c92 [jamendo] Add support for licensing.jamendo.com (closes #17724) 2018-10-01 22:05:18 +07:00
Enes
9795d93316 [openload] Add support for oload.cloud (closes #17710) 2018-10-01 21:48:59 +07:00
Sergey M․
365343131d [pluralsight] Fix subtitles extraction (closes #17726, closes #17728) 2018-10-01 21:45:24 +07:00
Sergey M․
85fa80d5f9 [vimeo] Add another config regex (closes #17690) 2018-10-01 21:13:43 +07:00
Remita Amine
245cbb33bc [spike] fix Paramount Network extraction(closes #17677) 2018-09-28 15:13:25 +01:00
Remita Amine
85cd69adcb [hotstar] fix extraction(closes #14694)(closes #14931)(closes #17637) 2018-09-26 08:13:36 +01:00
Sergey M․
4c89a675dd release 2018.09.26 2018-09-26 11:58:25 +07:00
Sergey M․
3d3499742c [ChangeLog] Actualize
[ci skip]
2018-09-26 11:56:15 +07:00
Sergey M․
c17e100b96 [pluralsight] Fix subtitles extraction (closes #17671) 2018-09-26 09:27:40 +07:00
Sergey M․
8fd12a0831 [mediaset] Improve embed support (closes #17668) 2018-09-26 05:38:41 +07:00
Sergey M․
60ce0c67fd [README.md] Document channel meta fields for output template 2018-09-25 23:43:41 +07:00
Sergey M․
cd5a74a28e [youtube] Add support for invidio.us (closes #17613) 2018-09-24 00:14:49 +07:00
Sergey M․
f6d7f7b474 [zattoo] Add support for more zattoo platform sites 2018-09-23 23:35:37 +07:00
Alexander Seiler
21160a1792 [zattoo] Fix extraction (closes #17175) 2018-09-23 21:34:47 +07:00
Sergey M․
4ac73fc170 [popcorntv] Remove debug output 2018-09-19 22:16:43 +07:00
Leonardo Taccari
28fcb7b061 [raiplay:playlist] Remove a debug leftover print() 2018-09-19 09:48:39 +07:00
Sergey M․
3a9c928426 release 2018.09.18 2018-09-18 01:46:36 +07:00
Sergey M․
d9b1cec171 [ChangeLog] Actualize
[ci skip]
2018-09-18 01:44:55 +07:00
Sergey M․
e504b09070 [adobepass] Don't pollute default headers dict 2018-09-17 22:15:27 +07:00
Sergey M․
c8f6ab8c38 [udemy] Don't pollute default headers dict 2018-09-17 22:14:53 +07:00
Sergey M․
e2f61598be [twitch] Don't pollute default headers dict 2018-09-17 22:14:28 +07:00
Sergey M․
c11485162b [youtube] Don't pollute default query dict (closes #17593) 2018-09-17 22:13:39 +07:00
Sergey M․
1084563eaa [crunchyroll] Prefer hardsubless formats and formats in locale language 2018-09-16 23:54:58 +07:00
Sergey M․
d9b0d118ad [vrv] Make format ids deterministic 2018-09-16 23:54:58 +07:00
Remita Amine
8b40c92724 [vimeo] redirect to feature url only in the case of a trailer(closes #14591) 2018-09-15 06:31:15 +01:00
Sergey M․
3661ebf2b6 [pornhub] Extract upload date (closes #17574) 2018-09-15 02:04:43 +07:00
Sergey M․
0e7b8d3eac [extractor/common] Fix typos 2018-09-15 01:53:01 +07:00
Sergey M․
127103b643 [porntube] Extract channel meta fields 2018-09-15 01:25:32 +07:00
Sergey M․
d03beddf0f [vimeo] Extract channel meta fields 2018-09-15 01:24:48 +07:00
Sergey M․
dd4c449219 [youtube] Extract channel meta fields (closes #9676, closes #12939) 2018-09-15 01:24:26 +07:00
Sergey M․
6f1f59f39c [extractor/common] Introduce channel meta fields 2018-09-15 01:23:36 +07:00
Sergey M․
15bf2ca0da [porntube] Fix extraction (closes #17541) 2018-09-15 00:50:36 +07:00
Sergey M․
0f2aa0dcaa [asiancrush] Fix extraction (closes #15630) 2018-09-14 23:56:03 +07:00
Sergey M․
db348e8849 [twitch:clips] Extend _VALID_URL (closes #17559) 2018-09-14 23:26:38 +07:00
Sergey M․
f5b0175349 [vzaar] Add support for HLS 2018-09-11 02:41:05 +07:00
Sergey M․
79facb2773 [tube8] Fix metadata extraction (closes #17520) 2018-09-11 02:32:01 +07:00
Sergey M․
96dbf70de6 [eporner] Extract JSON-LD (closes #17519) 2018-09-11 02:32:01 +07:00
Sergey M․
8476b4fd91 release 2018.09.10 2018-09-10 02:48:37 +07:00
Sergey M․
14f577e31c [ChangeLog] Actualize
[ci skip]
2018-09-10 02:45:44 +07:00
Sergey M․
25d110be30 [utils] Properly recognize AV1 codec (closes #17506) 2018-09-10 02:37:22 +07:00
Sergey M․
a2637a2dda [iprima] Add support for prima.iprima.cz (closes #17514) 2018-09-10 01:34:49 +07:00
Sergey M․
2e4350eec6 [generic] PEP 8 (#17455) 2018-09-08 17:24:59 +07:00
Sergey M․
2c9d3b9962 [seznamzpravy] PEP 8 (#17455) 2018-09-08 17:24:48 +07:00
Sergey M․
13ef64fd93 [motherless] PEP 8 (#17455) 2018-09-08 17:24:34 +07:00
Sergey M․
6f9f3340bb [dtube] PEP 8 (#17455) 2018-09-08 17:24:09 +07:00
Sergey M․
ae2384ff5f [tele5] Improve extraction (closes #7805, closes #7922, closes #17331, closes #17414) 2018-09-08 16:08:48 +07:00
Jens Rutschmann
d0de6a287a [tele5] Add extractor 2018-09-08 16:08:44 +07:00
Timendum
d0c5fabc12 [nbc] Fix extraction of percent encoded URLs (closes #17374) 2018-09-08 14:44:06 +07:00
Sergey M․
ad98d2eb74 release 2018.09.08 2018-09-08 03:42:28 +07:00
Sergey M․
a41a506077 [ChangeLog] Actualize
[ci skip]
2018-09-08 03:40:06 +07:00
Sergey M․
9a47fa35dd [youtube] Fix extraction (closes #17457, closes #17464) 2018-09-08 03:36:10 +07:00
Sergey M․
2d4fe594c6 [pornhub:uservideos] Add support for new URLs (closes #17388) 2018-09-06 00:51:20 +07:00
Sergey M․
09322cccdb [iprima] Confirm adult check (closes #17437) 2018-09-06 00:22:53 +07:00
Remita Amine
aa1d5eb905 [slideslive] make the check for video_service_name case-insensitive(closes #17429) 2018-09-04 10:37:51 +01:00
Sergey M․
93284ff2ea [radiojavan] Improve extraction (closes #17151) 2018-09-03 02:53:51 +07:00
Hormoz K
0a9a8118ce [radiojavan] Fix extraction 2018-09-03 02:53:41 +07:00
Mohammed Yaseen Mowzer
3d08f63dc5 [generic] Skip unsuccessful jwplayer extraction (closes #16735) 2018-09-03 02:15:17 +07:00
Sergey M․
27d8e089a2 release 2018.09.01 2018-09-01 18:40:23 +07:00
Sergey M․
7bbc1b189a [ChangeLog] Actualize
[ci skip]
2018-09-01 18:36:18 +07:00
LangerJan
0b87e88453 [ard] Add support for one.ard.de 2018-09-01 16:42:30 +07:00
Gorfiend
4d59db5b90 [niconico] Fix extraction on python3 (closes #17393) 2018-09-01 16:04:45 +07:00
Remita Amine
4627995882 [crunchyroll] limit VRVIE inheritance to CrunchyrollIE 2018-09-01 10:04:10 +01:00
Remita Amine
7f2611cb5b [ard] extract f4m formats 2018-09-01 08:40:38 +01:00
Remita Amine
54a5be4dba [crunchyroll] parse vilos media data(closes #17343) 2018-09-01 08:16:41 +01:00
Philipp Hagemeister
ed6919e737 [ard] beta mediathek: make regexp for JSON more robust 2018-09-01 01:59:13 +02:00
Philipp Hagemeister
2b83da2463 [ard] Better format handling
Skip f4m, doesn't work (yet); correctly extract m3u8, and prefer plain HTTP files.
2018-09-01 00:45:36 +02:00
Philipp Hagemeister
c1a37eb24a [ard] Add support for Beta ARD Mediathek
Thanks to https://blog.fefe.de/?ts=a577685d for pointing out support is missing.
2018-09-01 00:18:17 +02:00
Sergey M․
4991e16c2a [bandcamp] Extract more metadata (closes #13197) 2018-08-31 03:35:55 +07:00
Parmjit Virk
14b7a24c19 [bandcamp] Extract track_number (closes #17266) 2018-08-31 02:32:35 +07:00
Leonardo Taccari
73f3bdbeb4 [internazionale] Fix extraction of non-available-abroad videos 2018-08-31 02:15:46 +07:00
Sergey M․
9e21e6d96b [utils] Improve remote address skipping and add support for python 2.6 (closes #17362) 2018-08-29 01:18:03 +07:00
Andrew Udvare
8959018a5f [utils] Skip remote IP addresses non matching to source address' IP version (closes #13422) 2018-08-29 01:17:53 +07:00
Sergey M․
eebbce5656 release 2018.08.28 2018-08-28 03:10:09 +07:00
Sergey M․
56213aff1d [ChangeLog] Actualize
[ci skip]
2018-08-28 03:07:18 +07:00
Sergey M․
409b9324da [youtube:playlist] Add support for music album playlists (OLAK5uy_ prefix) (closes #17361) 2018-08-28 02:20:32 +07:00
Andrew Udvare
02df41354c [bitchute] Fix extraction by pass custom User-Agent 2018-08-27 22:04:56 +07:00
Sergey M․
dd88fd65a5 [webofstories:playlist] Fix extraction (closes #16914) 2018-08-26 21:42:30 +07:00
Sergey M․
287cf7e443 [generic] Remove unused import 2018-08-26 20:09:02 +07:00
Sergey M․
dac6f7654a [tvplayhome] Add extractor (closes #17344) 2018-08-26 20:08:55 +07:00
Sergey M․
e0b6e98871 [generic] Allow relative src for videojs embeds (closes #17324) 2018-08-24 23:12:53 +07:00
Sergey M․
beff09505c [xfileshare] Add support for vidto.se (closes #17317) 2018-08-24 04:00:35 +07:00
Sergey M․
135e6a1c10 [vidzi] Add support for vidzi.nu (closes #17316) 2018-08-24 02:36:56 +07:00
Sergey M․
c707d2067d [nova:embed] Add extractor (closes #17282) 2018-08-22 23:18:07 +07:00
Sergey M․
4c86163b60 release 2018.08.22 2018-08-22 02:32:18 +07:00
Sergey M․
b662273989 [ChangeLog] Actualize
[ci skip]
2018-08-22 02:28:25 +07:00
Sergey M․
df4d817bc3 [kinopoisk] Add extractor (closes #17283) 2018-08-22 02:19:30 +07:00
Sergey M․
db192b2932 [yourporn] Add extractor (closes #17298) 2018-08-22 01:44:22 +07:00
Sergey M․
52007de8ca [go] Add support for disneynow.go.com (closes #16299, closes #17264) 2018-08-22 01:14:47 +07:00
hmlinaric
28f96cf407 [6play] Add support for play.rtl.hr 2018-08-22 00:06:27 +07:00
Sergey M․
eda86b4335 [anvato] Fallback to generic API key for access key to API key lookup (closes #16788, closes #17254) 2018-08-21 23:45:18 +07:00
Sergey M․
bf1245d236 [lci] Fix extraction (closes #17274) 2018-08-20 02:15:48 +07:00
Sergey M․
6f356cbbcf [bbccouk] Extend _ID_REGEX (closes #17270) 2018-08-20 02:05:07 +07:00
Remita Amine
0a74b45191 [cwtv] fix extraction(closes #17256) 2018-08-17 11:59:49 +01:00
Sergey M․
d6ef8b4dd4 [nova] Fix extraction (closes #17241) 2018-08-16 00:11:41 +07:00
Sergey M․
60c0856223 [utils] Use pure browser header for User-Agent (closes #17236) 2018-08-14 23:27:12 +07:00
Sergey M․
57c68ec4c3 [generic] Add support for expressen embeds 2018-08-14 22:51:44 +07:00
Sergey M․
24e0cd709f [raywenderlich] Adapt to site redesign (closes #17225) 2018-08-13 00:15:59 +07:00
Remita Amine
4779420ce8 [redbulltv] add support redbull.com tv URLs(closes #17218) 2018-08-12 05:31:34 +01:00
Sergey M․
de4c41b437 [bitchute] Improve page offset 2018-08-12 01:52:50 +07:00
Sergey M․
b65e3b0636 [bitchute] Add extractor (closes #14052) 2018-08-12 01:47:10 +07:00
Sergey M․
d37dc6e1c9 [clyp] Add support for token protected media (closes #17184) 2018-08-07 23:27:08 +07:00
Sergey M․
a62460aa21 [imdb] Fix extension extraction (closes #17167) 2018-08-06 04:37:03 +07:00
Stanny Nuytkens
d588d4a5a6 [.gitignore] Add .vscode 2018-08-05 22:10:01 +07:00
Sergey M․
81cc22bab6 release 2018.08.04 2018-08-04 01:23:24 +07:00
Sergey M․
20f96f64bd [ChangeLog] Actualize
[ci skip]
2018-08-04 01:21:23 +07:00
Sergey M․
af322eb830 [funk:channel] Improve byChannelAlias extraction (closes #17142) 2018-08-04 00:26:58 +07:00
Sergey M․
cb1c3a3c07 [twitch] Update cliend id and modernize (closes #17126) 2018-08-03 22:44:31 +07:00
Tim Broder
48afc6ca3e [twitch] Fix authentication (closes #17024) 2018-08-03 22:43:23 +07:00
Sergey M․
644921b372 [twitch:vod] Improve _VALID_URL (closes #17135) 2018-08-02 23:16:15 +07:00
Sergey M․
19b9de13c4 [watchbox] Fix extraction (closes #17107) 2018-07-30 23:28:44 +07:00
Sergey M․
6f2d82a5a0 [pbs] Fix extraction (closes #17109) 2018-07-30 23:10:40 +07:00
Giuseppe Fabiano
7ff129d3ea [theplatform] Relax _VALID_URL (closes #16181) 2018-07-30 03:15:06 +07:00
Sergey M․
9d1b213845 [viqeo] Add extractor (closes #17066) 2018-07-30 03:05:36 +07:00
Sergey M․
5484828418 release 2018.07.29 2018-07-29 07:02:18 +07:00
Sergey M․
4eecef84f3 [ChangeLog] Actualize
[ci skip]
2018-07-29 06:59:39 +07:00
bato3
b2286f8fb2 [crunchyroll:playlist] Restrict _VALID_URL (closes #17069) 2018-07-29 06:56:52 +07:00
Giuseppe Fabiano
4938c8d573 [pornhub] Add support for subtitles (closes #16924) 2018-07-29 06:24:10 +07:00
bato3
1a88fc5a69 [ceskatelevize] Use https for API call (refs #16997) 2018-07-29 06:04:59 +07:00
Huyuumi
38e87f6c2a [utils] Remove return from __init__ 2018-07-29 05:52:42 +07:00
Remita Amine
ec240a4369 [dailymotion:playlist] fix extraction(closes #16894) 2018-07-28 20:30:44 +01:00
Sergey M․
cd3a3ff93b [ted] Improve extraction and update tests 2018-07-28 22:09:53 +07:00
Sergey M․
9a984265b9 [ted] Fix extraction for videos without nativeDownloads (closes #16756, closes #17085) 2018-07-28 21:26:23 +07:00
Remita Amine
a098c99f0d [telecinco] fix extraction(closes #17080) 2018-07-28 06:55:42 +01:00
Remita Amine
8e37a7e4cc [mitele] reduce number of requests and update tests 2018-07-28 06:55:42 +01:00
Sidney de Koning
722f1a0f8f [README.md] Actualize Firefox cookie export add-on
Previous one does not work with newer Firefox versions
2018-07-28 00:18:41 +07:00
Remita Amine
0c7b4f49eb [rai] return non http relinker URL intact(closes #17055) 2018-07-26 08:11:06 +01:00
Remita Amine
ad1bc71a8a [vk] fix extraction for inline only videos(fixes #16923) 2018-07-26 07:25:07 +01:00
Sergey M․
b5dec62ca6 [streamcloud] Fix extraction (closes #17054) 2018-07-25 23:07:12 +07:00
Remita Amine
631f93ee2d [facebook] fix tahoe request for authenticated users(closes #16655) 2018-07-23 06:20:18 +01:00
Sergey M․
d4e7065111 Credit @Kerruba for #16328 2018-07-22 21:36:58 +07:00
Sergey M․
234a85858c Credit @tmsbrg for #15462 2018-07-22 21:35:38 +07:00
Sergey M․
a789d1cc90 Credit @nathanrossi for #16554 2018-07-22 21:34:34 +07:00
Sergey M․
694079dff7 Credit @mrfade for #16269 and #16271 2018-07-22 21:31:46 +07:00
Sergey M․
d94fb1225e Credit @dnet for #16174 2018-07-22 21:29:25 +07:00
Sergey M․
7930f91494 Credit @haasn for #16326 2018-07-22 21:27:28 +07:00
Sergey M․
a702056fbe Credit @bastiandg for #16189 2018-07-22 21:26:12 +07:00
Sergey M․
8fd2a7be37 [puhutv] Improve extraction (closes #16269) 2018-07-22 20:36:48 +07:00
Enes
6de82b4476 [puhutv] Add extractor (closes #16010) 2018-07-22 20:36:48 +07:00
Sergey M․
8e66ffc3b7 release 2018.07.21 2018-07-21 21:00:18 +07:00
Sergey M․
6f27998e75 [ChangeLog] Actualize
[ci skip]
2018-07-21 20:58:30 +07:00
Sergey M․
3052a30d42 Improve URL extraction 2018-07-21 19:08:28 +07:00
Sergey M․
4ecf300d13 [iwara] Improve extraction 2018-07-21 18:03:58 +07:00
Sergey M․
af03000ad5 [utils] Introduce url_or_none 2018-07-21 18:03:58 +07:00
Remita Amine
b96b4be461 [bbc] add support for BBC Radio Play pages(closes #17022) 2018-07-21 11:50:14 +01:00
Kazuma Takahara
edb0e17188 [iwara] Fix download URLs (closes #17026) 2018-07-21 17:41:33 +07:00
Sergey M․
e9c671d5e8 [utils] Allow JSONP with empty func name (closes #17028) 2018-07-21 12:30:18 +07:00
Sergey M․
fd62b36680 [vrtnu] Relax title extraction and extract JSON-LD (closes #17018) 2018-07-20 02:39:20 +07:00
Sergey M․
25586c601c [theplatform] PEP 8
[ci skip]
2018-07-20 00:49:14 +07:00
Sergey M․
ecb6b6ae2d [viu] Pass area id 2018-07-20 00:46:50 +07:00
Sergey M․
c258570edd [viu] Pass Referer and Origin headers (closes #16992) 2018-07-20 00:01:43 +07:00
Sergey M․
6fc09f0155 [vimeo] Add another config regex (closes #17013) 2018-07-19 23:15:27 +07:00
Sergey M․
11330f5121 [facebook] Extract view count and update tests (closes #16942) 2018-07-19 02:26:05 +07:00
bato3
8da17f9680 [dailymotion] Improve description extraction (closes #16984) 2018-07-19 02:04:05 +07:00
Sergey M․
c63f5fb863 [slutload] Fix and improve extraction (closes #17001) 2018-07-19 01:59:00 +07:00
Remita Amine
38f1eb0ac3 [mediaset] fix extraction(closes #16977) 2018-07-18 18:34:04 +01:00
Remita Amine
371dcc1dd4 [theplatform] add support for theplatform Top-level domain customization(#16977) 2018-07-18 18:34:04 +01:00
Remita Amine
bd21ead2a2 [extractor/common] add support for DASH and MSS formats extraction in SMIL manifests 2018-07-18 18:34:04 +01:00
Jakub Wilk
905eef2b06 [imgur] Allow digits in filename extension 2018-07-18 23:47:26 +07:00
Sergey M․
79367a9820 [pornhub] Improve extraction and extract all formats (closes #12166, closes #15891, closes #16262, closes #16959) 2018-07-14 18:05:06 +07:00
167 changed files with 5496 additions and 1971 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.07.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.07.10**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.12.03*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.12.03**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.07.10
[debug] youtube-dl version 2018.12.03
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

3
.gitignore vendored
View File

@@ -48,3 +48,6 @@ youtube-dl.zsh
tmp/
venv/
# VS Code related files
.vscode

View File

@@ -15,6 +15,18 @@ env:
- YTDL_TEST_SET=download
matrix:
include:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download
fast_finish: true

View File

@@ -239,3 +239,10 @@ Martin Weinelt
Surya Oktafendri
TingPing
Alexandre Macabies
Bastian de Groot
Niklas Haas
András Veres-Szentkirályi
Enes Solak
Nathan Rossi
Thomas van der Berg
Luca Cherubin

View File

@@ -296,5 +296,26 @@ title = self._search_regex(
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples
##### Safely extract optional description from parsed JSON
```python
description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
```
##### Safely extract more optional metadata
```python
video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```

292
ChangeLog
View File

@@ -1,3 +1,295 @@
version 2018.12.03
Core
* [utils] Fix random_birthday to generate existing dates only (#18284)
Extractors
+ [tiktok] Add support for tiktok.com (#18108, #18135)
* [pornhub] Use actual URL host for requests (#18359)
* [lynda] Fix authentication (#18158, #18217)
* [gfycat] Update API endpoint (#18333, #18343)
+ [hotstar] Add support for alternative app state layout (#18320)
* [azmedien] Fix extraction (#18334, #18336)
+ [vimeo] Add support for VHX (Vimeo OTT) (#14835)
* [joj] Fix extraction (#18280, #18281)
+ [wistia] Add support for fast.wistia.com (#18287)
version 2018.11.23
Core
+ [setup.py] Add more relevant classifiers
Extractors
* [mixcloud] Fallback to hardcoded decryption key (#18016)
* [nbc:news] Fix article extraction (#16194)
* [foxsports] Fix extraction (#17543)
* [loc] Relax regular expression and improve formats extraction
+ [ciscolive] Add support for ciscolive.cisco.com (#17984)
* [nzz] Relax kaltura regex (#18228)
* [sixplay] Fix formats extraction
* [bitchute] Improve title extraction
* [kaltura] Limit requested MediaEntry fields
+ [americastestkitchen] Add support for zype embeds (#18225)
+ [pornhub] Add pornhub.net alias
* [nova:embed] Fix extraction (#18222)
version 2018.11.18
Extractors
+ [wwe] Extract subtitles
+ [wwe] Add support for playlistst (#14781)
+ [wwe] Add support for wwe.com (#14781, #17450)
* [vk] Detect geo restriction (#17767)
* [openload] Use original host during extraction (#18211)
* [atvat] Fix extraction (#18041)
+ [rte] Add support for new API endpoint (#18206)
* [tnaflixnetwork:embed] Fix extraction (#18205)
* [picarto] Use API and add token support (#16518)
+ [zype] Add support for player.zype.com (#18143)
* [vivo] Fix extraction (#18139)
* [ruutu] Update API endpoint (#18138)
version 2018.11.07
Extractors
+ [youtube] Add another JS signature function name regex (#18091, #18093,
#18094)
* [facebook] Fix tahoe request (#17171)
* [cliphunter] Fix extraction (#18083)
+ [youtube:playlist] Add support for invidio.us (#18077)
* [zattoo] Arrange API hosts for derived extractors (#18035)
+ [youtube] Add fallback metadata extraction from videoDetails (#18052)
version 2018.11.03
Core
* [extractor/common] Ensure response handle is not prematurely closed before
it can be read if it matches expected_status (#17195, #17846, #17447)
Extractors
* [laola1tv:embed] Set correct stream access URL scheme (#16341)
+ [ehftv] Add support for ehftv.com (#15408)
* [azmedien] Adopt to major site redesign (#17745, #17746)
+ [twitcasting] Add support for twitcasting.tv (#17981)
* [orf:tvthek] Fix extraction (#17737, #17956, #18024)
+ [openload] Add support for oload.fun (#18045)
* [njpwworld] Fix authentication (#17427)
+ [linkedin:learning] Add support for linkedin.com/learning (#13545)
* [theplatform] Improve error detection (#13222)
* [cnbc] Simplify extraction (#14280, #17110)
+ [cbnc] Add support for new URL schema (#14193)
* [aparat] Improve extraction and extract more metadata (#17445, #18008)
* [aparat] Fix extraction
version 2018.10.29
Core
+ [extractor/common] Add validation for JSON-LD URLs
Extractors
+ [sportbox] Add support for matchtv.ru
* [sportbox] Fix extraction (#17978)
* [screencast] Fix extraction (#14590, #14617, #17990)
+ [openload] Add support for oload.icu
+ [ivi] Add support for ivi.tv
* [crunchyroll] Improve extraction failsafeness (#17991)
* [dailymail] Fix formats extraction (#17976)
* [viewster] Reduce format requests
* [cwtv] Handle API errors (#17905)
+ [rutube] Use geo verification headers (#17897)
+ [brightcove:legacy] Add fallbacks to brightcove:new (#13912)
- [tv3] Remove extractor (#10461, #15339)
* [ted] Fix extraction for HTTP and RTMP formats (#5941, #17572, #17894)
+ [openload] Add support for oload.cc (#17823)
+ [patreon] Extract post_file URL (#17792)
* [patreon] Fix extraction (#14502, #10471)
version 2018.10.05
Extractors
* [pluralsight] Improve authentication (#17762)
* [dailymotion] Fix extraction (#17699)
* [crunchyroll] Switch to HTTPS for RpcApi (#17749)
+ [philharmoniedeparis] Add support for pad.philharmoniedeparis.fr (#17705)
* [philharmoniedeparis] Fix extraction (#17705)
+ [jamendo] Add support for licensing.jamendo.com (#17724)
+ [openload] Add support for oload.cloud (#17710)
* [pluralsight] Fix subtitles extraction (#17726, #17728)
+ [vimeo] Add another config regular expression (#17690)
* [spike] Fix Paramount Network extraction (#17677)
* [hotstar] Fix extraction (#14694, #14931, #17637)
version 2018.09.26
Extractors
* [pluralsight] Fix subtitles extraction (#17671)
* [mediaset] Improve embed support (#17668)
+ [youtube] Add support for invidio.us (#17613)
+ [zattoo] Add support for more zattoo platform sites
* [zattoo] Fix extraction (#17175, #17542)
version 2018.09.18
Core
+ [extractor/common] Introduce channel meta fields
Extractors
* [adobepass] Don't pollute default headers dict
* [udemy] Don't pollute default headers dict
* [twitch] Don't pollute default headers dict
* [youtube] Don't pollute default query dict (#17593)
* [crunchyroll] Prefer hardsubless formats and formats in locale language
* [vrv] Make format ids deterministic
* [vimeo] Fix ondemand playlist extraction (#14591)
+ [pornhub] Extract upload date (#17574)
+ [porntube] Extract channel meta fields
+ [vimeo] Extract channel meta fields
+ [youtube] Extract channel meta fields (#9676, #12939)
* [porntube] Fix extraction (#17541)
* [asiancrush] Fix extraction (#15630)
+ [twitch:clips] Extend URL regular expression (closes #17559)
+ [vzaar] Add support for HLS
* [tube8] Fix metadata extraction (#17520)
* [eporner] Extract JSON-LD (#17519)
version 2018.09.10
Core
+ [utils] Properly recognize AV1 codec (#17506)
Extractors
+ [iprima] Add support for prima.iprima.cz (#17514)
+ [tele5] Add support for tele5.de (#7805, #7922, #17331, #17414)
* [nbc] Fix extraction of percent encoded URLs (#17374)
version 2018.09.08
Extractors
* [youtube] Fix extraction (#17457, #17464)
+ [pornhub:uservideos] Add support for new URLs (#17388)
* [iprima] Confirm adult check (#17437)
* [slideslive] Make check for video service name case-insensitive (#17429)
* [radiojavan] Fix extraction (#17151)
* [generic] Skip unsuccessful jwplayer extraction (#16735)
version 2018.09.01
Core
* [utils] Skip remote IP addresses non matching to source address' IP version
when creating a connection (#13422, #17362)
Extractors
+ [ard] Add support for one.ard.de (#17397)
* [niconico] Fix extraction on python3 (#17393, #17407)
* [ard] Extract f4m formats
* [crunchyroll] Parse vilos media data (#17343)
+ [ard] Add support for Beta ARD Mediathek
+ [bandcamp] Extract more metadata (#13197)
* [internazionale] Fix extraction of non-available-abroad videos (#17386)
version 2018.08.28
Extractors
+ [youtube:playlist] Add support for music album playlists (OLAK5uy_ prefix)
(#17361)
* [bitchute] Fix extraction by pass custom User-Agent (#17360)
* [webofstories:playlist] Fix extraction (#16914)
+ [tvplayhome] Add support for new tvplay URLs (#17344)
+ [generic] Allow relative src for videojs embeds (#17324)
+ [xfileshare] Add support for vidto.se (#17317)
+ [vidzi] Add support for vidzi.nu (#17316)
+ [nova:embed] Add support for media.cms.nova.cz (#17282)
version 2018.08.22
Core
* [utils] Use pure browser header for User-Agent (#17236)
Extractors
+ [kinopoisk] Add support for kinopoisk.ru (#17283)
+ [yourporn] Add support for yourporn.sexy (#17298)
+ [go] Add support for disneynow.go.com (#16299, #17264)
+ [6play] Add support for play.rtl.hr (#17249)
* [anvato] Fallback to generic API key for access-key-to-API-key lookup
(#16788, #17254)
* [lci] Fix extraction (#17274)
* [bbccouk] Extend id URL regular expression (#17270)
* [cwtv] Fix extraction (#17256)
* [nova] Fix extraction (#17241)
+ [generic] Add support for expressen embeds
* [raywenderlich] Adapt to site redesign (#17225)
+ [redbulltv] Add support redbull.com tv URLs (#17218)
+ [bitchute] Add support for bitchute.com (#14052)
+ [clyp] Add support for token protected media (#17184)
* [imdb] Fix extension extraction (#17167)
version 2018.08.04
Extractors
* [funk:channel] Improve byChannelAlias extraction (#17142)
* [twitch] Fix authentication (#17024, #17126)
* [twitch:vod] Improve URL regular expression (#17135)
* [watchbox] Fix extraction (#17107)
* [pbs] Fix extraction (#17109)
* [theplatform] Relax URL regular expression (#16181, #17097)
+ [viqeo] Add support for viqeo.tv (#17066)
version 2018.07.29
Extractors
* [crunchyroll:playlist] Restrict URL regular expression (#17069, #17076)
+ [pornhub] Add support for subtitles (#16924, #17088)
* [ceskatelevize] Use https for API call (#16997, #16999)
* [dailymotion:playlist] Fix extraction (#16894)
* [ted] Improve extraction
* [ted] Fix extraction for videos without nativeDownloads (#16756, #17085)
* [telecinco] Fix extraction (#17080)
* [mitele] Reduce number of requests
* [rai] Return non HTTP relinker URL intact (#17055)
* [vk] Fix extraction for inline only videos (#16923)
* [streamcloud] Fix extraction (#17054)
* [facebook] Fix tahoe player extraction with authentication (#16655)
+ [puhutv] Add support for puhutv.com (#12712, #16010, #16269)
version 2018.07.21
Core
+ [utils] Introduce url_or_none
* [utils] Allow JSONP without function name (#17028)
+ [extractor/common] Extract DASH and MSS formats from SMIL manifests
Extractors
+ [bbc] Add support for BBC Radio Play pages (#17022)
* [iwara] Fix download URLs (#17026)
* [vrtnu] Relax title extraction and extract JSON-LD (#17018)
+ [viu] Pass Referer and Origin headers and area id (#16992)
+ [vimeo] Add another config regular expression (#17013)
+ [facebook] Extract view count (#16942)
* [dailymotion] Improve description extraction (#16984)
* [slutload] Fix and improve extraction (#17001)
* [mediaset] Fix extraction (#16977)
+ [theplatform] Add support for theplatform TLD customization (#16977)
* [imgur] Relax URL regular expression (#16987)
* [pornhub] Improve extraction and extract all formats (#12166, #15891, #16262,
#16959)
version 2018.07.10
Core

View File

@@ -511,6 +511,8 @@ The basic usage is not to set any template arguments when downloading a single f
- `timestamp` (numeric): UNIX timestamp of the moment the video became available
- `upload_date` (string): Video upload date (YYYYMMDD)
- `uploader_id` (string): Nickname or id of the video uploader
- `channel` (string): Full name of the channel the video is uploaded on
- `channel_id` (string): Id of the channel
- `location` (string): Physical location where the video was filmed
- `duration` (numeric): Length of the video in seconds
- `view_count` (numeric): How many users have watched the video on the platform
@@ -870,7 +872,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
@@ -1166,7 +1168,28 @@ title = self._search_regex(
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples
##### Safely extract optional description from parsed JSON
```python
description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
```
##### Safely extract more optional metadata
```python
video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```
# EMBEDDING YOUTUBE-DL

View File

@@ -56,6 +56,7 @@
- **archive.org**: archive.org videos
- **ARD**
- **ARD:mediathek**
- **ARDBetaMediathek**
- **Arkena**
- **arte.tv**
- **arte.tv:+7**
@@ -83,8 +84,6 @@
- **awaan:season**
- **awaan:video**
- **AZMedien**: AZ Medien videos
- **AZMedienPlaylist**: AZ Medien playlists
- **AZMedienShowPlaylist**: AZ Medien show playlists
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
@@ -97,6 +96,7 @@
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **BBVTV**
- **Beatport**
- **Beeg**
- **BehindKink**
@@ -108,6 +108,8 @@
- **BiliBili**
- **BioBioChileTV**
- **BIQLE**
- **BitChute**
- **BitChuteChannel**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
@@ -161,6 +163,8 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
- **CiscoLiveSearch**
- **CiscoLiveSession**
- **CJSW**
- **cliphunter**
- **Clippit**
@@ -174,6 +178,7 @@
- **Clyp**
- **cmt.com**
- **CNBC**
- **CNBCVideo**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
@@ -189,7 +194,7 @@
- **Crackle**
- **Criterion**
- **CrooksAndLiars**
- **Crunchyroll**
- **crunchyroll**
- **crunchyroll:playlist**
- **CSNNE**
- **CSpan**: C-SPAN
@@ -247,7 +252,9 @@
- **EchoMsk**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
- **ehftv**
- **eHow**
- **EinsUndEinsTV**
- **Einthusan**
- **eitb.tv**
- **EllenTube**
@@ -265,6 +272,7 @@
- **EsriVideo**
- **Europa**
- **EveryonesMixtape**
- **EWETV**
- **ExpoTV**
- **Expressen**
- **ExtremeTube**
@@ -324,6 +332,7 @@
- **Gfycat**
- **GiantBomb**
- **Giga**
- **GlattvisionTV**
- **Glide**: Glide mobile video messages (glide.me)
- **Globo**
- **GloboArticle**
@@ -353,7 +362,7 @@
- **HitRecord**
- **HornBunny**
- **HotNewHipHop**
- **HotStar**
- **hotstar**
- **hotstar:playlist**
- **Howcast**
- **HowStuffWorks**
@@ -405,6 +414,7 @@
- **Ketnet**
- **KhanAcademy**
- **KickStarter**
- **KinoPoisk**
- **KonserthusetPlay**
- **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью
@@ -437,6 +447,8 @@
- **limelight:channel**
- **limelight:channel_list**
- **LineTV**
- **linkedin:learning**
- **linkedin:learning:course**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
@@ -490,6 +502,7 @@
- **Mixer:vod**
- **MLB**
- **Mnet**
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
@@ -521,6 +534,7 @@
- **Myvi**
- **MyVidster**
- **MyviEmbed**
- **MyVisionTV**
- **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
@@ -546,6 +560,7 @@
- **netease:program**: 网易云音乐 - 电台节目
- **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐
- **NetPlus**
- **Netzkino**
- **Newgrounds**
- **NewgroundsPlaylist**
@@ -577,6 +592,7 @@
- **Normalboots**
- **NosVideo**
- **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- **NovaEmbed**
- **nowness**
- **nowness:playlist**
- **nowness:series**
@@ -621,6 +637,7 @@
- **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
- **OsnatelTV**
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
@@ -672,6 +689,8 @@
- **PrimeShareTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
- **puhutv**
- **puhutv:serie**
- **Puls4**
- **Pyvideo**
- **qqmusic**: QQ音乐
@@ -679,6 +698,7 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuantumTV**
- **Quickline**
- **QuicklineLive**
- **R7**
@@ -694,6 +714,7 @@
- **RaiPlayLive**
- **RaiPlayPlaylist**
- **RayWenderlich**
- **RayWenderlichCourse**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
@@ -745,6 +766,7 @@
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
- **SAKTV**
- **Sapo**: SAPO Vídeos
- **savefrom.net**
- **SBS**: sbs.com.au
@@ -800,7 +822,7 @@
- **Spiegeltv**
- **sport.francetvinfo.fr**
- **Sport5**
- **SportBoxEmbed**
- **SportBox**
- **SportDeutschland**
- **SpringboardPlatform**
- **Sprout**
@@ -839,6 +861,7 @@
- **techtv.mit.edu**
- **ted**
- **Tele13**
- **Tele5**
- **TeleBruxelles**
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf**
@@ -862,6 +885,8 @@
- **ThisAmericanLife**
- **ThisAV**
- **ThisOldHouse**
- **TikTok**
- **TikTokUser**
- **tinypic**: tinypic.com videos
- **TMZ**
- **TMZArticle**
@@ -890,7 +915,6 @@
- **TV2**
- **tv2.hu**
- **TV2Article**
- **TV3**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
@@ -910,7 +934,9 @@
- **tvp:embed**: Telewizja Polska
- **tvp:series**
- **TVPlayer**
- **TVPlayHome**
- **Tweakers**
- **TwitCasting**
- **twitch:chapter**
- **twitch:clips**
- **twitch:profile**
@@ -955,6 +981,7 @@
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **vhx:embed**
- **Viafree**
- **vice**
- **vice:article**
@@ -999,6 +1026,7 @@
- **Vimple**: Vimple - one-click video hosting
- **Vine**
- **vine:user**
- **Viqeo**
- **Viu**
- **viu:ott**
- **viu:playlist**
@@ -1024,12 +1052,14 @@
- **vrv**
- **vrv:series**
- **VShare**
- **VTXTV**
- **vube**: Vube.com
- **VuClip**
- **VVVVID**
- **VyboryMos**
- **Vzaar**
- **Walla**
- **WalyTV**
- **washingtonpost**
- **washingtonpost:article**
- **wat.tv**
@@ -1055,6 +1085,7 @@
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal
- **WSJArticle**
- **WWE**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
@@ -1090,6 +1121,7 @@
- **YouNowLive**
- **YouNowMoment**
- **YouPorn**
- **YourPorn**
- **YourUpload**
- **youtube**: YouTube.com
- **youtube:channel**: YouTube.com channels
@@ -1113,3 +1145,4 @@
- **ZDF**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn
- **Zype**

View File

@@ -124,6 +124,8 @@ setup(
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
'License :: Public Domain',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
@@ -132,6 +134,13 @@ setup(
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: Implementation',
'Programming Language :: Python :: Implementation :: CPython',
'Programming Language :: Python :: Implementation :: IronPython',
'Programming Language :: Python :: Implementation :: Jython',
'Programming Language :: Python :: Implementation :: PyPy',
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},

View File

@@ -7,6 +7,7 @@ import json
import os.path
import re
import types
import ssl
import sys
import youtube_dl.extractor
@@ -244,3 +245,12 @@ def expect_warnings(ydl, warnings_re):
real_warning(w)
ydl.report_warning = _report_warning
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]

View File

@@ -9,11 +9,30 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict, expect_value
from youtube_dl.compat import compat_etree_fromstring
from test.helper import FakeYDL, expect_dict, expect_value, http_server_port
from youtube_dl.compat import compat_etree_fromstring, compat_http_server
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
import threading
TEAPOT_RESPONSE_STATUS = 418
TEAPOT_RESPONSE_BODY = "<h1>418 I'm a teapot</h1>"
class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
def do_GET(self):
if self.path == '/teapot':
self.send_response(TEAPOT_RESPONSE_STATUS)
self.send_header('Content-Type', 'text/html; charset=utf-8')
self.end_headers()
self.wfile.write(TEAPOT_RESPONSE_BODY.encode())
else:
assert False
class TestIE(InfoExtractor):
@@ -743,6 +762,25 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
for i in range(len(entries)):
expect_dict(self, entries[i], expected_entries[i])
def test_response_with_expected_status_returns_content(self):
# Checks for mitigations against the effects of
# <https://bugs.python.org/issue15002> that affect Python 3.4.1+, which
# manifest as `_download_webpage`, `_download_xml`, `_download_json`,
# or the underlying `_download_webpage_handle` returning no content
# when a response matches `expected_status`.
httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), InfoExtractorTestRequestHandler)
port = http_server_port(httpd)
server_thread = threading.Thread(target=httpd.serve_forever)
server_thread.daemon = True
server_thread.start()
(content, urlh) = self.ie._download_webpage_handle(
'http://127.0.0.1:%d/teapot' % port, None,
expected_status=TEAPOT_RESPONSE_STATUS)
self.assertEqual(content, TEAPOT_RESPONSE_BODY)
if __name__ == '__main__':
unittest.main()

View File

@@ -39,7 +39,7 @@ class TestCompat(unittest.TestCase):
def test_compat_expanduser(self):
old_home = os.environ.get('HOME')
test_str = 'C:\Documents and Settings\тест\Application Data'
test_str = r'C:\Documents and Settings\тест\Application Data'
compat_setenv('HOME', test_str)
self.assertEqual(compat_expanduser('~'), test_str)
compat_setenv('HOME', old_home or '')

View File

@@ -9,26 +9,16 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import try_rm
from test.helper import http_server_port, try_rm
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
from youtube_dl.utils import encodeFilename
import ssl
import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]
TEST_SIZE = 10 * 1024

View File

@@ -8,6 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import http_server_port
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server, compat_urllib_request
import ssl
@@ -16,15 +17,6 @@ import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass

View File

@@ -14,4 +14,4 @@ from youtube_dl.postprocessor import MetadataFromTitlePP
class TestMetadataFromTitle(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s')
self.assertEqual(pp._titleregex, '(?P<title>.+)\ \-\ (?P<artist>.+)')
self.assertEqual(pp._titleregex, r'(?P<title>.+)\ \-\ (?P<artist>.+)')

View File

@@ -78,6 +78,7 @@ from youtube_dl.utils import (
uppercase_escape,
lowercase_escape,
url_basename,
url_or_none,
base_url,
urljoin,
urlencode_postdata,
@@ -507,6 +508,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
def test_url_or_none(self):
self.assertEqual(url_or_none(None), None)
self.assertEqual(url_or_none(''), None)
self.assertEqual(url_or_none('foo'), None)
self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
self.assertEqual(url_or_none('https://foo.de'), 'https://foo.de')
self.assertEqual(url_or_none('http$://foo.de'), None)
self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
self.assertEqual(url_or_none('//foo.de'), '//foo.de')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
@@ -717,6 +728,10 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
stripped = strip_jsonp('({"status": "success"});')
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')
@@ -770,6 +785,10 @@ class TestUtil(unittest.TestCase):
'vcodec': 'h264',
'acodec': 'aac',
})
self.assertEqual(parse_codecs('av01.0.05M.08'), {
'vcodec': 'av01.0.05M.08',
'acodec': 'none',
})
def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]"

View File

@@ -1325,8 +1325,8 @@ class AdobePassIE(InfoExtractor):
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
def _download_webpage_handle(self, *args, **kwargs):
headers = kwargs.get('headers', {})
headers.update(self.geo_verification_headers())
headers = self.geo_verification_headers()
headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))

View File

@@ -7,6 +7,7 @@ from .turner import TurnerBaseIE
from ..utils import (
int_or_none,
strip_or_none,
url_or_none,
)
@@ -98,7 +99,7 @@ class AdultSwimIE(TurnerBaseIE):
if not video_id:
entries = []
for episode in video_data.get('archiveEpisodes', []):
episode_url = episode.get('url')
episode_url = url_or_none(episode.get('url'))
if not episode_url:
continue
entries.append(self.url_result(

View File

@@ -9,6 +9,7 @@ from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
url_or_none,
urlencode_postdata,
xpath_text,
)
@@ -304,7 +305,7 @@ class AfreecaTVIE(InfoExtractor):
file_elements = video_element.findall(compat_xpath('./file'))
one = len(file_elements) == 1
for file_num, file_element in enumerate(file_elements, start=1):
file_url = file_element.text
file_url = url_or_none(file_element.text)
if not file_url:
continue
key = file_element.get('key', '')

View File

@@ -43,10 +43,6 @@ class AmericasTestKitchenIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
@@ -58,7 +54,18 @@ class AmericasTestKitchenIE(InfoExtractor):
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
external_id = ep_data.get('external_id') or ep_meta['external_id']
zype_id = ep_meta.get('zype_id')
if zype_id:
embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
ie_key = 'Zype'
else:
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
external_id = ep_data.get('external_id') or ep_meta['external_id']
embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
ie_key = 'Kaltura'
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -72,8 +79,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, external_id),
'ie_key': 'Kaltura',
'url': embed_url,
'ie_key': ie_key,
'title': title,
'description': description,
'thumbnail': thumbnail,

View File

@@ -3,11 +3,12 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
mimetype2ext,
determine_ext,
ExtractorError,
int_or_none,
mimetype2ext,
parse_iso8601,
url_or_none,
)
@@ -35,7 +36,7 @@ class AMPIE(InfoExtractor):
media_thumbnail = [media_thumbnail]
for thumbnail_data in media_thumbnail:
thumbnail = thumbnail_data.get('@attributes', {})
thumbnail_url = thumbnail.get('url')
thumbnail_url = url_or_none(thumbnail.get('url'))
if not thumbnail_url:
continue
thumbnails.append({
@@ -51,7 +52,7 @@ class AMPIE(InfoExtractor):
media_subtitle = [media_subtitle]
for subtitle_data in media_subtitle:
subtitle = subtitle_data.get('@attributes', {})
subtitle_href = subtitle.get('href')
subtitle_href = url_or_none(subtitle.get('href'))
if not subtitle_href:
continue
subtitles.setdefault(subtitle.get('lang') or 'en', []).append({
@@ -65,7 +66,7 @@ class AMPIE(InfoExtractor):
media_content = [media_content]
for media_data in media_content:
media = media_data.get('@attributes', {})
media_url = media.get('url')
media_url = url_or_none(media.get('url'))
if not media_url:
continue
ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
@@ -79,7 +80,7 @@ class AMPIE(InfoExtractor):
else:
formats.append({
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'],
'url': media_url,
'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')),
'ext': ext,

View File

@@ -8,6 +8,7 @@ from ..utils import (
determine_ext,
extract_attributes,
ExtractorError,
url_or_none,
urlencode_postdata,
urljoin,
)
@@ -165,7 +166,7 @@ class AnimeOnDemandIE(InfoExtractor):
}, fatal=False)
if not playlist:
continue
stream_url = playlist.get('streamurl')
stream_url = url_or_none(playlist.get('streamurl'))
if stream_url:
rtmp = re.search(
r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+/))(?P<playpath>mp[34]:.+)',

View File

@@ -134,9 +134,33 @@ class AnvatoIE(InfoExtractor):
'telemundo': 'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582'
}
_API_KEY = '3hwbSuqqT690uxjNYBktSQpa5ZrpYYR0Iofx7NcJHyA'
_ANVP_RE = r'<script[^>]+\bdata-anvp\s*=\s*(["\'])(?P<anvp>(?:(?!\1).)+)\1'
_AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'
_TESTS = [{
# from https://www.boston25news.com/news/watch-humpback-whale-breaches-right-next-to-fishing-boat-near-nh/817484874
'url': 'anvato:8v9BEynrwx8EFLYpgfOWcG1qJqyXKlRM:4465496',
'info_dict': {
'id': '4465496',
'ext': 'mp4',
'title': 'VIDEO: Humpback whale breaches right next to NH boat',
'description': 'VIDEO: Humpback whale breaches right next to NH boat. Footage courtesy: Zach Fahey.',
'duration': 22,
'timestamp': 1534855680,
'upload_date': '20180821',
'uploader': 'ANV',
},
'params': {
'skip_download': True,
},
}, {
# from https://sanfrancisco.cbslocal.com/2016/06/17/source-oakland-cop-on-leave-for-having-girlfriend-help-with-police-reports/
'url': 'anvato:DVzl9QRzox3ZZsP9bNu5Li3X7obQOnqP:3417601',
'only_matching': True,
}]
def __init__(self, *args, **kwargs):
super(AnvatoIE, self).__init__(*args, **kwargs)
self.__server_time = None
@@ -169,7 +193,8 @@ class AnvatoIE(InfoExtractor):
'api': {
'anvrid': anvrid,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time, self._ANVACK_TABLE[access_key])),
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY))),
'anvts': server_time,
},
}
@@ -284,5 +309,6 @@ class AnvatoIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
access_key, video_id = mobj.group('access_key_or_mcp', 'id')
if access_key not in self._ANVACK_TABLE:
access_key = self._MCP_TO_ACCESS_KEY_TABLE[access_key]
access_key = self._MCP_TO_ACCESS_KEY_TABLE.get(
access_key) or access_key
return self._get_anvato_videos(access_key, video_id)

View File

@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
url_or_none,
)
@@ -77,7 +78,7 @@ class AolIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for rendition in video_data.get('renditions', []):
video_url = rendition.get('url')
video_url = url_or_none(rendition.get('url'))
if not video_url:
continue
ext = rendition.get('format')

View File

@@ -4,10 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
js_to_json,
url_or_none,
)
@@ -68,8 +68,8 @@ class APAIE(InfoExtractor):
for source in sources:
if not isinstance(source, dict):
continue
source_url = source.get('file')
if not source_url or not isinstance(source_url, compat_str):
source_url = url_or_none(source.get('file'))
if not source_url:
continue
ext = determine_ext(source_url)
if ext == 'm3u8':

View File

@@ -4,66 +4,92 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
merge_dicts,
mimetype2ext,
url_or_none,
)
class AparatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.aparat.com/v/wP8On',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
'title': 'تیم گلکسی 11 - زومیت',
'age_limit': 0,
'description': 'md5:096bdabcdcc4569f2b8a5e903a3b3028',
'duration': 231,
'timestamp': 1387394859,
'upload_date': '20131218',
'view_count': int,
},
# 'skip': 'Extremely unreliable',
}
}, {
# multiple formats
'url': 'https://www.aparat.com/v/8dflw/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
# Provides more metadata
webpage = self._download_webpage(url, video_id, fatal=False)
title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
if not webpage:
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
file_list = self._parse_json(
options = self._parse_json(
self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
'file list'),
r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
webpage, 'options', group='value'),
video_id)
player = options['plugins']['sabaPlayerPlugin']
formats = []
for item in file_list[0]:
file_url = item.get('file')
if not file_url:
continue
ext = mimetype2ext(item.get('type'))
label = item.get('label')
formats.append({
'url': file_url,
'ext': ext,
'format_id': label or ext,
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', label or '', 'height', default=None)),
})
self._sort_formats(formats)
for sources in player['multiSRC']:
for item in sources:
if not isinstance(item, dict):
continue
file_url = url_or_none(item.get('src'))
if not file_url:
continue
item_type = item.get('type')
if item_type == 'application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
ext = mimetype2ext(item.get('type'))
label = item.get('label')
formats.append({
'url': file_url,
'ext': ext,
'format_id': 'http-%s' % (label or ext),
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', label or '', 'height',
default=None)),
})
self._sort_formats(
formats, field_preference=('height', 'width', 'tbr', 'format_id'))
thumbnail = self._search_regex(
r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
info = self._search_json_ld(webpage, video_id, default={})
return {
if not info.get('title'):
info['title'] = player['title']
return merge_dicts(info, {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'age_limit': self._family_friendly_search(webpage),
'thumbnail': url_or_none(options.get('poster')),
'duration': int_or_none(player.get('duration')),
'formats': formats,
}
})

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from .generic import GenericIE
from ..compat import compat_str
from ..utils import (
determine_ext,
ExtractorError,
@@ -15,13 +14,14 @@ from ..utils import (
unified_strdate,
xpath_text,
update_url_query,
url_or_none,
)
from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
# available till 26.07.2022
@@ -37,6 +37,9 @@ class ARDMediathekIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
'only_matching': True,
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@@ -100,7 +103,7 @@ class ARDMediathekIE(InfoExtractor):
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
if not isinstance(stream_url, compat_str) or '//' not in stream_url:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
@@ -282,3 +285,76 @@ class ARDIE(InfoExtractor):
'upload_date': upload_date,
'thumbnail': thumbnail,
}
class ARDBetaMediathekIE(InfoExtractor):
_VALID_URL = r'https://beta\.ardmediathek\.de/[a-z]+/player/(?P<video_id>[a-zA-Z0-9]+)/(?P<display_id>[^/?#]+)'
_TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
'info_dict': {
'display_id': 'die-robuste-roswita',
'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'title': 'Tatort: Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.',
'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
'upload_date': '20180826',
'ext': 'mp4',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
data = self._parse_json(data_json, display_id)
res = {
'id': video_id,
'display_id': display_id,
}
formats = []
for widget in data.values():
if widget.get('_geoblocked'):
raise ExtractorError('This video is not available due to geoblocking', expected=True)
if '_duration' in widget:
res['duration'] = widget['_duration']
if 'clipTitle' in widget:
res['title'] = widget['clipTitle']
if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget:
res['upload_date'] = unified_strdate(widget['broadcastedOn'])
if 'synopsis' in widget:
res['description'] = widget['synopsis']
if '_subtitleUrl' in widget:
res['subtitles'] = {'de': [{
'ext': 'ttml',
'url': widget['_subtitleUrl'],
}]}
if '_quality' in widget:
format_url = widget['_stream']['json'][0]
if format_url.endswith('.f4m'):
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False))
elif format_url.endswith('m3u8'):
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': 'http-' + widget['_quality'],
'url': format_url,
'preference': 10, # Plain HTTP, that's nice
})
self._sort_formats(formats)
res['formats'] = formats
return res

View File

@@ -8,7 +8,6 @@ from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
urlencode_postdata,
)
@@ -34,19 +33,40 @@ class AsianCrushIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
data=urlencode_postdata({
'postid': video_id,
'action': 'get_channel_kaltura_vars',
}))
webpage = self._download_webpage(url, video_id)
entry_id = data['entry_id']
entry_id, partner_id, title = [None] * 3
vars = self._parse_json(
self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False)
if vars:
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
if not entry_id:
entry_id = self._search_regex(
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
return self.url_result(
'kaltura:%s:%s' % (data['partner_id'], entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id,
video_title=data.get('vid_label'))
'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=title)
class AsianCrushPlaylistIE(InfoExtractor):

View File

@@ -28,8 +28,10 @@ class ATVAtIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="([^"]+)"',
webpage, 'player data')), display_id)['config']['initial_video']
[r'flashPlayerOptions\s*=\s*(["\'])(?P<json>(?:(?!\1).)+)\1',
r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="(?P<json>[^"]+)"'],
webpage, 'player data', group='json')),
display_id)['config']['initial_video']
video_id = video_data['id']
video_title = video_data['title']

View File

@@ -1,213 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
get_element_by_class,
get_element_by_id,
strip_or_none,
urljoin,
)
class AZMedienBaseIE(InfoExtractor):
def _kaltura_video(self, partner_id, entry_id):
return self.url_result(
'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
video_id=entry_id)
class AZMedienIE(AZMedienBaseIE):
class AZMedienIE(InfoExtractor):
IE_DESC = 'AZ Medien videos'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
(?P<host>
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
[0-9]+-show-[^/\#]+
(?:
/[0-9]+-episode-[^/\#]+
(?:
/[0-9]+-segment-(?:[^/\#]+\#)?|
\#
)|
\#
[^/]+/
(?P<id>
[^/]+-(?P<article_id>\d+)
)
(?P<id>[^\#]+)
(?:
\#video=
(?P<kaltura_id>
[_0-9a-z]+
)
)?
'''
_TESTS = [{
# URL with 'segment'
'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
'url': 'https://www.telezueri.ch/sonntalk/bundesrats-vakanzen-eu-rahmenabkommen-133214569',
'info_dict': {
'id': '1_2444peh4',
'id': '1_anruz3wy',
'ext': 'mp4',
'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
'uploader_id': 'TeleZ?ri',
'upload_date': '20161218',
'timestamp': 1482084490,
'title': 'Bundesrats-Vakanzen / EU-Rahmenabkommen',
'uploader_id': 'TVOnline',
'upload_date': '20180930',
'timestamp': 1538328802,
},
'params': {
'skip_download': True,
},
}, {
# URL with 'segment' and fragment:
'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
'only_matching': True
}, {
# URL with 'episode' and fragment:
'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
'only_matching': True
}, {
# URL with 'show' and fragment:
'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'<script[^>]+src=["\'](?:https?:)?//(?:[^/]+\.)?kaltura\.com(?:/[^/]+)*/(?:p|partner_id)/([0-9]+)',
webpage, 'kaltura partner id')
entry_id = self._html_search_regex(
r'<a[^>]+data-id=(["\'])(?P<id>(?:(?!\1).)+)\1[^>]+data-slug=["\']%s'
% re.escape(video_id), webpage, 'kaltura entry id', group='id')
return self._kaltura_video(partner_id, entry_id)
class AZMedienPlaylistIE(AZMedienBaseIE):
IE_DESC = 'AZ Medien playlists'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
(?P<id>[0-9]+-
(?:
show|
topic|
themen
)-[^/\#]+
(?:
/[0-9]+-episode-[^/\#]+
)?
)$
'''
_TESTS = [{
# URL with 'episode'
'url': 'http://www.telebaern.tv/118-show-news/13735-episode-donnerstag-15-dezember-2016',
'info_dict': {
'id': '118-show-news/13735-episode-donnerstag-15-dezember-2016',
'title': 'News - Donnerstag, 15. Dezember 2016',
},
'playlist_count': 9,
}, {
# URL with 'themen'
'url': 'http://www.telem1.ch/258-themen-tele-m1-classics',
'info_dict': {
'id': '258-themen-tele-m1-classics',
'title': 'Tele M1 Classics',
},
'playlist_mincount': 15,
}, {
# URL with 'topic', contains nested playlists
'url': 'http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen',
'only_matching': True,
}, {
# URL with 'show' only
'url': 'http://www.telezueri.ch/86-show-talktaeglich',
'only_matching': True
}]
_PARTNER_ID = '1719221'
def _real_extract(self, url):
show_id = self._match_id(url)
webpage = self._download_webpage(url, show_id)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id')
entries = []
if not entry_id:
api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
payload = {
'query': '''query VideoContext($articleId: ID!) {
article: node(id: $articleId) {
... on Article {
mainAssetRelation {
asset {
... on VideoAsset {
kalturaId
}
}
}
}
}
}''',
'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
}
json_data = self._download_json(
api_url, video_id, headers={
'Content-Type': 'application/json',
},
data=json.dumps(payload).encode())
entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id', default=None)
if partner_id:
entries = [
self._kaltura_video(partner_id, m.group('id'))
for m in re.finditer(
r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage)]
if not entries:
entries = [
self.url_result(m.group('url'), ie=AZMedienIE.ie_key())
for m in re.finditer(
r'<a[^>]+data-real=(["\'])(?P<url>http.+?)\1', webpage)]
if not entries:
entries = [
# May contain nested playlists (e.g. [1]) thus no explicit
# ie_key
# 1. http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen)
self.url_result(urljoin(url, m.group('url')))
for m in re.finditer(
r'<a[^>]+name=[^>]+href=(["\'])(?P<url>/.+?)\1', webpage)]
title = self._search_regex(
r'episodeShareTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
webpage, 'title',
default=strip_or_none(get_element_by_id(
'video-title', webpage)), group='title')
return self.playlist_result(entries, show_id, title)
class AZMedienShowPlaylistIE(AZMedienBaseIE):
IE_DESC = 'AZ Medien show playlists'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
(?:
all-episodes|
alle-episoden
)/
(?P<id>[^/?#&]+)
'''
_TEST = {
'url': 'http://www.telezueri.ch/all-episodes/astrotalk',
'info_dict': {
'id': 'astrotalk',
'title': 'TeleZüri: AstroTalk - alle episoden',
'description': 'md5:4c0f7e7d741d906004266e295ceb4a26',
},
'playlist_mincount': 13,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
episodes = get_element_by_class('search-mobile-box', webpage)
entries = [self.url_result(
urljoin(url, m.group('url'))) for m in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1', episodes)]
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return self.playlist_result(entries, playlist_id, title, description)
return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id)

View File

@@ -1,6 +1,5 @@
from __future__ import unicode_literals
import json
import random
import re
import time
@@ -16,14 +15,18 @@ from ..utils import (
int_or_none,
KNOWN_EXTENSIONS,
parse_filesize,
str_or_none,
try_get,
unescapeHTML,
update_url_query,
unified_strdate,
unified_timestamp,
url_or_none,
)
class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439',
@@ -35,13 +38,44 @@ class BandcampIE(InfoExtractor):
},
'_skip': 'There is a limit of 200 free downloads / month for the test song'
}, {
# free download
'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
'md5': '0369ace6b939f0927e62c67a1a8d9fa7',
'md5': '853e35bf34aa1d6fe2615ae612564b36',
'info_dict': {
'id': '2650410135',
'ext': 'aiff',
'title': 'Ben Prunty - Lanius (Battle)',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
'track_number': 1,
'track_id': '2650410135',
'artist': 'Ben Prunty',
'album': 'FTL: Advanced Edition Soundtrack',
},
}, {
# no free download, mp3 128
'url': 'https://relapsealumni.bandcamp.com/track/hail-to-fire',
'md5': 'fec12ff55e804bb7f7ebeb77a800c8b7',
'info_dict': {
'id': '2584466013',
'ext': 'mp3',
'title': 'Mastodon - Hail to Fire',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
'track_number': 5,
'track_id': '2584466013',
'artist': 'Mastodon',
'album': 'Call of the Mastodon',
},
}]
@@ -50,19 +84,23 @@ class BandcampIE(InfoExtractor):
title = mobj.group('title')
webpage = self._download_webpage(url, title)
thumbnail = self._html_search_meta('og:image', webpage, default=None)
m_download = re.search(r'freeDownloadPage: "(.*?)"', webpage)
if not m_download:
m_trackinfo = re.search(r'trackinfo: (.+),\s*?\n', webpage)
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)[0]
track_id = compat_str(data['id'])
if not data.get('file'):
raise ExtractorError('Not streamable', video_id=track_id, expected=True)
track_id = None
track = None
track_number = None
duration = None
formats = []
for format_id, format_url in data['file'].items():
formats = []
track_info = self._parse_json(
self._search_regex(
r'trackinfo\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
webpage, 'track info', default='{}'), title)
if track_info:
file_ = track_info.get('file')
if isinstance(file_, dict):
for format_id, format_url in file_.items():
if not url_or_none(format_url):
continue
ext, abr_str = format_id.split('-', 1)
formats.append({
'format_id': format_id,
@@ -72,85 +110,110 @@ class BandcampIE(InfoExtractor):
'acodec': ext,
'abr': int_or_none(abr_str),
})
track = track_info.get('title')
track_id = str_or_none(track_info.get('track_id') or track_info.get('id'))
track_number = int_or_none(track_info.get('track_num'))
duration = float_or_none(track_info.get('duration'))
self._sort_formats(formats)
def extract(key):
return self._search_regex(
r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key,
webpage, key, default=None, group='value')
return {
'id': track_id,
'title': data['title'],
'thumbnail': thumbnail,
'formats': formats,
'duration': float_or_none(data.get('duration')),
}
else:
raise ExtractorError('No free songs found')
artist = extract('artist')
album = extract('album_title')
timestamp = unified_timestamp(
extract('publish_date') or extract('album_publish_date'))
release_date = unified_strdate(extract('album_release_date'))
download_link = m_download.group(1)
video_id = self._search_regex(
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
webpage, 'video id')
download_link = self._search_regex(
r'freeDownloadPage\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'download link', default=None, group='url')
if download_link:
track_id = self._search_regex(
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
webpage, 'track id')
download_webpage = self._download_webpage(
download_link, video_id, 'Downloading free downloads page')
download_webpage = self._download_webpage(
download_link, track_id, 'Downloading free downloads page')
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
'blob', group='blob'),
video_id, transform_source=unescapeHTML)
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
'blob', group='blob'),
track_id, transform_source=unescapeHTML)
info = blob['digital_items'][0]
info = try_get(
blob, (lambda x: x['digital_items'][0],
lambda x: x['download_items'][0]), dict)
if info:
downloads = info.get('downloads')
if isinstance(downloads, dict):
if not track:
track = info.get('title')
if not artist:
artist = info.get('artist')
if not thumbnail:
thumbnail = info.get('thumb_url')
downloads = info['downloads']
track = info['title']
download_formats = {}
download_formats_list = blob.get('download_formats')
if isinstance(download_formats_list, list):
for f in blob['download_formats']:
name, ext = f.get('name'), f.get('file_extension')
if all(isinstance(x, compat_str) for x in (name, ext)):
download_formats[name] = ext.strip('.')
artist = info.get('artist')
title = '%s - %s' % (artist, track) if artist else track
for format_id, f in downloads.items():
format_url = f.get('url')
if not format_url:
continue
# Stat URL generation algorithm is reverse engineered from
# download_*_bundle_*.js
stat_url = update_url_query(
format_url.replace('/download/', '/statdownload/'), {
'.rand': int(time.time() * 1000 * random.random()),
})
format_id = f.get('encoding_name') or format_id
stat = self._download_json(
stat_url, track_id, 'Downloading %s JSON' % format_id,
transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1],
fatal=False)
if not stat:
continue
retry_url = url_or_none(stat.get('retry_url'))
if not retry_url:
continue
formats.append({
'url': self._proto_relative_url(retry_url, 'http:'),
'ext': download_formats.get(format_id),
'format_id': format_id,
'format_note': f.get('description'),
'filesize': parse_filesize(f.get('size_mb')),
'vcodec': 'none',
})
download_formats = {}
for f in blob['download_formats']:
name, ext = f.get('name'), f.get('file_extension')
if all(isinstance(x, compat_str) for x in (name, ext)):
download_formats[name] = ext.strip('.')
formats = []
for format_id, f in downloads.items():
format_url = f.get('url')
if not format_url:
continue
# Stat URL generation algorithm is reverse engineered from
# download_*_bundle_*.js
stat_url = update_url_query(
format_url.replace('/download/', '/statdownload/'), {
'.rand': int(time.time() * 1000 * random.random()),
})
format_id = f.get('encoding_name') or format_id
stat = self._download_json(
stat_url, video_id, 'Downloading %s JSON' % format_id,
transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1],
fatal=False)
if not stat:
continue
retry_url = stat.get('retry_url')
if not isinstance(retry_url, compat_str):
continue
formats.append({
'url': self._proto_relative_url(retry_url, 'http:'),
'ext': download_formats.get(format_id),
'format_id': format_id,
'format_note': f.get('description'),
'filesize': parse_filesize(f.get('size_mb')),
'vcodec': 'none',
})
self._sort_formats(formats)
title = '%s - %s' % (artist, track) if artist else track
if not duration:
duration = float_or_none(self._html_search_meta(
'duration', webpage, default=None))
return {
'id': video_id,
'id': track_id,
'title': title,
'thumbnail': info.get('thumb_url') or thumbnail,
'uploader': info.get('artist'),
'artist': artist,
'thumbnail': thumbnail,
'uploader': artist,
'timestamp': timestamp,
'release_date': release_date,
'duration': duration,
'track': track,
'track_number': track_number,
'track_id': track_id,
'artist': artist,
'album': album,
'formats': formats,
}
@@ -306,7 +369,7 @@ class BandcampWeeklyIE(InfoExtractor):
formats = []
for format_id, format_url in show['audio_stream'].items():
if not isinstance(format_url, compat_str):
if not url_or_none(format_url):
continue
for known_ext in KNOWN_EXTENSIONS:
if known_ext in format_id:

View File

@@ -29,7 +29,7 @@ from ..compat import (
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_ID_REGEX = r'[pbw][\da-z]{7}'
_ID_REGEX = r'(?:[pbm][\da-z]{7}|w[\da-z]{7,14})'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?bbc\.co\.uk/
@@ -236,6 +236,12 @@ class BBCCoUkIE(InfoExtractor):
}, {
'url': 'http://www.bbc.co.uk/programmes/w3csv1y9',
'only_matching': True,
}, {
'url': 'https://www.bbc.co.uk/programmes/m00005xn',
'only_matching': True,
}, {
'url': 'https://www.bbc.co.uk/programmes/w172w4dww1jqt5s',
'only_matching': True,
}]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
@@ -778,6 +784,17 @@ class BBCIE(BBCCoUkIE):
'params': {
'skip_download': True,
}
}, {
# window.__PRELOADED_STATE__
'url': 'https://www.bbc.co.uk/radio/play/b0b9z4yl',
'info_dict': {
'id': 'b0b9z4vz',
'ext': 'mp4',
'title': 'Prom 6: An American in Paris and Turangalila',
'description': 'md5:51cf7d6f5c8553f197e58203bc78dff8',
'uploader': 'Radio 3',
'uploader_id': 'bbc_radio_three',
},
}]
@classmethod
@@ -1000,6 +1017,36 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
preload_state = self._parse_json(self._search_regex(
r'window\.__PRELOADED_STATE__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
if preload_state:
current_programme = preload_state.get('programmes', {}).get('current') or {}
programme_id = current_programme.get('id')
if current_programme and programme_id and current_programme.get('type') == 'playable_item':
title = current_programme.get('titles', {}).get('tertiary') or playlist_title
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
synopses = current_programme.get('synopses') or {}
network = current_programme.get('network') or {}
duration = int_or_none(
current_programme.get('duration', {}).get('value'))
thumbnail = None
image_url = current_programme.get('image_url')
if image_url:
thumbnail = image_url.replace('{recipe}', '1920x1920')
return {
'id': programme_id,
'title': title,
'description': dict_get(synopses, ('long', 'medium', 'short')),
'thumbnail': thumbnail,
'duration': duration,
'uploader': network.get('short_title'),
'uploader_id': network.get('id'),
'formats': formats,
'subtitles': subtitles,
}
bbc3_config = self._parse_json(
self._search_regex(
r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage,

View File

@@ -0,0 +1,120 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
from ..utils import urlencode_postdata
class BitChuteIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bitchute\.com/(?:video|embed|torrent/[^/]+)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.bitchute.com/video/szoMrox2JEI/',
'md5': '66c4a70e6bfc40dcb6be3eb1d74939eb',
'info_dict': {
'id': 'szoMrox2JEI',
'ext': 'mp4',
'title': 'Fuck bitches get money',
'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Victoria X Rave',
},
}, {
'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
'only_matching': True,
}, {
'url': 'https://www.bitchute.com/torrent/Zee5BE49045h/szoMrox2JEI.webtorrent',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.bitchute.com/video/%s' % video_id, video_id, headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.57 Safari/537.36',
})
title = self._html_search_regex(
(r'<[^>]+\bid=["\']video-title[^>]+>([^<]+)', r'<title>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'description', webpage, 'title',
default=None) or self._og_search_description(webpage)
formats = [
{'url': mobj.group('url')}
for mobj in re.finditer(
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)]
self._sort_formats(formats)
description = self._html_search_regex(
r'(?s)<div\b[^>]+\bclass=["\']full hidden[^>]+>(.+?)</div>',
webpage, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_meta(
'twitter:image:src', webpage, 'thumbnail')
uploader = self._html_search_regex(
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>', webpage,
'uploader', fatal=False)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}
class BitChuteChannelIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bitchute\.com/channel/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.bitchute.com/channel/victoriaxrave/',
'playlist_mincount': 185,
'info_dict': {
'id': 'victoriaxrave',
},
}
_TOKEN = 'zyG6tQcGPE5swyAEFLqKUwMuMMuF6IO2DZ6ZDQjGfsL0e4dcTLwqkTTul05Jdve7'
def _entries(self, channel_id):
channel_url = 'https://www.bitchute.com/channel/%s/' % channel_id
offset = 0
for page_num in itertools.count(1):
data = self._download_json(
'%sextend/' % channel_url, channel_id,
'Downloading channel page %d' % page_num,
data=urlencode_postdata({
'csrfmiddlewaretoken': self._TOKEN,
'name': '',
'offset': offset,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Referer': channel_url,
'X-Requested-With': 'XMLHttpRequest',
'Cookie': 'csrftoken=%s' % self._TOKEN,
})
if data.get('success') is False:
break
html = data.get('html')
if not html:
break
video_ids = re.findall(
r'class=["\']channel-videos-image-container[^>]+>\s*<a\b[^>]+\bhref=["\']/video/([^"\'/]+)',
html)
if not video_ids:
break
offset += len(video_ids)
for video_id in video_ids:
yield self.url_result(
'https://www.bitchute.com/video/%s' % video_id,
ie=BitChuteIE.ie_key(), video_id=video_id)
def _real_extract(self, url):
channel_id = self._match_id(url)
return self.playlist_result(
self._entries(channel_id), playlist_id=channel_id)

View File

@@ -4,8 +4,10 @@ import re
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_str
from ..utils import int_or_none
from ..utils import (
int_or_none,
url_or_none,
)
class BreakIE(InfoExtractor):
@@ -55,8 +57,8 @@ class BreakIE(InfoExtractor):
formats = []
for video in content:
video_url = video.get('url')
if not video_url or not isinstance(video_url, compat_str):
video_url = url_or_none(video.get('url'))
if not video_url:
continue
bitrate = int_or_none(self._search_regex(
r'(\d+)_kbps', video_url, 'tbr', default=None))

View File

@@ -1,8 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import base64
import json
import re
import struct
from .common import InfoExtractor
from .adobepass import AdobePassIE
@@ -310,6 +312,10 @@ class BrightcoveLegacyIE(InfoExtractor):
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _brightcove_new_url_result(self, publisher_id, video_id):
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
@@ -323,6 +329,28 @@ class BrightcoveLegacyIE(InfoExtractor):
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
publisher_id = query.get('publisherId')
if publisher_id and publisher_id[0].isdigit():
publisher_id = publisher_id[0]
if not publisher_id:
player_key = query.get('playerKey')
if player_key and ',' in player_key[0]:
player_key = player_key[0]
else:
player_id = query.get('playerID')
if player_id and player_id[0].isdigit():
player_page = self._download_webpage(
'http://link.brightcove.com/services/player/bcpid' + player_id[0],
video_id, headers=headers, fatal=False)
if player_page:
player_key = self._search_regex(
r'<param\s+name="playerKey"\s+value="([\w~,-]+)"',
player_page, 'player key', fatal=False)
if player_key:
enc_pub_id = player_key.split(',')[1].replace('~', '=')
publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
if publisher_id:
return self._brightcove_new_url_result(publisher_id, video_id)
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
@@ -444,8 +472,12 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
return ad_info
if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % video_id)
if not info.get('url') and not info.get('formats'):
uploader_id = info.get('uploader_id')
if uploader_id:
info.update(self._brightcove_new_url_result(uploader_id, video_id))
else:
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info

View File

@@ -2,10 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
url_or_none,
)
@@ -56,8 +56,8 @@ class CamModelsIE(InfoExtractor):
for media in encodings:
if not isinstance(media, dict):
continue
media_url = media.get('location')
if not media_url or not isinstance(media_url, compat_str):
media_url = url_or_none(media.get('location'))
if not media_url:
continue
format_id_list = [format_id]

View File

@@ -11,6 +11,7 @@ from ..utils import (
strip_or_none,
float_or_none,
int_or_none,
merge_dicts,
parse_iso8601,
)
@@ -248,9 +249,13 @@ class VrtNUIE(GigyaBaseIE):
webpage, urlh = self._download_webpage_handle(url, display_id)
title = self._html_search_regex(
info = self._search_json_ld(webpage, display_id, default={})
# title is optional here since it may be extracted by extractor
# that is delegated from here
title = strip_or_none(self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
webpage, 'title').strip()
webpage, 'title', default=None))
description = self._html_search_regex(
r'(?ms)<div class="content__description">(.+?)</div>',
@@ -295,7 +300,7 @@ class VrtNUIE(GigyaBaseIE):
# the first one
video_id = list(video.values())[0].get('videoid')
return {
return merge_dicts(info, {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
@@ -307,4 +312,4 @@ class VrtNUIE(GigyaBaseIE):
'season_number': season_number,
'episode_number': episode_number,
'release_date': release_date,
}
})

View File

@@ -4,13 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
int_or_none,
parse_duration,
parse_iso8601,
parse_resolution,
url_or_none,
)
@@ -53,8 +53,8 @@ class CCMAIE(InfoExtractor):
media_url = media['media']['url']
if isinstance(media_url, list):
for format_ in media_url:
format_url = format_.get('file')
if not format_url or not isinstance(format_url, compat_str):
format_url = url_or_none(format_.get('file'))
if not format_url:
continue
label = format_.get('label')
f = parse_resolution(label)

View File

@@ -108,7 +108,7 @@ class CeskaTelevizeIE(InfoExtractor):
for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
'https://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')

View File

@@ -0,0 +1,142 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
clean_html,
float_or_none,
int_or_none,
try_get,
urlencode_postdata,
)
class CiscoLiveBaseIE(InfoExtractor):
# These appear to be constant across all Cisco Live presentations
# and are not tied to any user session or event
RAINFOCUS_API_URL = 'https://events.rainfocus.com/api/%s'
RAINFOCUS_API_PROFILE_ID = 'Na3vqYdAlJFSxhYTYQGuMbpafMqftalz'
RAINFOCUS_WIDGET_ID = 'n6l4Lo05R8fiy3RpUBm447dZN8uNWoye'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5647924234001/SyK2FdqjM_default/index.html?videoId=%s'
HEADERS = {
'Origin': 'https://ciscolive.cisco.com',
'rfApiProfileId': RAINFOCUS_API_PROFILE_ID,
'rfWidgetId': RAINFOCUS_WIDGET_ID,
}
def _call_api(self, ep, rf_id, query, referrer, note=None):
headers = self.HEADERS.copy()
headers['Referer'] = referrer
return self._download_json(
self.RAINFOCUS_API_URL % ep, rf_id, note=note,
data=urlencode_postdata(query), headers=headers)
def _parse_rf_item(self, rf_item):
event_name = rf_item.get('eventName')
title = rf_item['title']
description = clean_html(rf_item.get('abstract'))
presenter_name = try_get(rf_item, lambda x: x['participants'][0]['fullName'])
bc_id = rf_item['videos'][0]['url']
bc_url = self.BRIGHTCOVE_URL_TEMPLATE % bc_id
duration = float_or_none(try_get(rf_item, lambda x: x['times'][0]['length']))
location = try_get(rf_item, lambda x: x['times'][0]['room'])
if duration:
duration = duration * 60
return {
'_type': 'url_transparent',
'url': bc_url,
'ie_key': 'BrightcoveNew',
'title': title,
'description': description,
'duration': duration,
'creator': presenter_name,
'location': location,
'series': event_name,
}
class CiscoLiveSessionIE(CiscoLiveBaseIE):
_VALID_URL = r'https?://ciscolive\.cisco\.com/on-demand-library/\??[^#]*#/session/(?P<id>[^/?&]+)'
_TEST = {
'url': 'https://ciscolive.cisco.com/on-demand-library/?#/session/1423353499155001FoSs',
'md5': 'c98acf395ed9c9f766941c70f5352e22',
'info_dict': {
'id': '5803694304001',
'ext': 'mp4',
'title': '13 Smart Automations to Monitor Your Cisco IOS Network',
'description': 'md5:ec4a436019e09a918dec17714803f7cc',
'timestamp': 1530305395,
'upload_date': '20180629',
'uploader_id': '5647924234001',
'location': '16B Mezz.',
},
}
def _real_extract(self, url):
rf_id = self._match_id(url)
rf_result = self._call_api('session', rf_id, {'id': rf_id}, url)
return self._parse_rf_item(rf_result['items'][0])
class CiscoLiveSearchIE(CiscoLiveBaseIE):
_VALID_URL = r'https?://ciscolive\.cisco\.com/on-demand-library/'
_TESTS = [{
'url': 'https://ciscolive.cisco.com/on-demand-library/?search.event=ciscoliveus2018&search.technicallevel=scpsSkillLevel_aintroductory&search.focus=scpsSessionFocus_designAndDeployment#/',
'info_dict': {
'title': 'Search query',
},
'playlist_count': 5,
}, {
'url': 'https://ciscolive.cisco.com/on-demand-library/?search.technology=scpsTechnology_applicationDevelopment&search.technology=scpsTechnology_ipv6&search.focus=scpsSessionFocus_troubleshootingTroubleshooting#/',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if CiscoLiveSessionIE.suitable(url) else super(CiscoLiveSearchIE, cls).suitable(url)
@staticmethod
def _check_bc_id_exists(rf_item):
return int_or_none(try_get(rf_item, lambda x: x['videos'][0]['url'])) is not None
def _entries(self, query, url):
query['size'] = 50
query['from'] = 0
for page_num in itertools.count(1):
results = self._call_api(
'search', None, query, url,
'Downloading search JSON page %d' % page_num)
sl = try_get(results, lambda x: x['sectionList'][0], dict)
if sl:
results = sl
items = results.get('items')
if not items or not isinstance(items, list):
break
for item in items:
if not isinstance(item, dict):
continue
if not self._check_bc_id_exists(item):
continue
yield self._parse_rf_item(item)
size = int_or_none(results.get('size'))
if size is not None:
query['size'] = size
total = int_or_none(results.get('total'))
if total is not None and query['from'] + query['size'] > total:
break
query['from'] += query['size']
def _real_extract(self, url):
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
query['type'] = 'session'
return self.playlist_result(
self._entries(query, url), playlist_title='Search query')

View File

@@ -1,19 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
_translation_table = {
'a': 'h', 'd': 'e', 'e': 'v', 'f': 'o', 'g': 'f', 'i': 'd', 'l': 'n',
'm': 'a', 'n': 'm', 'p': 'u', 'q': 't', 'r': 's', 'v': 'p', 'x': 'r',
'y': 'l', 'z': 'i',
'$': ':', '&': '.', '(': '=', '^': '&', '=': '/',
}
def _decode(s):
return ''.join(_translation_table.get(c, c) for c in s)
from ..utils import (
int_or_none,
url_or_none,
)
class CliphunterIE(InfoExtractor):
@@ -60,14 +51,14 @@ class CliphunterIE(InfoExtractor):
formats = []
for format_id, f in gexo_files.items():
video_url = f.get('url')
video_url = url_or_none(f.get('url'))
if not video_url:
continue
fmt = f.get('fmt')
height = f.get('h')
format_id = '%s_%sp' % (fmt, height) if fmt and height else format_id
formats.append({
'url': _decode(video_url),
'url': video_url,
'format_id': format_id,
'width': int_or_none(f.get('w')),
'height': int_or_none(height),

View File

@@ -1,15 +1,19 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
parse_iso8601,
unified_timestamp,
)
class ClypIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clyp\.it/(?P<id>[a-z0-9]+)'
_TEST = {
_TESTS = [{
'url': 'https://clyp.it/ojz2wfah',
'md5': '1d4961036c41247ecfdcc439c0cddcbb',
'info_dict': {
@@ -21,13 +25,34 @@ class ClypIE(InfoExtractor):
'timestamp': 1443515251,
'upload_date': '20150929',
},
}
}, {
'url': 'https://clyp.it/b04p1odi?token=b0078e077e15835845c528a44417719d',
'info_dict': {
'id': 'b04p1odi',
'ext': 'mp3',
'title': 'GJ! (Reward Edit)',
'description': 'Metal Resistance (THE ONE edition)',
'duration': 177.789,
'timestamp': 1528241278,
'upload_date': '20180605',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
token = qs.get('token', [None])[0]
query = {}
if token:
query['token'] = token
metadata = self._download_json(
'https://api.clyp.it/%s' % audio_id, audio_id)
'https://api.clyp.it/%s' % audio_id, audio_id, query=query)
formats = []
for secure in ('', 'Secure'):
@@ -45,7 +70,7 @@ class ClypIE(InfoExtractor):
title = metadata['Title']
description = metadata.get('Description')
duration = float_or_none(metadata.get('Duration'))
timestamp = parse_iso8601(metadata.get('DateCreated'))
timestamp = unified_timestamp(metadata.get('DateCreated'))
return {
'id': audio_id,

View File

@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
@@ -34,3 +35,32 @@ class CNBCIE(InfoExtractor):
{'force_smil_url': True}),
'id': video_id,
}
class CNBCVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cnbc\.com/video/(?:[^/]+/)+(?P<id>[^./?#&]+)'
_TEST = {
'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html',
'info_dict': {
'id': '7000031301',
'ext': 'mp4',
'title': "Trump: I don't necessarily agree with raising rates",
'description': 'md5:878d8f0b4ebb5bb1dda3514b91b49de3',
'timestamp': 1531958400,
'upload_date': '20180719',
'uploader': 'NBCU-CNBC',
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'content_id["\']\s*:\s*["\'](\d+)', webpage, display_id,
'video id')
return self.url_result(
'http://video.cnbc.com/gallery/?video=%s' % video_id,
CNBCIE.ie_key())

View File

@@ -69,6 +69,7 @@ from ..utils import (
update_url_query,
urljoin,
url_basename,
url_or_none,
xpath_element,
xpath_text,
xpath_with_ns,
@@ -211,6 +212,11 @@ class InfoExtractor(object):
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
uploader_url: Full URL to a personal webpage of the video uploader.
channel: Full name of the channel the video is uploaded on.
Note that channel fields may or may not repeat uploader
fields. This depends on a particular extractor.
channel_id: Id of the channel.
channel_url: Full URL to a channel webpage.
location: Physical location where the video was filmed.
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
@@ -600,6 +606,11 @@ class InfoExtractor(object):
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
# Retain reference to error to prevent file object from
# being closed before it can be read. Works around the
# effects of <https://bugs.python.org/issue15002>
# introduced in Python 3.4.1.
err.fp._error = err
return err.fp
if errnote is False:
@@ -1208,10 +1219,10 @@ class InfoExtractor(object):
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
info.update({
'url': e.get('contentUrl'),
'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl') or e.get('thumbnailURL'),
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
@@ -1701,9 +1712,9 @@ class InfoExtractor(object):
# However, this is not always respected, for example, [2]
# contains EXT-X-STREAM-INF tag which references AUDIO
# rendition group but does not have CODECS and despite
# referencing audio group an audio group, it represents
# a complete (with audio and video) format. So, for such cases
# we will ignore references to rendition groups and treat them
# referencing an audio group it represents a complete
# (with audio and video) format. So, for such cases we will
# ignore references to rendition groups and treat them
# as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none':
audio_group = groups.get(audio_group_id)
@@ -1859,9 +1870,7 @@ class InfoExtractor(object):
'height': height,
})
formats.extend(m3u8_formats)
continue
if src_ext == 'f4m':
elif src_ext == 'f4m':
f4m_url = src_url
if not f4m_params:
f4m_params = {
@@ -1871,9 +1880,13 @@ class InfoExtractor(object):
f4m_url += '&' if '?' in f4m_url else '?'
f4m_url += compat_urllib_parse_urlencode(f4m_params)
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
continue
if src_url.startswith('http') and self._is_valid_url(src, video_id):
elif src_ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src_url, video_id, mpd_id='dash', fatal=False))
elif re.search(r'\.ism/[Mm]anifest', src_url):
formats.extend(self._extract_ism_formats(
src_url, video_id, ism_id='mss', fatal=False))
elif src_url.startswith('http') and self._is_valid_url(src, video_id):
http_count += 1
formats.append({
'url': src_url,
@@ -1884,7 +1897,6 @@ class InfoExtractor(object):
'width': width,
'height': height,
})
continue
return formats

View File

@@ -4,16 +4,14 @@ from __future__ import unicode_literals, division
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_HTTPError,
)
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
parse_age_limit,
parse_duration,
url_or_none,
ExtractorError
)
@@ -86,8 +84,8 @@ class CrackleIE(InfoExtractor):
for e in media['MediaURLs']:
if e.get('UseDRM') is True:
continue
format_url = e.get('Path')
if not format_url or not isinstance(format_url, compat_str):
format_url = url_or_none(e.get('Path'))
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'm3u8':
@@ -124,8 +122,8 @@ class CrackleIE(InfoExtractor):
for cc_file in cc_files:
if not isinstance(cc_file, dict):
continue
cc_url = cc_file.get('Path')
if not cc_url or not isinstance(cc_url, compat_str):
cc_url = url_or_none(cc_file.get('Path'))
if not cc_url:
continue
lang = cc_file.get('Locale') or 'en'
subtitles.setdefault(lang, []).append({'url': cc_url})

View File

@@ -3,11 +3,13 @@ from __future__ import unicode_literals
import re
import json
import xml.etree.ElementTree as etree
import zlib
from hashlib import sha1
from math import pow, sqrt, floor
from .common import InfoExtractor
from .vrv import VRVIE
from ..compat import (
compat_b64decode,
compat_etree_fromstring,
@@ -18,6 +20,8 @@ from ..compat import (
from ..utils import (
ExtractorError,
bytes_to_intlist,
extract_attributes,
float_or_none,
intlist_to_bytes,
int_or_none,
lowercase_escape,
@@ -26,7 +30,6 @@ from ..utils import (
unified_strdate,
urlencode_postdata,
xpath_text,
extract_attributes,
)
from ..aes import (
aes_cbc_decrypt,
@@ -43,7 +46,7 @@ class CrunchyrollBaseIE(InfoExtractor):
data['req'] = 'RpcApi' + method
data = compat_urllib_parse_urlencode(data).encode('utf-8')
return self._download_xml(
'http://www.crunchyroll.com/xml/',
'https://www.crunchyroll.com/xml/',
video_id, note, fatal=False, data=data, headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
@@ -139,7 +142,8 @@ class CrunchyrollBaseIE(InfoExtractor):
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
class CrunchyrollIE(CrunchyrollBaseIE):
class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
@@ -148,7 +152,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'ext': 'mp4',
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Yomiuri Telecasting Corporation (YTV)',
'upload_date': '20131013',
'url': 're:(?!.*&amp)',
@@ -221,7 +225,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'info_dict': {
'id': '535080',
'ext': 'mp4',
'title': '11eyes Episode 1 Piros éjszaka - Red Night',
'title': '11eyes Episode 1 Red Night ~ Piros éjszaka',
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021',
@@ -262,6 +266,9 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# Just test metadata extraction
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.com/media-723735',
'only_matching': True,
}]
_FORMAT_IDS = {
@@ -392,7 +399,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'Downloading subtitles for ' + sub_name, data={
'subtitle_script_id': sub_id,
})
if sub_doc is None:
if not isinstance(sub_doc, etree.Element):
continue
sid = sub_doc.get('id')
iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
@@ -434,13 +441,22 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
if 'To view this, please log in to verify you are 18 or older.' in webpage:
self.raise_login_required()
media = self._parse_json(self._search_regex(
r'vilos\.config\.media\s*=\s*({.+?});',
webpage, 'vilos media', default='{}'), video_id)
media_metadata = media.get('metadata') or {}
language = self._search_regex(
r'(?:vilos\.config\.player\.language|LOCALE)\s*=\s*(["\'])(?P<lang>(?:(?!\1).)+)\1',
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = self._parse_json(self._html_search_regex(
video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id).get('description')
webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
@@ -453,92 +469,113 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False)
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
video_encode_ids = []
formats = []
for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
stream_infos = []
streamdata = self._call_rpc_api(
'VideoPlayer_GetStandardConfig', video_id,
'Downloading media info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_quality': stream_quality,
'current_page': url,
})
if streamdata is not None:
stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info is not None:
for stream in media.get('streams', []):
audio_lang = stream.get('audio_lang')
hardsub_lang = stream.get('hardsub_lang')
vrv_formats = self._extract_vrv_formats(
stream.get('url'), video_id, stream.get('format'),
audio_lang, hardsub_lang)
for f in vrv_formats:
if not hardsub_lang:
f['preference'] = 1
language_preference = 0
if audio_lang == language:
language_preference += 1
if hardsub_lang == language:
language_preference += 1
if language_preference:
f['language_preference'] = language_preference
formats.extend(vrv_formats)
if not formats:
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
if not available_fmts:
available_fmts = self._FORMAT_IDS.keys()
video_encode_ids = []
for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
stream_infos = []
streamdata = self._call_rpc_api(
'VideoPlayer_GetStandardConfig', video_id,
'Downloading media info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_quality': stream_quality,
'current_page': url,
})
if isinstance(streamdata, etree.Element):
stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info is not None:
stream_infos.append(stream_info)
stream_info = self._call_rpc_api(
'VideoEncode_GetStreamInfo', video_id,
'Downloading stream info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
if isinstance(stream_info, etree.Element):
stream_infos.append(stream_info)
stream_info = self._call_rpc_api(
'VideoEncode_GetStreamInfo', video_id,
'Downloading stream info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
if stream_info is not None:
stream_infos.append(stream_info)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_file = xpath_text(stream_info, './file')
if not video_file:
continue
if video_file.startswith('http'):
formats.extend(self._extract_m3u8_formats(
video_file, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
video_url = xpath_text(stream_info, './host')
if not video_url:
continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,
'height': int_or_none(xpath_text(metadata, './height')),
'width': int_or_none(xpath_text(metadata, './width')),
}
if '.fplive.net/' in video_url:
video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'format_id': 'http-' + video_format,
'url': direct_video_url,
})
formats.append(format_info)
video_file = xpath_text(stream_info, './file')
if not video_file:
continue
if video_file.startswith('http'):
formats.extend(self._extract_m3u8_formats(
video_file, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
format_info.update({
'format_id': 'rtmp-' + video_format,
'url': video_url,
'play_path': video_file,
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats, ('height', 'width', 'tbr', 'fps'))
video_url = xpath_text(stream_info, './host')
if not video_url:
continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,
'height': int_or_none(xpath_text(metadata, './height')),
'width': int_or_none(xpath_text(metadata, './width')),
}
if '.fplive.net/' in video_url:
video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'format_id': 'http-' + video_format,
'url': direct_video_url,
})
formats.append(format_info)
continue
format_info.update({
'format_id': 'rtmp-' + video_format,
'url': video_url,
'play_path': video_file,
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats, ('preference', 'language_preference', 'height', 'width', 'tbr', 'fps'))
metadata = self._call_rpc_api(
'VideoPlayer_GetMediaMetadata', video_id,
@@ -546,16 +583,38 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'media_id': video_id,
})
subtitles = self.extract_subtitles(video_id, webpage)
subtitles = {}
for subtitle in media.get('subtitles', []):
subtitle_url = subtitle.get('url')
if not subtitle_url:
continue
subtitles.setdefault(subtitle.get('language', 'enUS'), []).append({
'url': subtitle_url,
'ext': subtitle.get('format', 'ass'),
})
if not subtitles:
subtitles = self.extract_subtitles(video_id, webpage)
# webpage provide more accurate data than series_title from XML
series = self._html_search_regex(
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
season = episode = episode_number = duration = thumbnail = None
if isinstance(metadata, etree.Element):
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
duration = float_or_none(media_metadata.get('duration'), 1000)
thumbnail = xpath_text(metadata, 'episode_image_url')
if not episode:
episode = media_metadata.get('title')
if not episode_number:
episode_number = int_or_none(media_metadata.get('episode_number'))
if not thumbnail:
thumbnail = media_metadata.get('thumbnail', {}).get('url')
season_number = int_or_none(self._search_regex(
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
@@ -565,7 +624,8 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'id': video_id,
'title': video_title,
'description': video_description,
'thumbnail': xpath_text(metadata, 'episode_image_url'),
'duration': duration,
'thumbnail': thumbnail,
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series,
@@ -580,7 +640,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login))(?P<id>[\w\-]+))/?(?:\?|$)'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login|media-\d+))(?P<id>[\w\-]+))/?(?:\?|$)'
_TESTS = [{
'url': 'http://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',

View File

@@ -3,8 +3,12 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
smuggle_url,
str_or_none,
)
@@ -40,10 +44,15 @@ class CWTVIE(InfoExtractor):
'duration': 1263,
'series': 'Whose Line Is It Anyway?',
'season_number': 11,
'season': '11',
'episode_number': 20,
'upload_date': '20151006',
'timestamp': 1444107300,
'age_limit': 14,
'uploader': 'CWTV',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
@@ -58,60 +67,31 @@ class CWTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = None
formats = []
for partner in (154, 213):
vdata = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
if not vdata:
continue
video_data = vdata
for quality, quality_data in vdata.get('videos', {}).items():
quality_url = quality_data.get('uri')
if not quality_url:
continue
if quality == 'variantplaylist':
formats.extend(self._extract_m3u8_formats(
quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(quality_data.get('bitrate'))
format_id = 'http' + ('-%d' % tbr if tbr else '')
if self._is_valid_url(quality_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': quality_url,
'tbr': tbr,
})
video_metadata = video_data['assetFields']
ism_url = video_metadata.get('smoothStreamingUrl')
if ism_url:
formats.extend(self._extract_ism_formats(
ism_url, video_id, ism_id='mss', fatal=False))
self._sort_formats(formats)
data = self._download_json(
'http://images.cwtv.com/feed/mobileapp/video-meta/apiversion_8/guid_' + video_id,
video_id)
if data.get('result') != 'ok':
raise ExtractorError(data['msg'], expected=True)
video_data = data['video']
title = video_data['title']
mpx_url = video_data.get('mpx_url') or 'http://link.theplatform.com/s/cwtv/media/guid/2703454149/%s?formats=M3U' % video_id
thumbnails = [{
'url': image['uri'],
'width': image.get('width'),
'height': image.get('height'),
} for image_id, image in video_data['images'].items() if image.get('uri')] if video_data.get('images') else None
subtitles = {
'en': [{
'url': video_metadata['UnicornCcUrl'],
}],
} if video_metadata.get('UnicornCcUrl') else None
season = str_or_none(video_data.get('season'))
episode = str_or_none(video_data.get('episode'))
if episode and season:
episode = episode.lstrip(season)
return {
'_type': 'url_transparent',
'id': video_id,
'title': video_metadata['title'],
'description': video_metadata.get('description'),
'duration': int_or_none(video_metadata.get('duration')),
'series': video_metadata.get('seriesName'),
'season_number': int_or_none(video_metadata.get('seasonNumber')),
'season': video_metadata.get('seasonName'),
'episode_number': int_or_none(video_metadata.get('episodeNumber')),
'timestamp': parse_iso8601(video_data.get('startTime')),
'thumbnails': thumbnails,
'formats': formats,
'subtitles': subtitles,
'title': title,
'url': smuggle_url(mpx_url, {'force_smil_url': True}),
'description': video_data.get('description_long'),
'duration': int_or_none(video_data.get('duration_secs')),
'series': video_data.get('series_name'),
'season_number': int_or_none(season),
'episode_number': int_or_none(episode),
'timestamp': parse_iso8601(video_data.get('start_time')),
'age_limit': parse_age_limit(video_data.get('rating')),
'ie_key': 'ThePlatform',
}

View File

@@ -49,6 +49,9 @@ class DailyMailIE(InfoExtractor):
'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id)
video_sources = self._download_json(sources_url, video_id)
body = video_sources.get('body')
if body:
video_sources = body
formats = []
for rendition in video_sources['renditions']:

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import base64
import functools
import hashlib
import itertools
import json
@@ -16,11 +17,16 @@ from ..utils import (
error_to_compat_str,
ExtractorError,
int_or_none,
mimetype2ext,
OnDemandPagedList,
parse_iso8601,
sanitized_Request,
str_to_int,
try_get,
unescapeHTML,
mimetype2ext,
update_url_query,
url_or_none,
urlencode_postdata,
)
@@ -144,7 +150,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
age_limit = self._rta_search(webpage)
description = self._og_search_description(webpage) or self._html_search_meta(
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage, 'description')
view_count_str = self._search_regex(
@@ -167,10 +174,25 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
metadata = player['metadata']
player = self._parse_json(player_v5, video_id, fatal=False) or {}
metadata = try_get(player, lambda x: x['metadata'], dict)
if not metadata:
metadata_url = url_or_none(try_get(
player, lambda x: x['context']['metadata_template_url1']))
if metadata_url:
metadata_url = metadata_url.replace(':videoId', video_id)
else:
metadata_url = update_url_query(
'https://www.dailymotion.com/player/metadata/video/%s'
% video_id, {
'embedder': url,
'integration': 'inline',
'GK_PV5_NEON': '1',
})
metadata = self._download_json(
metadata_url, video_id, 'Downloading metadata JSON')
if metadata.get('error', {}).get('type') == 'password_protected':
if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
@@ -342,17 +364,93 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>[^/?#&]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': {
'title': 'SPORT',
'id': 'xv4bw_nqtv_sport',
'id': 'xv4bw',
},
'playlist_mincount': 20,
}]
_PAGE_SIZE = 100
def _fetch_page(self, playlist_id, authorizaion, page):
page += 1
videos = self._download_json(
'https://graphql.api.dailymotion.com',
playlist_id, 'Downloading page %d' % page,
data=json.dumps({
'query': '''{
collection(xid: "%s") {
videos(first: %d, page: %d) {
pageInfo {
hasNextPage
nextPage
}
edges {
node {
xid
url
}
}
}
}
}''' % (playlist_id, self._PAGE_SIZE, page)
}).encode(), headers={
'Authorization': authorizaion,
'Origin': 'https://www.dailymotion.com',
})['data']['collection']['videos']
for edge in videos['edges']:
node = edge['node']
yield self.url_result(
node['url'], DailymotionIE.ie_key(), node['xid'])
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
api = self._parse_json(self._search_regex(
r'__PLAYER_CONFIG__\s*=\s*({.+?});',
webpage, 'player config'), playlist_id)['context']['api']
auth = self._download_json(
api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
playlist_id, data=urlencode_postdata({
'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
'grant_type': 'client_credentials',
}))
authorizaion = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
entries = OnDemandPagedList(functools.partial(
self._fetch_page, playlist_id, authorizaion), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage))
class DailymotionUserIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long time',
}]
def _extract_entries(self, id):
video_ids = set()
@@ -378,43 +476,6 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long time',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')

View File

@@ -7,6 +7,7 @@ from ..utils import (
float_or_none,
int_or_none,
unified_timestamp,
url_or_none,
)
@@ -69,7 +70,7 @@ class DctpTvIE(InfoExtractor):
endpoint = next(
server['endpoint']
for server in servers
if isinstance(server.get('endpoint'), compat_str) and
if url_or_none(server.get('endpoint')) and
'cloudfront' in server['endpoint'])
else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
@@ -92,8 +93,8 @@ class DctpTvIE(InfoExtractor):
for image in images:
if not isinstance(image, dict):
continue
image_url = image.get('url')
if not image_url or not isinstance(image_url, compat_str):
image_url = url_or_none(image.get('url'))
if not image_url:
continue
thumbnails.append({
'url': image_url,

View File

@@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
extract_attributes,
@@ -12,6 +11,7 @@ from ..utils import (
parse_age_limit,
remove_end,
unescapeHTML,
url_or_none,
)
@@ -69,9 +69,8 @@ class DiscoveryGoBaseIE(InfoExtractor):
captions = stream.get('captions')
if isinstance(captions, list):
for caption in captions:
subtitle_url = caption.get('fileUrl')
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
not subtitle_url.startswith('http')):
subtitle_url = url_or_none(caption.get('fileUrl'))
if not subtitle_url or not subtitle_url.startswith('http'):
continue
lang = caption.get('fileLang', 'en')
ext = determine_ext(subtitle_url)

View File

@@ -7,7 +7,6 @@ import json
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..utils import (
@@ -17,6 +16,7 @@ from ..utils import (
parse_age_limit,
parse_duration,
unified_timestamp,
url_or_none,
)
@@ -139,8 +139,8 @@ class DramaFeverIE(DramaFeverBaseIE):
for sub in subs:
if not isinstance(sub, dict):
continue
sub_url = sub.get('url')
if not sub_url or not isinstance(sub_url, compat_str):
sub_url = url_or_none(sub.get('url'))
if not sub_url:
continue
subtitles.setdefault(
sub.get('code') or sub.get('language') or 'en', []).append({
@@ -163,8 +163,8 @@ class DramaFeverIE(DramaFeverBaseIE):
for format_id, format_dict in download_assets.items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url or not isinstance(format_url, compat_str):
format_url = url_or_none(format_dict.get('url'))
if not format_url:
continue
formats.append({
'url': format_url,

View File

@@ -59,7 +59,7 @@ class DTubeIE(InfoExtractor):
try:
self.to_screen('%s: Checking %s video format URL' % (video_id, format_id))
self._downloader._opener.open(video_url, timeout=5).close()
except timeout as e:
except timeout:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, format_id))
continue

View File

@@ -4,14 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
unsmuggle_url,
url_or_none,
)
@@ -177,7 +175,7 @@ class EaglePlatformIE(InfoExtractor):
video_id, 'Downloading mp4 JSON', fatal=False)
if mp4_data:
for format_id, format_url in mp4_data.get('data', {}).items():
if not isinstance(format_url, compat_str):
if not url_or_none(format_url):
continue
height = int_or_none(format_id)
if height is not None and m3u8_formats_dict.get(height):

View File

@@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
try_get,
unified_timestamp,
url_or_none,
)
@@ -34,8 +35,8 @@ class EggheadCourseIE(InfoExtractor):
entries = []
for lesson in lessons:
lesson_url = lesson.get('http_url')
if not lesson_url or not isinstance(lesson_url, compat_str):
lesson_url = url_or_none(lesson.get('http_url'))
if not lesson_url:
continue
lesson_id = lesson.get('id')
if lesson_id:
@@ -95,7 +96,8 @@ class EggheadLessonIE(InfoExtractor):
formats = []
for _, format_url in lesson['media_urls'].items():
if not format_url or not isinstance(format_url, compat_str):
format_url = url_or_none(format_url)
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'm3u8':

View File

@@ -9,8 +9,10 @@ from ..utils import (
encode_base_n,
ExtractorError,
int_or_none,
merge_dicts,
parse_duration,
str_to_int,
url_or_none,
)
@@ -24,10 +26,16 @@ class EpornerIE(InfoExtractor):
'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
'ext': 'mp4',
'title': 'Infamous Tiffany Teen Strip Tease Video',
'description': 'md5:764f39abf932daafa37485eb46efa152',
'timestamp': 1232520922,
'upload_date': '20090121',
'duration': 1838,
'view_count': int,
'age_limit': 18,
},
'params': {
'proxy': '127.0.0.1:8118'
}
}, {
# New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
@@ -82,8 +90,8 @@ class EpornerIE(InfoExtractor):
for format_id, format_dict in formats_dict.items():
if not isinstance(format_dict, dict):
continue
src = format_dict.get('src')
if not isinstance(src, compat_str) or not src.startswith('http'):
src = url_or_none(format_dict.get('src'))
if not src or not src.startswith('http'):
continue
if kind == 'hls':
formats.extend(self._extract_m3u8_formats(
@@ -103,12 +111,15 @@ class EpornerIE(InfoExtractor):
})
self._sort_formats(formats)
duration = parse_duration(self._html_search_meta('duration', webpage))
json_ld = self._search_json_ld(webpage, display_id, default={})
duration = parse_duration(self._html_search_meta(
'duration', webpage, default=None))
view_count = str_to_int(self._search_regex(
r'id="cinemaviews">\s*([0-9,]+)\s*<small>views',
webpage, 'view count', fatal=False))
return {
return merge_dicts(json_ld, {
'id': video_id,
'display_id': display_id,
'title': title,
@@ -116,4 +127,4 @@ class EpornerIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
'age_limit': 18,
}
})

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
@@ -11,7 +13,13 @@ from ..utils import (
class ExpressenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?expressen\.se/tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?expressen\.se/
(?:(?:tvspelare/video|videoplayer/embed)/)?
tv/(?:[^/]+/)*
(?P<id>[^/?#&]+)
'''
_TESTS = [{
'url': 'https://www.expressen.se/tv/ledare/ledarsnack/ledarsnack-om-arbetslosheten-bland-kvinnor-i-speciellt-utsatta-omraden/',
'md5': '2fbbe3ca14392a6b1b36941858d33a45',
@@ -28,8 +36,21 @@ class ExpressenIE(InfoExtractor):
}, {
'url': 'https://www.expressen.se/tv/kultur/kulturdebatt-med-expressens-karin-olsson/',
'only_matching': True,
}, {
'url': 'https://www.expressen.se/tvspelare/video/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}, {
'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?expressen\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
webpage)]
def _real_extract(self, url):
display_id = self._match_id(url)

View File

@@ -54,6 +54,7 @@ from .appletrailers import (
from .archiveorg import ArchiveOrgIE
from .arkena import ArkenaIE
from .ard import (
ARDBetaMediathekIE,
ARDIE,
ARDMediathekIE,
)
@@ -87,11 +88,7 @@ from .awaan import (
AWAANLiveIE,
AWAANSeasonIE,
)
from .azmedien import (
AZMedienIE,
AZMedienPlaylistIE,
AZMedienShowPlaylistIE,
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
@@ -118,6 +115,10 @@ from .bilibili import (
BiliBiliBangumiIE,
)
from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
BitChuteIE,
BitChuteChannelIE,
)
from .biqle import BIQLEIE
from .bleacherreport import (
BleacherReportIE,
@@ -193,6 +194,10 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .ciscolive import (
CiscoLiveSessionIE,
CiscoLiveSearchIE,
)
from .cjsw import CJSWIE
from .cliphunter import CliphunterIE
from .clippit import ClippitIE
@@ -204,7 +209,10 @@ from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
from .cmt import CMTIE
from .cnbc import CNBCIE
from .cnbc import (
CNBCIE,
CNBCVideoIE,
)
from .cnn import (
CNNIE,
CNNBlogsIE,
@@ -516,6 +524,7 @@ from .keezmovies import KeezMoviesIE
from .ketnet import KetnetIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .kinopoisk import KinoPoiskIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
@@ -534,6 +543,7 @@ from .la7 import LA7IE
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
EHFTVIE,
ITTFIE,
)
from .lci import LCIIE
@@ -563,6 +573,10 @@ from .limelight import (
LimelightChannelListIE,
)
from .line import LineTVIE
from .linkedin import (
LinkedInLearningIE,
LinkedInLearningCourseIE,
)
from .litv import LiTVIE
from .liveleak import (
LiveLeakIE,
@@ -736,7 +750,10 @@ from .nonktube import NonkTubeIE
from .noovo import NoovoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .nova import NovaIE
from .nova import (
NovaEmbedIE,
NovaIE,
)
from .novamov import (
AuroraVidIE,
CloudTimeIE,
@@ -860,6 +877,10 @@ from .pornhub import (
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE
from .puhutv import (
PuhuTVIE,
PuhuTVSerieIE,
)
from .presstv import PressTVIE
from .primesharetv import PrimeShareTVIE
from .promptfile import PromptFileIE
@@ -891,7 +912,10 @@ from .rai import (
RaiPlayPlaylistIE,
RaiIE,
)
from .raywenderlich import RayWenderlichIE
from .raywenderlich import (
RayWenderlichIE,
RayWenderlichCourseIE,
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import RedBullTVIE
@@ -1027,7 +1051,7 @@ from .spike import (
)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
from .sportbox import SportBoxIE
from .sportdeutschland import SportDeutschlandIE
from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE
@@ -1070,6 +1094,7 @@ from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tele5 import Tele5IE
from .tele13 import Tele13IE
from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
@@ -1099,6 +1124,10 @@ from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .threeqsdn import ThreeQSDNIE
from .tiktok import (
TikTokIE,
TikTokUserIE,
)
from .tinypic import TinyPicIE
from .tmz import (
TMZIE,
@@ -1136,7 +1165,6 @@ from .tv2 import (
TV2ArticleIE,
)
from .tv2hu import TV2HuIE
from .tv3 import TV3IE
from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE
from .tva import TVAIE
@@ -1166,12 +1194,14 @@ from .tvp import (
from .tvplay import (
TVPlayIE,
ViafreeIE,
TVPlayHomeIE,
)
from .tvplayer import TVPlayerIE
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
@@ -1277,6 +1307,7 @@ from .vimeo import (
VimeoReviewIE,
VimeoUserIE,
VimeoWatchLaterIE,
VHXEmbedIE,
)
from .vimple import VimpleIE
from .vine import (
@@ -1287,6 +1318,7 @@ from .viki import (
VikiIE,
VikiChannelIE,
)
from .viqeo import ViqeoIE
from .viu import (
ViuIE,
ViuPlaylistIE,
@@ -1363,6 +1395,7 @@ from .wsj import (
WSJIE,
WSJArticleIE,
)
from .wwe import WWEIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
from .xfileshare import XFileShareIE
@@ -1412,6 +1445,7 @@ from .younow import (
YouNowMomentIE,
)
from .youporn import YouPornIE
from .yourporn import YourPornIE
from .yourupload import YourUploadIE
from .youtube import (
YoutubeIE,
@@ -1435,10 +1469,23 @@ from .youtube import (
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import (
BBVTVIE,
EinsUndEinsTVIE,
EWETVIE,
GlattvisionTVIE,
MNetTVIE,
MyVisionTVIE,
NetPlusIE,
OsnatelTVIE,
QuantumTVIE,
QuicklineIE,
QuicklineLiveIE,
SAKTVIE,
VTXTVIE,
WalyTVIE,
ZattooIE,
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE
from .zype import ZypeIE

View File

@@ -20,6 +20,7 @@ from ..utils import (
int_or_none,
js_to_json,
limit_length,
parse_count,
sanitized_Request,
try_get,
urlencode_postdata,
@@ -56,7 +57,7 @@ class FacebookIE(InfoExtractor):
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary'
_TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
@@ -75,7 +76,7 @@ class FacebookIE(InfoExtractor):
'info_dict': {
'id': '274175099429670',
'ext': 'mp4',
'title': 'Asif Nawab Butt posted a video to his Timeline.',
'title': 're:^Asif Nawab Butt posted a video',
'uploader': 'Asif Nawab Butt',
'upload_date': '20140506',
'timestamp': 1399398998,
@@ -133,7 +134,7 @@ class FacebookIE(InfoExtractor):
}, {
# have 1080P, but only up to 720p in swf params
'url': 'https://www.facebook.com/cnn/videos/10155529876156509/',
'md5': '0d9813160b146b3bc8744e006027fcc6',
'md5': '9571fae53d4165bbbadb17a94651dcdc',
'info_dict': {
'id': '10155529876156509',
'ext': 'mp4',
@@ -142,6 +143,7 @@ class FacebookIE(InfoExtractor):
'upload_date': '20161030',
'uploader': 'CNN',
'thumbnail': r're:^https?://.*',
'view_count': int,
},
}, {
# bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall
@@ -149,7 +151,7 @@ class FacebookIE(InfoExtractor):
'info_dict': {
'id': '1417995061575415',
'ext': 'mp4',
'title': 'md5:a7b86ca673f51800cd54687b7f4012fe',
'title': 'md5:1db063d6a8c13faa8da727817339c857',
'timestamp': 1486648217,
'upload_date': '20170209',
'uploader': 'Yaroslav Korpan',
@@ -176,7 +178,7 @@ class FacebookIE(InfoExtractor):
'info_dict': {
'id': '1396382447100162',
'ext': 'mp4',
'title': 'md5:e2d2700afdf84e121f5d0f999bad13a3',
'title': 'md5:19a428bbde91364e3de815383b54a235',
'timestamp': 1486035494,
'upload_date': '20170202',
'uploader': 'Elisabeth Ahtn',
@@ -353,7 +355,6 @@ class FacebookIE(InfoExtractor):
tahoe_data = self._download_webpage(
self._VIDEO_PAGE_TAHOE_TEMPLATE % video_id, video_id,
data=urlencode_postdata({
'__user': 0,
'__a': 1,
'__pc': self._search_regex(
r'pkg_cohort["\']\s*:\s*["\'](.+?)["\']', webpage,
@@ -361,6 +362,9 @@ class FacebookIE(InfoExtractor):
'__rev': self._search_regex(
r'client_revision["\']\s*:\s*(\d+),', webpage,
'client revision', default='3944515'),
'fb_dtsg': self._search_regex(
r'"DTSGInitialData"\s*,\s*\[\]\s*,\s*{\s*"token"\s*:\s*"([^"]+)"',
webpage, 'dtsg token', default=''),
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
@@ -426,6 +430,10 @@ class FacebookIE(InfoExtractor):
'timestamp', default=None))
thumbnail = self._og_search_thumbnail(webpage)
view_count = parse_count(self._search_regex(
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',
default=None))
info_dict = {
'id': video_id,
'title': video_title,
@@ -433,6 +441,7 @@ class FacebookIE(InfoExtractor):
'uploader': uploader,
'timestamp': timestamp,
'thumbnail': thumbnail,
'view_count': view_count,
}
return webpage, info_dict

View File

@@ -10,6 +10,7 @@ from ..utils import (
int_or_none,
qualities,
unified_strdate,
url_or_none,
)
@@ -88,8 +89,8 @@ class FirstTVIE(InfoExtractor):
formats = []
path = None
for f in item.get('mbr', []):
src = f.get('src')
if not src or not isinstance(src, compat_str):
src = url_or_none(f.get('src'))
if not src:
continue
tbr = int_or_none(self._search_regex(
r'_(\d{3,})\.mp4', src, 'tbr', default=None))

View File

@@ -3,15 +3,45 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_b64decode,
compat_str,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
str_or_none,
str_to_int,
try_get,
unified_timestamp,
url_or_none,
)
class FourTubeBaseIE(InfoExtractor):
_TKN_HOST = 'tkn.kodicdn.com'
def _extract_formats(self, url, video_id, media_id, sources):
token_url = 'https://%s/%s/desktop/%s' % (
self._TKN_HOST, media_id, '+'.join(sources))
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
return formats
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
@@ -68,21 +98,7 @@ class FourTubeBaseIE(InfoExtractor):
media_id = params[0]
sources = ['%s' % p for p in params[2]]
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
media_id, '+'.join(sources))
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
formats = self._extract_formats(url, video_id, media_id, sources)
return {
'id': video_id,
@@ -164,6 +180,7 @@ class FuxIE(FourTubeBaseIE):
class PornTubeIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
_TKN_HOST = 'tkn.porntube.com'
_TESTS = [{
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
'info_dict': {
@@ -171,13 +188,32 @@ class PornTubeIE(FourTubeBaseIE):
'ext': 'mp4',
'title': 'Teen couple doing anal',
'uploader': 'Alexy',
'uploader_id': 'Alexy',
'uploader_id': '91488',
'upload_date': '20150606',
'timestamp': 1433595647,
'duration': 5052,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.porntube.com/videos/squirting-teen-ballerina-ecg_1331406',
'info_dict': {
'id': '1331406',
'ext': 'mp4',
'title': 'Squirting Teen Ballerina on ECG',
'uploader': 'Exploited College Girls',
'uploader_id': '665',
'channel': 'Exploited College Girls',
'channel_id': '665',
'upload_date': '20130920',
'timestamp': 1379685485,
'duration': 851,
'view_count': int,
'like_count': int,
'age_limit': 18,
},
'params': {
@@ -191,6 +227,55 @@ class PornTubeIE(FourTubeBaseIE):
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
video = self._parse_json(
self._search_regex(
r'INITIALSTATE\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'data', group='value'), video_id,
transform_source=lambda x: compat_urllib_parse_unquote(
compat_b64decode(x).decode('utf-8')))['page']['video']
title = video['title']
media_id = video['mediaId']
sources = [compat_str(e['height'])
for e in video['encodings'] if e.get('height')]
formats = self._extract_formats(url, video_id, media_id, sources)
thumbnail = url_or_none(video.get('masterThumb'))
uploader = try_get(video, lambda x: x['user']['username'], compat_str)
uploader_id = str_or_none(try_get(
video, lambda x: x['user']['id'], int))
channel = try_get(video, lambda x: x['channel']['name'], compat_str)
channel_id = str_or_none(try_get(
video, lambda x: x['channel']['id'], int))
like_count = int_or_none(video.get('likes'))
dislike_count = int_or_none(video.get('dislikes'))
view_count = int_or_none(video.get('playsQty'))
duration = int_or_none(video.get('durationInSeconds'))
timestamp = unified_timestamp(video.get('publishedAt'))
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'uploader': uploader or channel,
'uploader_id': uploader_id or channel_id,
'channel': channel,
'channel_id': channel_id,
'timestamp': timestamp,
'like_count': like_count,
'dislike_count': dislike_count,
'view_count': view_count,
'duration': duration,
'age_limit': 18,
}
class PornerBrosIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'

View File

@@ -1,43 +1,33 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
smuggle_url,
update_url_query,
)
class FoxSportsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?foxsports\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'http://www.foxsports.com/tennessee/video/432609859715',
'md5': 'b49050e955bebe32c301972e4012ac17',
'info_dict': {
'id': 'bwduI3X_TgUB',
'id': '432609859715',
'ext': 'mp4',
'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
'description': 'Courtney Lee talks about Memphis being focused.',
'upload_date': '20150423',
'timestamp': 1429761109,
# TODO: fix timestamp
'upload_date': '19700101', # '20150423',
# 'timestamp': 1429761109,
'uploader': 'NEWA-FNG-FOXSPORTS',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._html_search_regex(
r"""class="[^"]*(?:fs-player|platformPlayer-wrapper)[^"]*".+?data-player-config='([^']+)'""",
webpage, 'data player config'),
video_id)
return self.url_result(smuggle_url(update_url_query(
config['releaseURL'], {
'mbr': 'true',
'switch': 'http',
}), {'force_smil_url': True}))
return self.url_result(
'https://feed.theplatform.com/f/BKQ29B/foxsports-all?byId=' + video_id, 'ThePlatformFeed')

View File

@@ -16,6 +16,7 @@ from ..utils import (
int_or_none,
parse_duration,
try_get,
url_or_none,
)
from .dailymotion import DailymotionIE
@@ -115,14 +116,13 @@ class FranceTVIE(InfoExtractor):
def sign(manifest_url, manifest_id):
for host in ('hdfauthftv-a.akamaihd.net', 'hdfauth.francetv.fr'):
signed_url = self._download_webpage(
signed_url = url_or_none(self._download_webpage(
'https://%s/esi/TA' % host, video_id,
'Downloading signed %s manifest URL' % manifest_id,
fatal=False, query={
'url': manifest_url,
})
if (signed_url and isinstance(signed_url, compat_str) and
re.search(r'^(?:https?:)?//', signed_url)):
}))
if signed_url:
return signed_url
return manifest_url

View File

@@ -11,6 +11,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
parse_duration,
url_or_none,
urlencode_postdata,
)
@@ -80,7 +81,7 @@ class FrontendMastersPageBaseIE(FrontendMastersBaseIE):
chapters = []
lesson_elements = course.get('lessonElements')
if isinstance(lesson_elements, list):
chapters = [e for e in lesson_elements if isinstance(e, compat_str)]
chapters = [url_or_none(e) for e in lesson_elements if url_or_none(e)]
return chapters
@staticmethod

View File

@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
@@ -125,17 +126,31 @@ class FunkChannelIE(FunkBaseIE):
# Id-based channels are currently broken on their side: webplayer
# tries to process them via byChannelAlias endpoint and fails
# predictably.
by_channel_alias = self._download_json(
'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
% channel_id,
'Downloading byChannelAlias JSON', headers=headers, query={
'size': 100,
}, fatal=False)
if by_channel_alias:
for page_num in itertools.count():
by_channel_alias = self._download_json(
'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
% channel_id,
'Downloading byChannelAlias JSON page %d' % (page_num + 1),
headers=headers, query={
'filterFsk': 'false',
'sort': 'creationDate,desc',
'size': 100,
'page': page_num,
}, fatal=False)
if not by_channel_alias:
break
video_list = try_get(
by_channel_alias, lambda x: x['_embedded']['videoList'], list)
if video_list:
if not video_list:
break
try:
video = next(r for r in video_list if r.get('alias') == alias)
break
except StopIteration:
pass
if not try_get(
by_channel_alias, lambda x: x['_links']['next']):
break
if not video:
by_id_list = self._download_json(

View File

@@ -47,7 +47,7 @@ from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
from .tvc import TVCIE
from .sportbox import SportBoxEmbedIE
from .sportbox import SportBoxIE
from .smotri import SmotriIE
from .myvi import MyviIE
from .condenast import CondeNastIE
@@ -112,6 +112,9 @@ from .peertube import PeerTubeIE
from .indavideo import IndavideoEmbedIE
from .apa import APAIE
from .foxnews import FoxNewsIE
from .viqeo import ViqeoIE
from .expressen import ExpressenIE
from .zype import ZypeIE
class GenericIE(InfoExtractor):
@@ -2059,6 +2062,44 @@ class GenericIE(InfoExtractor):
},
'skip': 'TODO: fix nested playlists processing in tests',
},
{
# Viqeo embeds
'url': 'https://viqeo.tv/',
'info_dict': {
'id': 'viqeo',
'title': 'All-new video platform',
},
'playlist_count': 6,
},
{
# Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
'info_dict': {
'id': '5b400b834b32992a310622b9',
'ext': 'mp4',
'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g',
},
'add_ie': [ZypeIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
# videojs embed
'url': 'https://video.sibnet.ru/shell.php?videoid=3422904',
'info_dict': {
'id': 'shell',
'ext': 'mp4',
'title': 'Доставщик пиццы спросил разрешения сыграть на фортепиано',
'description': 'md5:89209cdc587dab1e4a090453dbaa2cb1',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Failed to download MPD manifest'],
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -2610,9 +2651,9 @@ class GenericIE(InfoExtractor):
return self.url_result(tvc_url, 'TVC')
# Look for embedded SportBox player
sportbox_urls = SportBoxEmbedIE._extract_urls(webpage)
sportbox_urls = SportBoxIE._extract_urls(webpage)
if sportbox_urls:
return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie='SportBoxEmbed')
return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie=SportBoxIE.ie_key())
# Look for embedded XHamster player
xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
@@ -2997,7 +3038,7 @@ class GenericIE(InfoExtractor):
wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
# Look for Mediaset embeds
mediaset_urls = MediasetIE._extract_urls(webpage)
mediaset_urls = MediasetIE._extract_urls(self, webpage)
if mediaset_urls:
return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())
@@ -3086,13 +3127,28 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
foxnews_urls, video_id, video_title, ie=FoxNewsIE.ie_key())
sharevideos_urls = [mobj.group('url') for mobj in re.finditer(
sharevideos_urls = [sharevideos_mobj.group('url') for sharevideos_mobj in re.finditer(
r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
webpage)]
if sharevideos_urls:
return self.playlist_from_matches(
sharevideos_urls, video_id, video_title)
viqeo_urls = ViqeoIE._extract_urls(webpage)
if viqeo_urls:
return self.playlist_from_matches(
viqeo_urls, video_id, video_title, ie=ViqeoIE.ie_key())
expressen_urls = ExpressenIE._extract_urls(webpage)
if expressen_urls:
return self.playlist_from_matches(
expressen_urls, video_id, video_title, ie=ExpressenIE.ie_key())
zype_urls = ZypeIE._extract_urls(webpage)
if zype_urls:
return self.playlist_from_matches(
zype_urls, video_id, video_title, ie=ZypeIE.ie_key())
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
@@ -3114,9 +3170,13 @@ class GenericIE(InfoExtractor):
jwplayer_data = self._find_jwplayer_data(
webpage, video_id, transform_source=js_to_json)
if jwplayer_data:
info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=url)
return merge_dicts(info, info_dict)
try:
info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=url)
return merge_dicts(info, info_dict)
except ExtractorError:
# See https://github.com/rg3/youtube-dl/pull/16735
pass
# Video.js embed
mobj = re.search(

View File

@@ -53,7 +53,7 @@ class GfycatIE(InfoExtractor):
video_id = self._match_id(url)
gfy = self._download_json(
'http://gfycat.com/cajax/get/%s' % video_id,
'https://api.gfycat.com/v1/gfycats/%s' % video_id,
video_id, 'Downloading video info')
if 'error' in gfy:
raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True)

View File

@@ -36,7 +36,8 @@ class GoIE(AdobePassIE):
'requestor_id': 'DisneyXD',
}
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))' % '|'.join(_SITE_INFO.keys())
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
'info_dict': {
@@ -62,6 +63,14 @@ class GoIE(AdobePassIE):
}, {
'url': 'http://abc.go.com/shows/world-news-tonight/episode-guide/2017-02/17-021717-intense-stand-off-between-man-with-rifle-and-police-in-oakland',
'only_matching': True,
}, {
# brand 004
'url': 'http://disneynow.go.com/shows/big-hero-6-the-series/season-01/episode-10-mr-sparkles-loses-his-sparkle/vdka4637915',
'only_matching': True,
}, {
# brand 008
'url': 'http://disneynow.go.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}]
def _extract_videos(self, brand, video_id='-1', show_id='-1'):
@@ -72,14 +81,23 @@ class GoIE(AdobePassIE):
def _real_extract(self, url):
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
site_info = self._SITE_INFO[sub_domain]
brand = site_info['brand']
if not video_id:
webpage = self._download_webpage(url, display_id)
site_info = self._SITE_INFO.get(sub_domain, {})
brand = site_info.get('brand')
if not video_id or not site_info:
webpage = self._download_webpage(url, display_id or video_id)
video_id = self._search_regex(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id', default=None)
r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id',
default=None)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',
r'data-page-brand=\s*["\']\s*(\d+)'), webpage, 'brand',
default='004')
site_info = next(
si for _, si in self._SITE_INFO.items()
if si.get('brand') == brand)
if not video_id:
# show extraction works for Disney, DisneyJunior and DisneyXD
# ABC and Freeform has different layout

View File

@@ -8,6 +8,7 @@ from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
url_or_none,
urlencode_postdata,
)
@@ -80,8 +81,8 @@ class HiDiveIE(InfoExtractor):
bitrates = rendition.get('bitrates')
if not isinstance(bitrates, dict):
continue
m3u8_url = bitrates.get('hls')
if not isinstance(m3u8_url, compat_str):
m3u8_url = url_or_none(bitrates.get('hls'))
if not m3u8_url:
continue
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
@@ -93,9 +94,8 @@ class HiDiveIE(InfoExtractor):
if not isinstance(cc_file, list) or len(cc_file) < 3:
continue
cc_lang = cc_file[0]
cc_url = cc_file[2]
if not isinstance(cc_lang, compat_str) or not isinstance(
cc_url, compat_str):
cc_url = url_or_none(cc_file[2])
if not isinstance(cc_lang, compat_str) or not cc_url:
continue
subtitles.setdefault(cc_lang, []).append({
'url': cc_url,

View File

@@ -1,49 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import hashlib
import hmac
import time
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
try_get,
)
class HotStarBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['IN']
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
def _download_json(self, *args, **kwargs):
response = super(HotStarBaseIE, self)._download_json(*args, **kwargs)
if response['resultCode'] != 'OK':
if kwargs.get('fatal'):
raise ExtractorError(
response['errorDescription'], expected=True)
return None
return response['resultObj']
def _download_content_info(self, content_id):
return self._download_json(
'https://account.hotstar.com/AVS/besc', content_id, query={
'action': 'GetAggregatedContentDetails',
'appVersion': '5.0.40',
'channel': 'PCTV',
'contentId': content_id,
})['contentInfo'][0]
def _call_api(self, path, video_id, query_name='contentId'):
st = int(time.time())
exp = st + 6000
auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
response = self._download_json(
'https://api.hotstar.com/' + path,
video_id, headers={
'hotstarauth': auth,
'x-country-code': 'IN',
'x-platform-code': 'JIO',
}, query={
query_name: video_id,
'tas': 10000,
})
if response['statusCode'] != 'OK':
raise ExtractorError(
response['body']['message'], expected=True)
return response['body']['results']
class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
# contentData
'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
'info_dict': {
'id': '1000076273',
'ext': 'mp4',
'title': 'On Air With AIB',
'title': 'Can You Not Spread Rumours?',
'description': 'md5:c957d8868e9bc793ccb813691cc4c434',
'timestamp': 1447227000,
'timestamp': 1447248600,
'upload_date': '20151111',
'duration': 381,
},
@@ -51,6 +58,10 @@ class HotStarIE(HotStarBaseIE):
# m3u8 download
'skip_download': True,
}
}, {
# contentDetail
'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'only_matching': True,
@@ -58,47 +69,51 @@ class HotStarIE(HotStarBaseIE):
'url': 'http://www.hotstar.com/1000000515',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_content_info(video_id)
webpage = self._download_webpage(url, video_id)
app_state = self._parse_json(self._search_regex(
r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
webpage, 'app state'), video_id)
video_data = {}
getters = (
lambda x, k=k: x['initialState']['content%s' % k]['content']
for k in ('Data', 'Detail')
)
for v in app_state.values():
content = try_get(v, getters, dict)
if content and content.get('contentId') == video_id:
video_data = content
title = video_data['episodeTitle']
title = video_data['title']
if video_data.get('encrypted') == 'Y':
if video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True)
formats = []
for f in ('JIO',):
format_data = self._download_json(
'http://getcdn.hotstar.com/AVS/besc',
video_id, 'Downloading %s JSON metadata' % f,
fatal=False, query={
'action': 'GetCDN',
'asJson': 'Y',
'channel': f,
'id': video_id,
'type': 'VOD',
})
if format_data:
format_url = format_data.get('src')
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
# produce broken files
continue
else:
formats.append({
'url': format_url,
'width': int_or_none(format_data.get('width')),
'height': int_or_none(format_data.get('height')),
})
format_data = self._call_api('h/v1/play', video_id)['item']
format_url = format_data['playbackUrl']
ext = determine_ext(format_url)
if ext == 'm3u8':
try:
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls'))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
self.raise_geo_restricted(countries=['IN'])
raise
elif ext == 'f4m':
# produce broken files
pass
else:
formats.append({
'url': format_url,
'width': int_or_none(format_data.get('width')),
'height': int_or_none(format_data.get('height')),
})
self._sort_formats(formats)
return {
@@ -106,57 +121,43 @@ class HotStarIE(HotStarBaseIE):
'title': title,
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')),
'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')),
'formats': formats,
'channel': video_data.get('channelName'),
'channel_id': video_data.get('channelId'),
'series': video_data.get('showName'),
'season': video_data.get('seasonName'),
'season_number': int_or_none(video_data.get('seasonNo')),
'season_id': video_data.get('seasonId'),
'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'),
'episode_number': int_or_none(video_data.get('episodeNo')),
}
class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com/tv/[^/]+/(?P<content_id>\d+))/(?P<type>[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/tv/[^/]+/s-\w+/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{
'url': 'http://www.hotstar.com/tv/pratidaan/14982/episodes/14812/9993',
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': {
'id': '14812',
'id': '3_2_26',
},
'playlist_mincount': 75,
'playlist_mincount': 20,
}, {
'url': 'http://www.hotstar.com/tv/pratidaan/14982/popular-clips/9998/9998',
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True,
}]
_ITEM_TYPES = {
'episodes': 'EPISODE',
'popular-clips': 'CLIPS',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
base_url = mobj.group('url')
content_id = mobj.group('content_id')
playlist_type = mobj.group('type')
playlist_id = self._match_id(url)
content_info = self._download_content_info(content_id)
playlist_id = compat_str(content_info['categoryId'])
collection = self._download_json(
'https://search.hotstar.com/AVS/besc', playlist_id, query={
'action': 'SearchContents',
'appVersion': '5.0.40',
'channel': 'PCTV',
'moreFilters': 'series:%s;' % playlist_id,
'query': '*',
'searchOrder': 'last_broadcast_date desc,year desc,title asc',
'type': self._ITEM_TYPES.get(playlist_type, 'EPISODE'),
})
collection = self._call_api('o/v1/tray/find', playlist_id, 'uqId')
entries = [
self.url_result(
'%s/_/%s' % (base_url, video['contentId']),
'https://www.hotstar.com/%s' % video['contentId'],
ie=HotStarIE.ie_key(), video_id=video['contentId'])
for video in collection['response']['docs']
for video in collection['assets']['items']
if video.get('contentId')]
return self.playlist_result(entries, playlist_id)

View File

@@ -3,12 +3,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
mimetype2ext,
parse_duration,
qualities,
url_or_none,
)
@@ -61,10 +61,11 @@ class ImdbIE(InfoExtractor):
for encoding in video_metadata.get('encodings', []):
if not encoding or not isinstance(encoding, dict):
continue
video_url = encoding.get('videoUrl')
if not video_url or not isinstance(video_url, compat_str):
video_url = url_or_none(encoding.get('videoUrl'))
if not video_url:
continue
ext = determine_ext(video_url, mimetype2ext(encoding.get('mimeType')))
ext = mimetype2ext(encoding.get(
'mimeType')) or determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z0-9]+)?$'
_TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv',
@@ -43,6 +43,9 @@ class ImgurIE(InfoExtractor):
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
}, {
'url': 'https://i.imgur.com/crGpqCV.mp4',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -17,6 +17,7 @@ from ..utils import (
lowercase_escape,
std_headers,
try_get,
url_or_none,
)
@@ -170,7 +171,7 @@ class InstagramIE(InfoExtractor):
node = try_get(edge, lambda x: x['node'], dict)
if not node:
continue
node_video_url = try_get(node, lambda x: x['video_url'], compat_str)
node_video_url = url_or_none(node.get('video_url'))
if not node_video_url:
continue
entries.append({

View File

@@ -7,7 +7,7 @@ from ..utils import unified_timestamp
class InternazionaleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?internazionale\.it/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
_TESTS = [{
'url': 'https://www.internazionale.it/video/2015/02/19/richard-linklater-racconta-una-scena-di-boyhood',
'md5': '3e39d32b66882c1218e305acbf8348ca',
'info_dict': {
@@ -23,7 +23,23 @@ class InternazionaleIE(InfoExtractor):
'params': {
'format': 'bestvideo',
},
}
}, {
'url': 'https://www.internazionale.it/video/2018/08/29/telefono-stare-con-noi-stessi',
'md5': '9db8663704cab73eb972d1cee0082c79',
'info_dict': {
'id': '761344',
'display_id': 'telefono-stare-con-noi-stessi',
'ext': 'mp4',
'title': 'Usiamo il telefono per evitare di stare con noi stessi',
'description': 'md5:75ccfb0d6bcefc6e7428c68b4aa1fe44',
'timestamp': 1535528954,
'upload_date': '20180829',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@@ -40,8 +56,13 @@ class InternazionaleIE(InfoExtractor):
DATA_RE % 'job-id', webpage, 'video id', group='value')
video_path = self._search_regex(
DATA_RE % 'video-path', webpage, 'video path', group='value')
video_available_abroad = self._search_regex(
DATA_RE % 'video-available_abroad', webpage,
'video available aboard', default='1', group='value')
video_available_abroad = video_available_abroad == '1'
video_base = 'https://video.internazionale.it/%s/%s.' % (video_path, video_id)
video_base = 'https://video%s.internazionale.it/%s/%s.' % \
('' if video_available_abroad else '-ita', video_path, video_id)
formats = self._extract_m3u8_formats(
video_base + 'm3u8', display_id, 'mp4',

View File

@@ -12,7 +12,7 @@ from ..utils import (
class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_VALID_URL = r'https?://(?:play|prima)\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_GEO_BYPASS = False
_TESTS = [{
@@ -33,14 +33,27 @@ class IPrimaIE(InfoExtractor):
# geo restricted
'url': 'http://play.iprima.cz/closer-nove-pripady/closer-nove-pripady-iv-1',
'only_matching': True,
}, {
# iframe api.play-backend.iprima.cz
'url': 'https://prima.iprima.cz/my-little-pony/mapa-znameni-2-2',
'only_matching': True,
}, {
# iframe prima.iprima.cz
'url': 'https://prima.iprima.cz/porady/jak-se-stavi-sen/rodina-rathousova-praha',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
self._set_cookie('play.iprima.cz', 'ott_adult_confirmed', '1')
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
video_id = self._search_regex(
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">'),
webpage, 'real id')
playerpage = self._download_webpage(
'http://play.iprima.cz/prehravac/init',

View File

@@ -20,6 +20,7 @@ from ..utils import (
merge_dicts,
parse_duration,
smuggle_url,
url_or_none,
xpath_with_ns,
xpath_element,
xpath_text,
@@ -250,8 +251,8 @@ class ITVIE(InfoExtractor):
for sub in subs:
if not isinstance(sub, dict):
continue
href = sub.get('Href')
if isinstance(href, compat_str):
href = url_or_none(sub.get('Href'))
if href:
extract_subtitle(href)
if not info.get('duration'):
info['duration'] = parse_duration(video_data.get('Duration'))

View File

@@ -15,7 +15,7 @@ from ..utils import (
class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
@@ -65,7 +65,11 @@ class IviIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
}
},
{
'url': 'https://www.ivi.tv/watch/33560/',
'only_matching': True,
},
]
# Sorted by quality

View File

@@ -7,6 +7,7 @@ from ..utils import (
int_or_none,
mimetype2ext,
remove_end,
url_or_none,
)
@@ -73,11 +74,14 @@ class IwaraIE(InfoExtractor):
formats = []
for a_format in video_data:
format_uri = url_or_none(a_format.get('uri'))
if not format_uri:
continue
format_id = a_format.get('resolution')
height = int_or_none(self._search_regex(
r'(\d+)p', format_id, 'height', default=None))
formats.append({
'url': a_format['uri'],
'url': self._proto_relative_url(format_uri, 'https:'),
'format_id': format_id,
'ext': mimetype2ext(a_format.get('mime')) or 'mp4',
'height': height,

View File

@@ -26,8 +26,15 @@ class JamendoBaseIE(InfoExtractor):
class JamendoIE(JamendoBaseIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'''(?x)
https?://
(?:
licensing\.jamendo\.com/[^/]+|
(?:www\.)?jamendo\.com
)
/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
'''
_TESTS = [{
'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
'md5': '6e9e82ed6db98678f171c25a8ed09ffd',
'info_dict': {
@@ -40,14 +47,19 @@ class JamendoIE(JamendoBaseIE):
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg'
}
}
}, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
track_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(
'https://www.jamendo.com/track/%s/%s' % (track_id, display_id),
display_id)
title, artist, track = self._extract_meta(webpage)

View File

@@ -61,7 +61,7 @@ class JojIE(InfoExtractor):
bitrates = self._parse_json(
self._search_regex(
r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates',
r'(?s)(?:src|bitrates)\s*=\s*({.+?});', webpage, 'bitrates',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)

View File

@@ -192,6 +192,8 @@ class KalturaIE(InfoExtractor):
'entryId': video_id,
'service': 'baseentry',
'ks': '{1:result:ks}',
'responseProfile:fields': 'createdAt,dataUrl,duration,name,plays,thumbnailUrl,userId',
'responseProfile:type': 1,
},
{
'action': 'getbyentryid',

View File

@@ -4,16 +4,14 @@ import re
from .common import InfoExtractor
from ..aes import aes_decrypt_text
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..compat import compat_urllib_parse_unquote
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
str_to_int,
strip_or_none,
url_or_none,
)
@@ -55,7 +53,8 @@ class KeezMoviesIE(InfoExtractor):
encrypted = False
def extract_format(format_url, height=None):
if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//')):
format_url = url_or_none(format_url)
if not format_url or not format_url.startswith(('http', '//')):
return
if format_url in format_urls:
return

View File

@@ -0,0 +1,70 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
dict_get,
int_or_none,
)
class KinoPoiskIE(InfoExtractor):
_GEO_COUNTRIES = ['RU']
_VALID_URL = r'https?://(?:www\.)?kinopoisk\.ru/film/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.kinopoisk.ru/film/81041/watch/',
'md5': '4f71c80baea10dfa54a837a46111d326',
'info_dict': {
'id': '81041',
'ext': 'mp4',
'title': 'Алеша попович и тугарин змей',
'description': 'md5:43787e673d68b805d0aa1df5a5aea701',
'thumbnail': r're:^https?://.*',
'duration': 4533,
'age_limit': 12,
},
'params': {
'format': 'bestvideo',
},
}, {
'url': 'https://www.kinopoisk.ru/film/81041',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://ott-widget.kinopoisk.ru/v1/kp/', video_id,
query={'kpId': video_id})
data = self._parse_json(
self._search_regex(
r'(?s)<script[^>]+\btype=["\']application/json[^>]+>(.+?)<',
webpage, 'data'),
video_id)['models']
film = data['filmStatus']
title = film.get('title') or film['originalTitle']
formats = self._extract_m3u8_formats(
data['playlistEntity']['uri'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
description = dict_get(
film, ('descriptscription', 'description',
'shortDescriptscription', 'shortDescription'))
thumbnail = film.get('coverUrl') or film.get('posterUrl')
duration = int_or_none(film.get('duration'))
age_limit = int_or_none(film.get('restrictionAge'))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'age_limit': age_limit,
'formats': formats,
}

View File

@@ -2,11 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
url_or_none,
)
@@ -109,7 +109,8 @@ class KonserthusetPlayIE(InfoExtractor):
captions = source.get('captionsAvailableLanguages')
if isinstance(captions, dict):
for lang, subtitle_url in captions.items():
if lang != 'none' and isinstance(subtitle_url, compat_str):
subtitle_url = url_or_none(subtitle_url)
if lang != 'none' and subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
@@ -32,7 +33,8 @@ class Laola1TvEmbedIE(InfoExtractor):
def _extract_token_url(self, stream_access_url, video_id, data):
return self._download_json(
stream_access_url, video_id, headers={
self._proto_relative_url(stream_access_url, 'https:'), video_id,
headers={
'Content-Type': 'application/json',
}, data=json.dumps(data).encode())['data']['stream-access'][0]
@@ -119,9 +121,59 @@ class Laola1TvEmbedIE(InfoExtractor):
}
class Laola1TvIE(Laola1TvEmbedIE):
class Laola1TvBaseIE(Laola1TvEmbedIE):
def _extract_video(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
conf = self._parse_json(self._search_regex(
r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
display_id,
transform_source=lambda s: js_to_json(re.sub(r'shareurl:.+,', '', s)))
video_id = conf['videoid']
config = self._download_json(conf['configUrl'], video_id, query={
'videoid': video_id,
'partnerid': conf['partnerid'],
'language': conf.get('language', ''),
'portal': conf.get('portalid', ''),
})
error = config.get('error')
if error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
video_data = config['video']
title = video_data['title']
is_live = video_data.get('isLivestream') and video_data.get('isLive')
meta = video_data.get('metaInformation')
sports = meta.get('sports')
categories = sports.split(',') if sports else []
token_url = self._extract_token_url(
video_data['streamAccess'], video_id,
video_data['abo']['required'])
formats = self._extract_formats(token_url, video_id)
return {
'id': video_id,
'display_id': display_id,
'title': self._live_title(title) if is_live else title,
'description': video_data.get('description'),
'thumbnail': video_data.get('image'),
'categories': categories,
'formats': formats,
'is_live': is_live,
}
class Laola1TvIE(Laola1TvBaseIE):
IE_NAME = 'laola1tv'
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
@@ -169,52 +221,30 @@ class Laola1TvIE(Laola1TvEmbedIE):
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._extract_video(url)
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
class EHFTVIE(Laola1TvBaseIE):
IE_NAME = 'ehftv'
_VALID_URL = r'https?://(?:www\.)?ehftv\.com/[a-z]+(?:-[a-z]+)?/[^/]+/(?P<id>[^/?#&]+)'
conf = self._parse_json(self._search_regex(
r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
display_id, js_to_json)
_TESTS = [{
'url': 'https://www.ehftv.com/int/video/paris-saint-germain-handball-pge-vive-kielce/1166761',
'info_dict': {
'id': '1166761',
'display_id': 'paris-saint-germain-handball-pge-vive-kielce',
'ext': 'mp4',
'title': 'Paris Saint-Germain Handball - PGE Vive Kielce',
'is_live': False,
'categories': ['Handball'],
},
'params': {
'skip_download': True,
},
}]
video_id = conf['videoid']
config = self._download_json(conf['configUrl'], video_id, query={
'videoid': video_id,
'partnerid': conf['partnerid'],
'language': conf.get('language', ''),
'portal': conf.get('portalid', ''),
})
error = config.get('error')
if error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
video_data = config['video']
title = video_data['title']
is_live = video_data.get('isLivestream') and video_data.get('isLive')
meta = video_data.get('metaInformation')
sports = meta.get('sports')
categories = sports.split(',') if sports else []
token_url = self._extract_token_url(
video_data['streamAccess'], video_id,
video_data['abo']['required'])
formats = self._extract_formats(token_url, video_id)
return {
'id': video_id,
'display_id': display_id,
'title': self._live_title(title) if is_live else title,
'description': video_data.get('description'),
'thumbnail': video_data.get('image'),
'categories': categories,
'formats': formats,
'is_live': is_live,
}
def _real_extract(self, url):
return self._extract_video(url)
class ITTFIE(InfoExtractor):

View File

@@ -20,5 +20,7 @@ class LCIIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
wat_id = self._search_regex(r'data-watid=[\'"](\d+)', webpage, 'wat id')
wat_id = self._search_regex(
(r'data-watid=[\'"](\d+)', r'idwat["\']?\s*:\s*["\']?(\d+)'),
webpage, 'wat id')
return self.url_result('wat:' + wat_id, 'Wat', wat_id)

View File

@@ -16,16 +16,15 @@ from ..utils import (
class LibraryOfCongressIE(InfoExtractor):
IE_NAME = 'loc'
IE_DESC = 'Library of Congress'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9a-z_.]+)'
_TESTS = [{
# embedded via <div class="media-player"
'url': 'http://loc.gov/item/90716351/',
'md5': '353917ff7f0255aa6d4b80a034833de8',
'md5': '6ec0ae8f07f86731b1b2ff70f046210a',
'info_dict': {
'id': '90716351',
'ext': 'mp4',
'title': "Pa's trip to Mars",
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 0,
'view_count': int,
},
@@ -57,6 +56,12 @@ class LibraryOfCongressIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.loc.gov/item/ihas.200197114/',
'only_matching': True,
}, {
'url': 'https://www.loc.gov/item/afc1981005_afs20503/',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -67,12 +72,13 @@ class LibraryOfCongressIE(InfoExtractor):
(r'id=(["\'])media-player-(?P<id>.+?)\1',
r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1',
r'data-tab="share-media-(?P<id>[0-9A-F]{32})"'),
webpage, 'media id', group='id')
data = self._download_json(
'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
video_id)['mediaObject']
media_id)['mediaObject']
derivative = data['derivatives'][0]
media_url = derivative['derivativeUrl']
@@ -89,25 +95,29 @@ class LibraryOfCongressIE(InfoExtractor):
if ext not in ('mp4', 'mp3'):
media_url += '.mp4' if is_video else '.mp3'
if 'vod/mp4:' in media_url:
formats = [{
'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
formats = []
if '/vod/mp4:' in media_url:
formats.append({
'url': media_url.replace('/vod/mp4:', '/hls-vod/media/') + '.m3u8',
'format_id': 'hls',
'ext': 'mp4',
'protocol': 'm3u8_native',
'quality': 1,
}]
elif 'vod/mp3:' in media_url:
formats = [{
'url': media_url.replace('vod/mp3:', ''),
'vcodec': 'none',
}]
})
http_format = {
'url': re.sub(r'(://[^/]+/)(?:[^/]+/)*(?:mp4|mp3):', r'\1', media_url),
'format_id': 'http',
'quality': 1,
}
if not is_video:
http_format['vcodec'] = 'none'
formats.append(http_format)
download_urls = set()
for m in re.finditer(
r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
format_id = m.group('id').lower()
if format_id == 'gif':
if format_id in ('gif', 'jpeg'):
continue
download_url = m.group('url')
if download_url in download_urls:

View File

@@ -0,0 +1,175 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
urlencode_postdata,
)
class LinkedInLearningBaseIE(InfoExtractor):
_NETRC_MACHINE = 'linkedin'
def _call_api(self, course_slug, fields, video_slug=None, resolution=None):
query = {
'courseSlug': course_slug,
'fields': fields,
'q': 'slugs',
}
sub = ''
if video_slug:
query.update({
'videoSlug': video_slug,
'resolution': '_%s' % resolution,
})
sub = ' %dp' % resolution
api_url = 'https://www.linkedin.com/learning-api/detailedCourses'
return self._download_json(
api_url, video_slug, 'Downloading%s JSON metadata' % sub, headers={
'Csrf-Token': self._get_cookies(api_url)['JSESSIONID'].value,
}, query=query)['elements'][0]
def _get_video_id(self, urn, course_slug, video_slug):
if urn:
mobj = re.search(r'urn:li:lyndaCourse:\d+,(\d+)', urn)
if mobj:
return mobj.group(1)
return '%s/%s' % (course_slug, video_slug)
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
login_page = self._download_webpage(
'https://www.linkedin.com/uas/login?trk=learning',
None, 'Downloading login page')
action_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page, 'post url',
default='https://www.linkedin.com/uas/login-submit', group='url')
data = self._hidden_inputs(login_page)
data.update({
'session_key': email,
'session_password': password,
})
login_submit_page = self._download_webpage(
action_url, None, 'Logging in',
data=urlencode_postdata(data))
error = self._search_regex(
r'<span[^>]+class="error"[^>]*>\s*(.+?)\s*</span>',
login_submit_page, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
class LinkedInLearningIE(LinkedInLearningBaseIE):
IE_NAME = 'linkedin:learning'
_VALID_URL = r'https?://(?:www\.)?linkedin\.com/learning/(?P<course_slug>[^/]+)/(?P<id>[^/?#]+)'
_TEST = {
'url': 'https://www.linkedin.com/learning/programming-foundations-fundamentals/welcome?autoplay=true',
'md5': 'a1d74422ff0d5e66a792deb996693167',
'info_dict': {
'id': '90426',
'ext': 'mp4',
'title': 'Welcome',
'timestamp': 1430396150.82,
'upload_date': '20150430',
},
}
def _real_extract(self, url):
course_slug, video_slug = re.match(self._VALID_URL, url).groups()
video_data = None
formats = []
for width, height in ((640, 360), (960, 540), (1280, 720)):
video_data = self._call_api(
course_slug, 'selectedVideo', video_slug, height)['selectedVideo']
video_url_data = video_data.get('url') or {}
progressive_url = video_url_data.get('progressiveUrl')
if progressive_url:
formats.append({
'format_id': 'progressive-%dp' % height,
'url': progressive_url,
'height': height,
'width': width,
'source_preference': 1,
})
title = video_data['title']
audio_url = video_data.get('audio', {}).get('progressiveUrl')
if audio_url:
formats.append({
'abr': 64,
'ext': 'm4a',
'format_id': 'audio',
'url': audio_url,
'vcodec': 'none',
})
streaming_url = video_url_data.get('streamingUrl')
if streaming_url:
formats.extend(self._extract_m3u8_formats(
streaming_url, video_slug, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('width', 'height', 'source_preference', 'tbr', 'abr'))
return {
'id': self._get_video_id(video_data.get('urn'), course_slug, video_slug),
'title': title,
'formats': formats,
'thumbnail': video_data.get('defaultThumbnail'),
'timestamp': float_or_none(video_data.get('publishedOn'), 1000),
'duration': int_or_none(video_data.get('durationInSeconds')),
}
class LinkedInLearningCourseIE(LinkedInLearningBaseIE):
IE_NAME = 'linkedin:learning:course'
_VALID_URL = r'https?://(?:www\.)?linkedin\.com/learning/(?P<id>[^/?#]+)'
_TEST = {
'url': 'https://www.linkedin.com/learning/programming-foundations-fundamentals',
'info_dict': {
'id': 'programming-foundations-fundamentals',
'title': 'Programming Foundations: Fundamentals',
'description': 'md5:76e580b017694eb89dc8e8923fff5c86',
},
'playlist_mincount': 61,
}
@classmethod
def suitable(cls, url):
return False if LinkedInLearningIE.suitable(url) else super(LinkedInLearningCourseIE, cls).suitable(url)
def _real_extract(self, url):
course_slug = self._match_id(url)
course_data = self._call_api(course_slug, 'chapters,description,title')
entries = []
for chapter in course_data.get('chapters', []):
chapter_title = chapter.get('title')
for video in chapter.get('videos', []):
video_slug = video.get('slug')
if not video_slug:
continue
entries.append({
'_type': 'url_transparent',
'id': self._get_video_id(video.get('urn'), course_slug, video_slug),
'title': video.get('title'),
'url': 'https://www.linkedin.com/learning/%s/%s' % (course_slug, video_slug),
'chapter': chapter_title,
'ie_key': LinkedInLearningIE.ie_key(),
})
return self.playlist_result(
entries, course_slug,
course_data.get('title'),
course_data.get('description'))

View File

@@ -15,7 +15,7 @@ from ..utils import (
class LyndaBaseIE(InfoExtractor):
_SIGNIN_URL = 'https://www.lynda.com/signin'
_SIGNIN_URL = 'https://www.lynda.com/signin/lynda'
_PASSWORD_URL = 'https://www.lynda.com/signin/password'
_USER_URL = 'https://www.lynda.com/signin/user'
_ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'

View File

@@ -3,142 +3,161 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from .theplatform import ThePlatformBaseIE
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import (
determine_ext,
parse_duration,
try_get,
unified_strdate,
ExtractorError,
int_or_none,
update_url_query,
)
class MediasetIE(InfoExtractor):
class MediasetIE(ThePlatformBaseIE):
_TP_TLD = 'eu'
_VALID_URL = r'''(?x)
(?:
mediaset:|
https?://
(?:www\.)?video\.mediaset\.it/
(?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
(?:
(?:video|on-demand)/(?:[^/]+/)+[^/]+_|
player/playerIFrame(?:Twitter)?\.shtml\?.*?\bid=
player/index\.html\?.*?\bprogramGuid=
)
)(?P<id>[0-9]+)
)(?P<id>[0-9A-Z]{16})
'''
_TESTS = [{
# full episode
'url': 'http://www.video.mediaset.it/video/hello_goodbye/full/quarta-puntata_661824.html',
'url': 'https://www.mediasetplay.mediaset.it/video/hellogoodbye/quarta-puntata_FAFU000000661824',
'md5': '9b75534d42c44ecef7bf1ffeacb7f85d',
'info_dict': {
'id': '661824',
'id': 'FAFU000000661824',
'ext': 'mp4',
'title': 'Quarta puntata',
'description': 'md5:7183696d6df570e3412a5ef74b27c5e2',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1414,
'creator': 'mediaset',
'duration': 1414.26,
'upload_date': '20161107',
'series': 'Hello Goodbye',
'categories': ['reality'],
'timestamp': 1478532900,
'uploader': 'Rete 4',
'uploader_id': 'R4',
},
'expected_warnings': ['is not a supported codec'],
}, {
'url': 'http://www.video.mediaset.it/video/matrix/full_chiambretti/puntata-del-25-maggio_846685.html',
'md5': '1276f966ac423d16ba255ce867de073e',
'url': 'https://www.mediasetplay.mediaset.it/video/matrix/puntata-del-25-maggio_F309013801000501',
'md5': '288532f0ad18307705b01e581304cd7b',
'info_dict': {
'id': '846685',
'id': 'F309013801000501',
'ext': 'mp4',
'title': 'Puntata del 25 maggio',
'description': 'md5:ee2e456e3eb1dba5e814596655bb5296',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 6565,
'creator': 'mediaset',
'upload_date': '20180525',
'duration': 6565.007,
'upload_date': '20180526',
'series': 'Matrix',
'categories': ['infotainment'],
'timestamp': 1527326245,
'uploader': 'Canale 5',
'uploader_id': 'C5',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
# clip
'url': 'http://www.video.mediaset.it/video/gogglebox/clip/un-grande-classico-della-commedia-sexy_661680.html',
'url': 'https://www.mediasetplay.mediaset.it/video/gogglebox/un-grande-classico-della-commedia-sexy_FAFU000000661680',
'only_matching': True,
}, {
# iframe simple
'url': 'http://www.video.mediaset.it/player/playerIFrame.shtml?id=665924&autoplay=true',
'url': 'https://static3.mediasetplay.mediaset.it/player/index.html?appKey=5ad3966b1de1c4000d5cec48&programGuid=FAFU000000665924&id=665924',
'only_matching': True,
}, {
# iframe twitter (from http://www.wittytv.it/se-prima-mi-fidavo-zero/)
'url': 'https://www.video.mediaset.it/player/playerIFrameTwitter.shtml?id=665104&playrelated=false&autoplay=false&related=true&hidesocial=true',
'url': 'https://static3.mediasetplay.mediaset.it/player/index.html?appKey=5ad3966b1de1c4000d5cec48&programGuid=FAFU000000665104&id=665104',
'only_matching': True,
}, {
'url': 'mediaset:661824',
'url': 'mediaset:FAFU000000665924',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>https?://(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml\?.*?\bid=\d+.*?)\1',
webpage)]
def _extract_urls(ie, webpage):
def _qs(url):
return compat_parse_qs(compat_urllib_parse_urlparse(url).query)
def _program_guid(qs):
return qs.get('programGuid', [None])[0]
entries = []
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml.*?)\1',
webpage):
embed_url = mobj.group('url')
embed_qs = _qs(embed_url)
program_guid = _program_guid(embed_qs)
if program_guid:
entries.append(embed_url)
continue
video_id = embed_qs.get('id', [None])[0]
if not video_id:
continue
urlh = ie._request_webpage(
embed_url, video_id, note='Following embed URL redirect')
embed_url = compat_str(urlh.geturl())
program_guid = _program_guid(_qs(embed_url))
if program_guid:
entries.append(embed_url)
return entries
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'https://www.video.mediaset.it/html/metainfo.sjson',
video_id, 'Downloading media info', query={
'id': video_id
})['video']
title = video['title']
media_id = video.get('guid') or video_id
video_list = self._download_json(
'http://cdnsel01.mediaset.net/GetCdn2018.aspx',
video_id, 'Downloading video CDN JSON', query={
'streamid': media_id,
'format': 'json',
})['videoList']
guid = self._match_id(url)
tp_path = 'PR1GhC/media/guid/2702976343/' + guid
info = self._extract_theplatform_metadata(tp_path, guid)
formats = []
for format_url in video_list:
ext = determine_ext(format_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
elif ext == 'ism' or '.ism' in format_url:
formats.extend(self._extract_ism_formats(
format_url, video_id, ism_id='mss', fatal=False))
else:
formats.append({
'url': format_url,
'format_id': determine_ext(format_url),
})
subtitles = {}
first_e = None
for asset_type in ('SD', 'HD'):
for f in ('MPEG4', 'MPEG-DASH', 'M3U', 'ISM'):
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query('http://link.theplatform.%s/s/%s' % (self._TP_TLD, tp_path), {
'mbr': 'true',
'formats': f,
'assetTypes': asset_type,
}), guid, 'Downloading %s %s SMIL data' % (f, asset_type))
except ExtractorError as e:
if not first_e:
first_e = e
break
for tp_f in tp_formats:
tp_f['quality'] = 1 if asset_type == 'HD' else 0
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
if first_e and not formats:
raise first_e
self._sort_formats(formats)
creator = try_get(
video, lambda x: x['brand-info']['publisher'], compat_str)
category = try_get(
video, lambda x: x['brand-info']['category'], compat_str)
categories = [category] if category else None
fields = []
for templ, repls in (('tvSeason%sNumber', ('', 'Episode')), ('mediasetprogram$%s', ('brandTitle', 'numberOfViews', 'publishInfo'))):
fields.extend(templ % repl for repl in repls)
feed_data = self._download_json(
'https://feed.entertainment.tv.theplatform.eu/f/PR1GhC/mediaset-prod-all-programs/guid/-/' + guid,
guid, fatal=False, query={'fields': ','.join(fields)})
if feed_data:
publish_info = feed_data.get('mediasetprogram$publishInfo') or {}
info.update({
'episode_number': int_or_none(feed_data.get('tvSeasonEpisodeNumber')),
'season_number': int_or_none(feed_data.get('tvSeasonNumber')),
'series': feed_data.get('mediasetprogram$brandTitle'),
'uploader': publish_info.get('description'),
'uploader_id': publish_info.get('channel'),
'view_count': int_or_none(feed_data.get('mediasetprogram$numberOfViews')),
})
return {
'id': video_id,
'title': title,
'description': video.get('short-description'),
'thumbnail': video.get('thumbnail'),
'duration': parse_duration(video.get('duration')),
'creator': creator,
'upload_date': unified_strdate(video.get('production-date')),
'webpage_url': video.get('url'),
'series': video.get('brand-value'),
'season': video.get('season'),
'categories': categories,
info.update({
'id': guid,
'formats': formats,
}
'subtitles': subtitles,
})
return info

View File

@@ -15,6 +15,7 @@ from ..utils import (
mimetype2ext,
unescapeHTML,
unsmuggle_url,
url_or_none,
urljoin,
)
@@ -156,8 +157,8 @@ class MediasiteIE(InfoExtractor):
stream_formats = []
for unum, VideoUrl in enumerate(video_urls):
video_url = VideoUrl.get('Location')
if not video_url or not isinstance(video_url, compat_str):
video_url = url_or_none(VideoUrl.get('Location'))
if not video_url:
continue
# XXX: if Stream.get('CanChangeScheme', False), switch scheme to HTTP/HTTPS

View File

@@ -1,84 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import uuid
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
int_or_none,
extract_attributes,
determine_ext,
smuggle_url,
parse_duration,
)
class MiTeleBaseIE(InfoExtractor):
def _get_player_info(self, url, webpage):
player_data = extract_attributes(self._search_regex(
r'(?s)(<ms-video-player.+?</ms-video-player>)',
webpage, 'ms video player'))
video_id = player_data['data-media-id']
if player_data.get('data-cms-id') == 'ooyala':
return self.url_result(
'ooyala:%s' % video_id, ie=OoyalaIE.ie_key(), video_id=video_id)
config_url = compat_urlparse.urljoin(url, player_data['data-config'])
config = self._download_json(
config_url, video_id, 'Downloading config JSON')
mmc_url = config['services']['mmc']
duration = None
formats = []
for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
mmc = self._download_json(
m_url, video_id, 'Downloading mmc JSON')
if not duration:
duration = int_or_none(mmc.get('duration'))
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
gcp = location.get('gcp')
ogn = location.get('ogn')
if None in (gat, gcp, ogn):
continue
token_data = {
'gcp': gcp,
'ogn': ogn,
'sta': 0,
}
media = self._download_json(
gat, video_id, data=json.dumps(token_data).encode('utf-8'),
headers={
'Content-Type': 'application/json;charset=utf-8',
'Referer': url,
})
stream = media.get('stream') or media.get('file')
if not stream:
continue
ext = determine_ext(stream)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
'duration': duration,
}
class MiTeleIE(InfoExtractor):
IE_DESC = 'mitele.es'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
@@ -86,7 +16,7 @@ class MiTeleIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
'info_dict': {
'id': '57b0dfb9c715da65618b4afa',
'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
'ext': 'mp4',
'title': 'Tor, la web invisible',
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
@@ -104,7 +34,7 @@ class MiTeleIE(InfoExtractor):
# no explicit title
'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
'info_dict': {
'id': '57b0de3dc915da14058b4876',
'id': 'oyNG1iNTE6TAPP-JmCjbwfwJqqMMX3Vq',
'ext': 'mp4',
'title': 'Cuarto Milenio Temporada 6 Programa 226',
'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
@@ -128,40 +58,21 @@ class MiTeleIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
gigya_url = self._search_regex(
r'<gigya-api>[^>]*</gigya-api>[^>]*<script\s+src="([^"]*)">[^>]*</script>',
webpage, 'gigya', default=None)
gigya_sc = self._download_webpage(
compat_urlparse.urljoin('http://www.mitele.es/', gigya_url),
video_id, 'Downloading gigya script')
# Get a appKey/uuid for getting the session key
appKey = self._search_regex(
r'constant\s*\(\s*["\']_appGridApplicationKey["\']\s*,\s*["\']([0-9a-f]+)',
gigya_sc, 'appKey')
session_json = self._download_json(
'https://appgrid-api.cloud.accedo.tv/session',
video_id, 'Downloading session keys', query={
'appKey': appKey,
'uuid': compat_str(uuid.uuid4()),
})
paths = self._download_json(
'https://appgrid-api.cloud.accedo.tv/metadata/general_configuration,%20web_configuration',
video_id, 'Downloading paths JSON',
query={'sessionKey': compat_str(session_json['sessionKey'])})
'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
video_id, 'Downloading paths JSON')
ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
source = self._download_json(
'http://%s%s%s/docs/%s' % (
ooyala_s['base_url'], ooyala_s['full_path'],
ooyala_s['provider_id'], video_id),
'%s://%s%s%s/docs/%s' % (
ooyala_s.get('protocol', 'https'), base_url, full_path,
ooyala_s.get('provider_id', '104951'), video_id),
video_id, 'Downloading data JSON', query={
'include_titles': 'Series,Season',
'product_name': 'test',
'product_name': ooyala_s.get('product_name', 'test'),
'format': 'full',
})['hits']['hits'][0]['_source']

View File

@@ -161,11 +161,17 @@ class MixcloudIE(InfoExtractor):
stream_info = info_json['streamInfo']
formats = []
def decrypt_url(f_url):
for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
decrypted_url = self._decrypt_xor_cipher(k, f_url)
if re.search(r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
return decrypted_url
for url_key in ('url', 'hlsUrl', 'dashUrl'):
format_url = stream_info.get(url_key)
if not format_url:
continue
decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url))
decrypted = decrypt_url(compat_b64decode(format_url))
if not decrypted:
continue
if url_key == 'hlsUrl':

View File

@@ -167,9 +167,9 @@ class MotherlessGroupIE(InfoExtractor):
if not entries:
entries = [
self.url_result(
compat_urlparse.urljoin(base, '/' + video_id),
ie=MotherlessIE.ie_key(), video_id=video_id)
for video_id in orderedSet(re.findall(
compat_urlparse.urljoin(base, '/' + entry_id),
ie=MotherlessIE.ie_key(), video_id=entry_id)
for entry_id in orderedSet(re.findall(
r'data-codename=["\']([A-Z0-9]+)', webpage))]
return entries

View File

@@ -7,11 +7,10 @@ import re
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote
from ..utils import (
find_xpath_attr,
smuggle_url,
try_get,
unescapeHTML,
update_url_query,
int_or_none,
)
@@ -75,11 +74,16 @@ class NBCIE(AdobePassIE):
'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310',
'only_matching': True,
},
{
# Percent escaped url
'url': 'https://www.nbc.com/up-all-night/video/day-after-valentine%27s-day/n2189',
'only_matching': True,
}
]
def _real_extract(self, url):
permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + permalink
permalink = 'http' + compat_urllib_parse_unquote(permalink)
response = self._download_json(
'https://api.nbc.com/v3/videos', video_id, query={
'filter[permalink]': permalink,
@@ -263,27 +267,14 @@ class CSNNEIE(InfoExtractor):
class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/
(?:video/.+?/(?P<id>\d+)|
([^/]+/)*(?:.*-)?(?P<mpx_id>[^/?]+))
'''
_VALID_URL = r'(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/([^/]+/)*(?:.*-)?(?P<id>[^/?]+)'
_TESTS = [
{
'url': 'http://www.nbcnews.com/video/nbc-news/52753292',
'md5': '47abaac93c6eaf9ad37ee6c4463a5179',
'info_dict': {
'id': '52753292',
'ext': 'flv',
'title': 'Crew emerges after four-month Mars food study',
'description': 'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
},
},
{
'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
'md5': 'af1adfa51312291a017720403826bb64',
'info_dict': {
'id': 'p_tweet_snow_140529',
'id': '269389891880',
'ext': 'mp4',
'title': 'How Twitter Reacted To The Snowden Interview',
'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
@@ -307,7 +298,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
'md5': '73135a2e0ef819107bbb55a5a9b2a802',
'info_dict': {
'id': 'nn_netcast_150204',
'id': '394064451844',
'ext': 'mp4',
'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
@@ -320,7 +311,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
'md5': 'a49e173825e5fcd15c13fc297fced39d',
'info_dict': {
'id': 'x_lon_vwhorn_150922',
'id': '529953347624',
'ext': 'mp4',
'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
'description': 'md5:c8be487b2d80ff0594c005add88d8351',
@@ -333,7 +324,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': {
'id': 'tdy_al_space_160420',
'id': '669831235788',
'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
@@ -346,7 +337,7 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
'info_dict': {
'id': 'n_hayes_Aimm_140801_272214',
'id': '314487875924',
'ext': 'mp4',
'title': 'The chaotic GOP immigration vote',
'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
@@ -368,60 +359,22 @@ class NBCNewsIE(ThePlatformIE):
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if video_id is not None:
all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = all_info.find('video')
return {
'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': info.find('caption').text,
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}
else:
# "feature" and "nightly-news" pages use theplatform.com
video_id = mobj.group('mpx_id')
video_id = self._match_id(url)
if not video_id.isdigit():
webpage = self._download_webpage(url, video_id)
filter_param = 'byId'
bootstrap_json = self._search_regex(
[r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"',
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);'],
webpage, 'bootstrap json', default=None)
if bootstrap_json:
bootstrap = self._parse_json(
bootstrap_json, video_id, transform_source=unescapeHTML)
data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.+});', webpage,
'bootstrap json'), video_id)
video_id = data['article']['content'][0]['primaryMedia']['video']['mpxMetadata']['id']
info = None
if 'results' in bootstrap:
info = bootstrap['results'][0]['video']
elif 'video' in bootstrap:
info = bootstrap['video']
elif 'msnbcVideoInfo' in bootstrap:
info = bootstrap['msnbcVideoInfo']['meta']
elif 'msnbcThePlatform' in bootstrap:
info = bootstrap['msnbcThePlatform']['videoPlayer']['video']
else:
info = bootstrap
if 'guid' in info:
video_id = info['guid']
filter_param = 'byGuid'
elif 'mpxId' in info:
video_id = info['mpxId']
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {filter_param: video_id}),
'ie_key': 'ThePlatformFeed',
}
return {
'_type': 'url_transparent',
'id': video_id,
# http://feed.theplatform.com/f/2E2eJC/nbcnews also works
'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {'byId': video_id}),
'ie_key': 'ThePlatformFeed',
}
class NBCOlympicsIE(InfoExtractor):

View File

@@ -252,7 +252,7 @@ class NiconicoIE(InfoExtractor):
},
'timing_constraint': 'unlimited'
}
}))
}).encode())
resolution = video_quality.get('resolution', {})

View File

@@ -31,6 +31,8 @@ class NJPWWorldIE(InfoExtractor):
'skip': 'Requires login',
}
_LOGIN_URL = 'https://front.njpwworld.com/auth/login'
def _real_initialize(self):
self._login()
@@ -40,13 +42,17 @@ class NJPWWorldIE(InfoExtractor):
if not username:
return True
# Setup session (will set necessary cookies)
self._request_webpage(
'https://njpwworld.com/', None, note='Setting up session')
webpage, urlh = self._download_webpage_handle(
'https://njpwworld.com/auth/login', None,
self._LOGIN_URL, None,
note='Logging in', errnote='Unable to login',
data=urlencode_postdata({'login_id': username, 'pw': password}),
headers={'Referer': 'https://njpwworld.com/auth'})
headers={'Referer': 'https://front.njpwworld.com/auth'})
# /auth/login will return 302 for successful logins
if urlh.geturl() == 'https://njpwworld.com/auth/login':
if urlh.geturl() == self._LOGIN_URL:
self.report_warning('unable to login')
return False

View File

@@ -6,28 +6,90 @@ import re
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
js_to_json,
qualities,
unified_strdate,
url_or_none,
)
class NovaEmbedIE(InfoExtractor):
_VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
'md5': 'b3834f6de5401baabf31ed57456463f7',
'info_dict': {
'id': '8o0n0r',
'ext': 'mp4',
'title': '2180. díl',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 2578,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
bitrates = self._parse_json(
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
formats = []
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list):
continue
for format_url in format_list:
format_url = url_or_none(format_url)
if not format_url:
continue
f = {
'url': format_url,
}
f_id = format_id
for quality in QUALITIES:
if '%s.mp4' % quality in format_url:
f_id += '-%s' % quality
f.update({
'quality': quality_key(quality),
'format_note': quality.upper(),
})
break
f['format_id'] = f_id
formats.append(f)
self._sort_formats(formats)
title = self._og_search_title(
webpage, default=None) or self._search_regex(
(r'<value>(?P<title>[^<]+)',
r'videoTitle\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1'), webpage,
'title', group='value')
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._search_regex(
r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='value')
duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
class NovaIE(InfoExtractor):
IE_DESC = 'TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz'
_VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
_TESTS = [{
'url': 'http://tvnoviny.nova.cz/clanek/novinky/co-na-sebe-sportaci-praskli-vime-jestli-pujde-hrdlicka-na-materskou.html?utm_source=tvnoviny&utm_medium=cpfooter&utm_campaign=novaplus',
'info_dict': {
'id': '1608920',
'display_id': 'co-na-sebe-sportaci-praskli-vime-jestli-pujde-hrdlicka-na-materskou',
'ext': 'flv',
'title': 'Duel: Michal Hrdlička a Petr Suchoň',
'description': 'md5:d0cc509858eee1b1374111c588c6f5d5',
'thumbnail': r're:^https?://.*\.(?:jpg)',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
'md5': '1dd7b9d5ea27bc361f110cd855a19bd3',
'info_dict': {
@@ -38,33 +100,6 @@ class NovaIE(InfoExtractor):
'description': 'md5:f0a42dd239c26f61c28f19e62d20ef53',
'thumbnail': r're:^https?://.*\.(?:jpg)',
}
}, {
'url': 'http://novaplus.nova.cz/porad/policie-modrava/video/5591-policie-modrava-15-dil-blondynka-na-hrbitove',
'info_dict': {
'id': '1756825',
'display_id': '5591-policie-modrava-15-dil-blondynka-na-hrbitove',
'ext': 'flv',
'title': 'Policie Modrava - 15. díl - Blondýnka na hřbitově',
'description': 'md5:dc24e50be5908df83348e50d1431295e', # Make sure this description is clean of html tags
'thumbnail': r're:^https?://.*\.(?:jpg)',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://novaplus.nova.cz/porad/televizni-noviny/video/5585-televizni-noviny-30-5-2015/',
'info_dict': {
'id': '1756858',
'ext': 'flv',
'title': 'Televizní noviny - 30. 5. 2015',
'thumbnail': r're:^https?://.*\.(?:jpg)',
'upload_date': '20150530',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://fanda.nova.cz/clanek/fun-and-games/krvavy-epos-zaklinac-3-divoky-hon-vychazi-vyhrajte-ho-pro-sebe.html',
'info_dict': {
@@ -79,6 +114,20 @@ class NovaIE(InfoExtractor):
# rtmp download
'skip_download': True,
}
}, {
# media.cms.nova.cz embed
'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
'info_dict': {
'id': '8o0n0r',
'ext': 'mp4',
'title': '2180. díl',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 2578,
},
'params': {
'skip_download': True,
},
'add_ie': [NovaEmbedIE.ie_key()],
}, {
'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
'only_matching': True,
@@ -103,6 +152,15 @@ class NovaIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
# novaplus
embed_id = self._search_regex(
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
webpage, 'embed url', default=None)
if embed_id:
return self.url_result(
'https://media.cms.nova.cz/embed/%s' % embed_id,
ie=NovaEmbedIE.ie_key(), video_id=embed_id)
video_id = self._search_regex(
[r"(?:media|video_id)\s*:\s*'(\d+)'",
r'media=(\d+)',
@@ -111,8 +169,21 @@ class NovaIE(InfoExtractor):
webpage, 'video id')
config_url = self._search_regex(
r'src="(http://tn\.nova\.cz/bin/player/videojs/config\.php\?[^"]+)"',
r'src="(https?://(?:tn|api)\.nova\.cz/bin/player/videojs/config\.php\?[^"]+)"',
webpage, 'config url', default=None)
config_params = {}
if not config_url:
player = self._parse_json(
self._search_regex(
r'(?s)Player\s*\(.+?\s*,\s*({.+?\bmedia\b["\']?\s*:\s*["\']?\d+.+?})\s*\)', webpage,
'player', default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if player:
config_url = url_or_none(player.get('configUrl'))
params = player.get('configParams')
if isinstance(params, dict):
config_params = params
if not config_url:
DEFAULT_SITE_ID = '23000'
@@ -127,14 +198,20 @@ class NovaIE(InfoExtractor):
}
site_id = self._search_regex(
r'site=(\d+)', webpage, 'site id', default=None) or SITES.get(site, DEFAULT_SITE_ID)
r'site=(\d+)', webpage, 'site id', default=None) or SITES.get(
site, DEFAULT_SITE_ID)
config_url = ('http://tn.nova.cz/bin/player/videojs/config.php?site=%s&media=%s&jsVar=vjsconfig'
% (site_id, video_id))
config_url = 'https://api.nova.cz/bin/player/videojs/config.php'
config_params = {
'site': site_id,
'media': video_id,
'quality': 3,
'version': 1,
}
config = self._download_json(
config_url, display_id,
'Downloading config JSON',
'Downloading config JSON', query=config_params,
transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1])
mediafile = config['mediafile']

View File

@@ -11,20 +11,27 @@ from ..utils import (
class NZZIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153',
'info_dict': {
'id': '9153',
},
'playlist_mincount': 6,
}
}, {
'url': 'https://www.nzz.ch/video/nzz-standpunkte/cvp-auf-der-suche-nach-dem-mass-der-mitte-ld.1368112',
'info_dict': {
'id': '1368112',
},
'playlist_count': 1,
}]
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
entries = []
for player_element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
for player_element in re.findall(
r'(<[^>]+class="kalturaPlayer[^"]*"[^>]*>)', webpage):
player_params = extract_attributes(player_element)
if player_params.get('data-type') not in ('kaltura_singleArticle',):
self.report_warning('Unsupported player type')

View File

@@ -243,7 +243,18 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
(?:www\.)?
(?:
openload\.(?:co|io|link)|
oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun)
)
)/
(?:f|embed)/
(?P<id>[a-zA-Z0-9-_]+)
'''
_TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -307,10 +318,22 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://oload.download/f/kUEfGclsU9o',
'only_matching': True,
}, {
'url': 'https://oload.cloud/f/4ZDnBXRWiB8',
'only_matching': True,
}, {
# Its title has not got its extension but url has it
'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4',
'only_matching': True,
}, {
'url': 'https://oload.cc/embed/5NEAbI2BDSk',
'only_matching': True,
}, {
'url': 'https://oload.icu/f/-_i4y_F_Hs8',
'only_matching': True,
}, {
'url': 'https://oload.fun/f/gb6G1H4sHXY',
'only_matching': True,
}]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
@@ -322,8 +345,11 @@ class OpenloadIE(InfoExtractor):
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
url_pattern = 'https://openload.co/%%s/%s/' % video_id
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
url_pattern = 'https://%s/%%s/%s/' % (host, video_id)
headers = {
'User-Agent': self._USER_AGENT,
}
@@ -356,7 +382,7 @@ class OpenloadIE(InfoExtractor):
r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
'stream URL'))
video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id
video_url = 'https://%s/stream/%s?mime=true' % (host, decoded_id)
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
@@ -367,7 +393,7 @@ class OpenloadIE(InfoExtractor):
entry = entries[0] if entries else {}
subtitles = entry.get('subtitles')
info_dict = {
return {
'id': video_id,
'title': title,
'thumbnail': entry.get('thumbnail') or self._og_search_thumbnail(webpage, default=None),
@@ -376,4 +402,3 @@ class OpenloadIE(InfoExtractor):
'subtitles': subtitles,
'http_headers': headers,
}
return info_dict

Some files were not shown because too many files have changed in this diff Show More