Compare commits

284 Commits

Author SHA1 Message Date
Philipp Hagemeister
174aba3223 release 2016.05.01 2016-05-01 10:19:14 +02:00
Sergey M․
4bd143a3a0 [postprocessor/ffmpeg] Simplify metadata preparation and add track related metafields (Closes #9357) 2016-05-01 10:56:54 +06:00
Sergey M․
6f27bf1c74 Credit @blahgeek for xiami (#9079) 2016-05-01 08:08:51 +06:00
Sergey M․
68bb2fef95 [tagesschau] Restrict playlist entry regex 2016-05-01 07:15:23 +06:00
Sergey M․
854cc54bc1 [tagesschau] Expand video id 2016-05-01 07:01:55 +06:00
Sergey M․
651ad35ce0 [tagesschau] Relax _VALID_URL 2016-05-01 06:57:19 +06:00
Sergey M․
6a0f9a24d0 [tagesschau] Separate player extractor 2016-05-01 06:45:44 +06:00
remitamine
9cf79e8f4b [ccc] improve extraction 2016-05-01 01:45:17 +01:00
Sergey M․
2844b09336 [tagesschau] Fix article media ids 2016-05-01 04:42:05 +06:00
Sergey M․
1a2b377cc2 [tagesschau] Fix audio support 2016-05-01 04:38:46 +06:00
Sergey M․
4c1b2e5c0e [tagesschau] Add support for playlists 2016-05-01 04:18:56 +06:00
Sergey M․
9e1b96ae40 [rtlnl] Match formats only by height 2016-05-01 03:20:36 +06:00
Sergey M․
fc35cd9e0c [tagesschau] Relax _VALID_URL 2016-05-01 02:56:32 +06:00
Sergey M․
339fe7228a [tagesschau] Update _FORMATS map 2016-05-01 02:56:32 +06:00
remitamine
ea7e7fecbd [discovery] remove unused imports 2016-04-30 21:55:28 +01:00
remitamine
d00b93d58c [discovery] extract more info using BrightcoveNewIE 2016-04-30 21:49:32 +01:00
remitamine
93f7a31bf3 [discovery] extract subtitle 2016-04-30 20:51:32 +01:00
remitamine
33a1ec950c [discovery] extract http formats 2016-04-30 20:51:32 +01:00
Sergey M․
4e0c0c1508 [xiami] Improve extraction (Closes #9079)
* Switch to JSON source
* Add abstract IE for playlists
* Extract more track related metadata
2016-04-30 21:50:23 +06:00
BlahGeek
89c0dc9a5f [xiami] Add xiami extractor 2016-04-30 21:48:40 +06:00
remitamine
f628d800fb [ted] add support for youtube embeds and update tests 2016-04-30 16:34:57 +01:00
remitamine
11fa3d7f99 [ted] extract all http formats 2016-04-30 15:44:30 +01:00
Sergey M․
d41ee7b774 [vlive] Pass Referer as bytestring (Closes #9352) 2016-04-30 19:22:42 +06:00
remitamine
e0e9bbb0e9 [pbs] extract srt and vtt subtitles 2016-04-30 14:02:17 +01:00
remitamine
7691184a31 [pbs] remove duplicate format 2016-04-30 12:57:30 +01:00
remitamine
35cd2f4c25 [pbs] extract only the formats known to be available as http formats
https://projects.pbs.org/confluence/display/coveapi/COVE+Video+Specifications
2016-04-30 11:32:13 +01:00
remitamine
350d7963db [pbs] fix the least bitrate http url construction 2016-04-30 11:12:11 +01:00
remitamine
cbc032c8b7 [pbs] extract all http formats 2016-04-30 01:24:36 +01:00
remitamine
69c4cde4ba [wsj] improve extraction 2016-04-29 21:37:05 +01:00
Sergey M․
ca278a182b [rtlnl] Replace test 2016-04-30 02:07:29 +06:00
Sergey M․
373e1230e4 [rtlnl] Clarify tests 2016-04-30 01:50:26 +06:00
Sergey M․
cd63d091ce [rtlnl] Fix tests 2016-04-30 01:48:14 +06:00
Sergey M․
0571ffda7d [rtlnl] Improve extraction (Closes #9329)
* Make hls extraction non fatal and revert ext
* Extract progressive formats' metadata from corresponding hls formats
2016-04-30 01:43:39 +06:00
Reino17
5556047465 [rtlnl] Update 720p PG_URL_TEMPLATE
- Fixed the format_id for the 720p progressive videostream and added the video's resolution.
- The adaptive videostreams have the m3u8-extension, so I removed the confusing mp4-extension in order to make a better distinction between these and the progressive videostreams.
2016-04-30 01:43:13 +06:00
remitamine
65a3bfb379 [dfb] extract m3u8 formats 2016-04-29 19:21:17 +01:00
Yen Chi Hsuan
cef3f3011f [funimation] Detect blocking and support CloudFlare cookies 2016-04-30 00:17:09 +08:00
Yen Chi Hsuan
e9c6cdf4a1 [common] Fix format_id construction for HLS 2016-04-29 22:50:16 +08:00
Sergey M․
00a17a9e12 [crunchyroll] Sort formats 2016-04-29 19:44:10 +06:00
Sergey M․
8312b1a3d1 [crunchyroll] Add even more relaxed fmt fallback 2016-04-29 19:43:53 +06:00
Sergey M․
6ff4469528 [crunchyroll] Relax fmt regex 2016-04-29 19:39:27 +06:00
Yen Chi Hsuan
68835d687a Merge branch 'Kagami-vlive-hls' 2016-04-29 19:30:51 +08:00
Yen Chi Hsuan
9d186afac8 [vlive] Coding style and PEP8 2016-04-29 19:29:50 +08:00
Yen Chi Hsuan
151d98130b Merge branch 'vlive-hls' of https://github.com/Kagami/youtube-dl into Kagami-vlive-hls 2016-04-29 19:26:39 +08:00
Kagami Hiiragi
b24d6336a7 [vlive] Add support for live videos 2016-04-29 14:22:50 +03:00
remitamine
065216d94f [crunchyroll] reduce requests for formats extraction 2016-04-29 11:46:42 +01:00
remitamine
67167920db [viewlift] replace SnagFilms extractors
- add support for other sites that use the same logic
- improve format extraction and sorting
2016-04-29 11:24:10 +01:00
Yen Chi Hsuan
14638e2915 [sexykarma] Rename to WatchIndianPornIE and fix extraction 2016-04-29 18:17:08 +08:00
Yen Chi Hsuan
1910077ed7 Revert "[sexykarma] Remove the extractor"
This reverts commit 31ff3c074e.
2016-04-29 17:59:23 +08:00
Yen Chi Hsuan
5819edef03 [ooyala] Skip an invalid test
Ooyala is used by lots of extractors and its correctness can be verified
by these websites.
2016-04-29 14:27:15 +08:00
Yen Chi Hsuan
f5535ed0e3 [orf] Skip the expired test 2016-04-29 14:24:07 +08:00
Yen Chi Hsuan
31ff3c074e [sexykarma] Remove the extractor
Its domain name is on sale.

Closes #9317
2016-04-29 13:36:52 +08:00
Sergey M․
72670c39de [arte:+7] Fix typo in _VALID_URL 2016-04-29 04:46:23 +06:00
Sergey M․
683d892bf9 [viewster] Remove unused import 2016-04-29 01:30:53 +06:00
Sergey M․
497971cd4a [yandexmusic] Clarify blockage even more 2016-04-29 01:28:07 +06:00
remitamine
e757fb3d05 [crunchyroll] improve extraction
- extract more metadata(series, episode, episode_number)
- reduce duplicate requests for extracting formats
- remove duplicate formats
2016-04-28 18:42:20 +01:00
remitamine
0ba9e3ca22 [viewster] extract formats for videos with multiple audios/subtitles 2016-04-28 17:45:09 +01:00
Sergey M․
4b53762914 [yandexmusic] Clarify blockage 2016-04-28 21:45:33 +06:00
Sergey M․
eebe6b382e [yandexmusic] Improve error handling 2016-04-28 21:37:34 +06:00
Yen Chi Hsuan
0cbcbdd89d [nuvid] Fix extraction
Closes #7620
2016-04-28 17:51:20 +08:00
Yen Chi Hsuan
7f776fa4b5 [yandexmusic] Skip tests as Travis CI blocked 2016-04-28 17:08:41 +08:00
Yen Chi Hsuan
eb5ad31ce1 Merge branch 'pmrowla-mwave-meetgreet' 2016-04-28 16:03:43 +08:00
Yen Chi Hsuan
a5941305b6 [mwave] Coding style 2016-04-28 16:03:08 +08:00
Yen Chi Hsuan
f8dddaf456 Merge branch 'mwave-meetgreet' of https://github.com/pmrowla/youtube-dl into pmrowla-mwave-meetgreet 2016-04-28 15:56:32 +08:00
Yen Chi Hsuan
618c71dc64 [cloudy] New domain name for the test_cloudy_1
I'm not sure whether videoraj.ch still works or not, so keep it.
2016-04-28 15:46:00 +08:00
Sergey M․
52af8f222b [cwtv] Relax _VALID_URL (Closes #9327) 2016-04-28 04:01:21 +06:00
Yen Chi Hsuan
3cc8649c9d [20min] Detect embedded YouTube videos
Fixes #9331
2016-04-28 02:58:11 +08:00
Yen Chi Hsuan
dcf094d626 [theplatform] Fix for Python 3.2
test_AENetworks{,_1} fail because in Python < 3.3, binascii.a2b_* functions
accept only bytes-like objects
2016-04-27 18:35:33 +08:00
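The Python 3.2 limitation described in this commit can be illustrated with a small sketch. The helper name `hex_to_bytes` is hypothetical and is not taken from the extractor; it only shows one way to encode a text string to bytes before calling `binascii.a2b_hex`, so the call behaves the same on older and newer Python 3 versions.

```python
import binascii


def hex_to_bytes(hex_text):
    # Hypothetical helper: encode str to ASCII bytes first, because
    # binascii.a2b_* functions only accept bytes-like input on Python < 3.3.
    if not isinstance(hex_text, bytes):
        hex_text = hex_text.encode('ascii')
    return binascii.a2b_hex(hex_text)
```

Passing bytes through unchanged keeps the helper safe to call from code paths that already hold encoded data.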
Peter Rowlands
5b5d7cc11e [mwave] Add Mwave Meet & Greet extractor 2016-04-27 15:57:17 +09:00
Yen Chi Hsuan
2ac2cbc0a3 [malemotion] Remove the extractor
Announcement from their homepage:

```
MaleMotion is closed

After another system crash, I'm forced to close the site

This week all content will be erased

Don't forget to cancel your subscription if any !
```

Closes #9311.
2016-04-27 13:55:32 +08:00
Yen Chi Hsuan
a7e03861e8 [scivee] Skip the test
Not accessible from either Travis CI or my machine.

Closes #9315
2016-04-27 13:52:04 +08:00
Sergey M
046ea04a7d [README.md] Mention mpv 2016-04-27 00:22:08 +06:00
Sergey M
7464360379 [README.md] Add FAQ entry on output template conflicts 2016-04-27 00:16:48 +06:00
Sergey M․
175c2e9ec3 [youtube:search_url] Reimplement in terms of youtube:playlistbase 2016-04-26 22:29:29 +06:00
remitamine
f1f879098a [viewster] extract more metadata for http formats 2016-04-26 13:40:40 +01:00
Sergey M․
c9fd530670 [ok] Extract start time 2016-04-25 22:15:15 +06:00
Sergey M․
749b0046a8 [ok] Allow embeds without title (Closes #9303) 2016-04-25 22:05:47 +06:00
Yen Chi Hsuan
e3de3d6f2f [normalboots] Fix extraction
Now it's using ScreenwaveMedia
2016-04-25 23:49:12 +08:00
Yen Chi Hsuan
ad58942d57 [muzu] Remove extractor
MUZU is shutting down in October 2015. [1]

[1] http://www.musicbusinessworldwide.com/youtube-rival-muzu-is-heading-into-liquidation/
2016-04-25 23:35:05 +08:00
Yen Chi Hsuan
4645432d7a [eagleplatform] Checking direct HTTP links
Sometimes they fail with 404
2016-04-25 22:48:17 +08:00
Yen Chi Hsuan
6bdc2d5358 [mitele] Comment out unstable MD5
Also Akamai f4f fragments
2016-04-25 22:27:25 +08:00
Yen Chi Hsuan
2beff95da5 [nrk] Comment out unstable MD5 checksums
Both are Akamai f4f fragments.
2016-04-25 22:26:19 +08:00
Yen Chi Hsuan
abc1723edd [unistra] Sort formats
Originally URLs are passed to set() and not sorted, so the result is not
deterministic, causing occasional FAILs on Travis CI.
2016-04-25 22:24:40 +08:00
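The non-determinism mentioned in this commit comes from iterating over a `set`, which has no guaranteed order. A minimal sketch of the general remedy, assuming plain URL strings and a hypothetical helper name (the extractor itself may sort formats differently):

```python
def unique_sorted(urls):
    # set() removes duplicates but has no guaranteed iteration order;
    # sorted() turns the result into a reproducible list.
    return sorted(set(urls))
```

With a fixed order, a test that compares against the first entry no longer depends on hash ordering.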
Yen Chi Hsuan
b248e6485b Merge branch 'remitamine-akamai_pv' 2016-04-25 21:02:30 +08:00
Yen Chi Hsuan
d6712378e7 Merge branch 'akamai_pv' of https://github.com/remitamine/youtube-dl into remitamine-akamai_pv 2016-04-25 21:02:02 +08:00
remitamine
fb72ec58ae [extractor/common] do not process f4m manifest that contain akamai playerVerificationChallenge 2016-04-25 13:37:03 +01:00
Sergey M․
c83a352227 [openload] Make thumbnail optional 2016-04-25 00:26:06 +06:00
Sergey M․
e9063b5de9 [openload] Add test 2016-04-25 00:22:55 +06:00
Sergey M․
594b0c4c69 [openload] Fix ext extraction 2016-04-25 00:03:29 +06:00
Sergey M․
eb9ee19422 [utils] Allow None mimetypes in mimetype2ext 2016-04-25 00:03:12 +06:00
Sergey M․
a1394b820d [openload] Fix title extraction (Closes #9298) 2016-04-25 00:01:37 +06:00
Yen Chi Hsuan
aa9dc24f5a [douyutv] Improve extraction and update tests
The JSON API sometimes returns HTML pages with errors
2016-04-24 23:52:17 +08:00
Yen Chi Hsuan
51762e1a31 [xminus] Fix extraction (closes #9228) 2016-04-24 23:21:45 +08:00
Philipp Hagemeister
8b38f2ac40 release 2016.04.24 2016-04-24 17:06:46 +02:00
Yen Chi Hsuan
a82398bd72 [kuwo:song] Fix extraction and update the test 2016-04-24 22:20:45 +08:00
remitamine
c14dc00df3 [viewster] improve http formats extraction 2016-04-24 14:34:28 +01:00
Yen Chi Hsuan
03dd60ca41 [kuwo:category] Fix the test
Sometimes there are 24 songs and sometimes 30 lol
2016-04-24 21:16:06 +08:00
Yen Chi Hsuan
0738187f9b [ThePlatform] Fix tests failed since 79ba9140dc 2016-04-24 20:46:06 +08:00
Yen Chi Hsuan
a956cb6306 [onionstudios] Fix description extraction
\1 does not work in []. Fixes test_Generic_75
(http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537)
2016-04-24 20:41:17 +08:00
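The remark that `\1` does not work inside `[]` refers to a general property of Python regular expressions: inside a character class, `\1` is read as an octal character escape, not as a backreference. The patterns and sample HTML below are illustrative assumptions, not the ones used in the extractor; the negative lookahead is one common workaround.

```python
import re

html = '<meta content="He said \'hi\'" name="description">'

# Inside [], \1 means the character with code 1, so the class does not
# exclude the opening quote and the match runs too far.
broken = r'content=(["\'])(?P<desc>[^\1]+)\1'

# A negative lookahead stops at the next occurrence of the opening quote.
fixed = r'content=(["\'])(?P<desc>(?:(?!\1).)+)\1'

broken_desc = re.search(broken, html).group('desc')
fixed_desc = re.search(fixed, html).group('desc')
```

Here `fixed_desc` holds only the attribute value, while `broken_desc` spills over into the following attribute.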
Yen Chi Hsuan
a8062eabcd [mwave] Skip checking unstable MD5
On my PC the checksum is 02eda6d09fb63131a17a8d44e6237463, while a
recent Travis CI build
(https://travis-ci.org/rg3/youtube-dl/jobs/125341081) shows it's
c930e27b7720aaa3c9d0018dfc8ff6cc
2016-04-24 20:05:24 +08:00
Yen Chi Hsuan
2a7dee8cc5 [yahoo] Improve error detection and update tests 2016-04-24 18:12:16 +08:00
Yen Chi Hsuan
d9ed362116 [yahoo] Extract all <iframe>s
Fixes test_yahoo_6

(https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html)
2016-04-24 17:46:25 +08:00
Yen Chi Hsuan
4f54958097 [yahoo] Update some tests
One has new fields as ThePlatformIE changed, and others have changed
files.
2016-04-24 17:29:01 +08:00
Yen Chi Hsuan
2a7c38831c [yahoo] Extend _VALID_URL and fix extraction
Closes #9271
2016-04-24 17:01:18 +08:00
Yen Chi Hsuan
949b6497cc [generic] Unescape the video URL
Fixes #9279
2016-04-24 16:25:37 +08:00
Sergey M
2c21152ca7 [README.md] Document track metafields in output template 2016-04-24 12:22:18 +06:00
remitamine
fda9a1ca9e [viewster] simplify qualities_basename regex 2016-04-24 03:06:46 +01:00
remitamine
864d5e7231 [viewster] extract all http formats 2016-04-24 02:32:56 +01:00
Sergey M․
5448b781f6 [dplay] Sign unsigned final download hls URLs 2016-04-23 17:28:45 +06:00
Sergey M․
e239413fbc [dplay] Extract subtitles (Closes #9284) 2016-04-23 16:50:31 +06:00
Sergey M․
fd0ff8bad8 [dplay] Improve extraction and document workarounds and tests 2016-04-23 16:36:17 +06:00
Sergey M․
397ec446f3 [dplay] Try secure api for no tld (Closes #9282) 2016-04-23 15:59:30 +06:00
remitamine
29a7e8f6f8 [nhl] Add new extractor(closes #8419)(closes #8798) 2016-04-22 20:18:27 +01:00
Yen Chi Hsuan
eb01e97e10 [youku] Skip streams with channel_type=tail
Fixes #9275

These video segments look like ads and they don't appear in the web
player.
2016-04-23 02:54:09 +08:00
remitamine
cb7d4d0efd [nbc] add support for today.com(closes #2909) 2016-04-22 18:08:20 +01:00
Yen Chi Hsuan
c80037918b [iqiyi] Improve error detection (#9276) 2016-04-23 00:06:49 +08:00
remitamine
237a41108a [eagleplatform] extract all http formats 2016-04-22 14:32:38 +01:00
remitamine
e962ae15d3 [newstube] extract http formats(closes #9253) 2016-04-22 11:26:43 +01:00
remitamine
7c36ea7d54 [rtbf] improve extraction(fixes #9267) 2016-04-21 22:52:49 +01:00
remitamine
9260cf1d97 [tubitv] fix extraction(closes #8741) 2016-04-21 20:30:19 +01:00
Sergey M․
bdbb8530c7 [vimeo] Pass Referer for check-password request 2016-04-22 00:02:39 +06:00
Sergey M․
09a9fadb84 [dump] Remove extractor 2016-04-21 23:31:34 +06:00
Sergey M․
bf09af3acb Add --hls-prefer-ffmpeg 2016-04-21 23:02:17 +06:00
Sergey M․
88296ac326 [planetaplay] Remove remainings of extractor 2016-04-21 22:57:38 +06:00
Sergey M․
870d525848 [options] Remove experimental mark for --hls-prefer-native 2016-04-21 22:44:01 +06:00
Sergey M․
6577112890 [planetaplay] Remove extractor (Closes #9256) 2016-04-21 22:33:54 +06:00
Sergey M․
1988647dda [tvigle] Skip hls completely (#9259) 2016-04-21 22:15:20 +06:00
Yen Chi Hsuan
a292cba256 [mgtv] Fix _VALID_URL and add localized name 2016-04-22 00:07:43 +08:00
Yen Chi Hsuan
982e518a96 [dispeak] Rename DigitalSpeaking to DigitallySpeaking 2016-04-22 00:07:43 +08:00
Yen Chi Hsuan
748e730099 [dispeak] Several fixes 2016-04-22 00:07:43 +08:00
Sergey M
b6c0d4f431 Merge pull request #9110 from remitamine/parse_duration
[utils] improve parse_duration to handle more formats
2016-04-21 22:53:16 +07:00
remitamine
acaff49575 [utils] improve parse_duration to handle more formats 2016-04-21 16:34:54 +01:00
Yen Chi Hsuan
1da19488f9 [mgtv] Add new extractor (closes #9212) 2016-04-21 23:29:51 +08:00
Yen Chi Hsuan
442c4d361f [dispeak/gdcvault] Add the test case from #5784 2016-04-21 19:47:10 +08:00
Yen Chi Hsuan
ec59d657e7 [dispeak] Add new extractor
Both GDCVault and GPUTechConf use the service of DigitalSpeaking.
2016-04-21 19:36:33 +08:00
Yen Chi Hsuan
99ef96f84c [gdcvault] Fix for videos with hard-coded hostnames
Fixes #9248
2016-04-21 18:07:03 +08:00
Yen Chi Hsuan
4dccea8ad0 [streetvoice] Fix extraction
The old API results in URLs with HTTP 403 from time to time.

Hopefully fixes #9219.
2016-04-21 13:07:53 +08:00
Yen Chi Hsuan
2c0d9c6217 [extractor/common] Allow empty post data 2016-04-21 13:06:06 +08:00
Sergey M․
12a5134596 [tvigle] Fix extraction (Closes #9259) 2016-04-20 23:52:41 +06:00
Sergey M․
16e633a5d7 [quickvid] Remove extractor (Closes #9258) 2016-04-20 23:29:02 +06:00
Sergey M․
494ab6db73 [youtube] Capture and output login error message 2016-04-20 22:14:32 +06:00
Sergey M․
107701fcfc [people] Remove bogus comment 2016-04-20 03:40:02 +06:00
Sergey M․
f77970765a [people] Add extractor 2016-04-20 03:37:23 +06:00
Philipp Hagemeister
81215d5652 release 2016.04.19 2016-04-19 03:03:52 +02:00
Sergey M․
241a318f27 [vimeo] Improve _VALID_URL (Closes #9229) 2016-04-18 21:40:28 +06:00
Sergey M․
4fdf082375 [theonion] Remove extractor (Closes #9220)
It now uses generic onionstudios embed
2016-04-17 23:12:23 +06:00
Jaime Marquínez Ferrándiz
1b6182d8f7 [youtube:playlist] Fetch all the videos in a mix (fixes #3837)
Since there doesn't seem to be any indication, it stops when there aren't new videos in the webpage.
2016-04-17 17:07:57 +02:00
remitamine
7bab22a402 [vice] remove unused import and variable 2016-04-17 14:06:19 +01:00
Yen Chi Hsuan
0f97fb4d00 [musicplayon] Relax _VALID_URL and improve metadata extraction
In r'pl=\d+&play=\d+' pages, several metadata items are missing

Closes #9222.
2016-04-17 17:24:33 +08:00
Yen Chi Hsuan
b1cf58f48f [musicplayon] Fix extraction (closes #9222) 2016-04-17 15:08:51 +08:00
remitamine
3014b0ae83 Merge pull request #9195 from remitamine/ffmpeg-pipe
[downloader/external] enable piping for FFmpegFD(closes #2124)
2016-04-16 22:00:49 +01:00
remitamine
b9f2fdd37f [ffmpeg] Clarify rationale for pipe(-) exclusion in _ffmpeg_filename_argument 2016-04-16 21:50:13 +01:00
remitamine
bbb3f730bb [onionstudios] extract m3u8 formats 2016-04-16 20:53:13 +01:00
remitamine
d868f43c58 [ffmpeg] check for - file name in _ffmpeg_filename_argument 2016-04-16 19:45:56 +01:00
Yen Chi Hsuan
21525bb8ca [kuwo:category] Update the test
Now the webpage says there are 24 songs.
2016-04-17 02:38:05 +08:00
Sergey M․
d8f103159f [nerdist] Remove extractor
It now uses brightcove
2016-04-17 00:16:31 +06:00
remitamine
663ee5f0a9 [vice] extract youtube embed 2016-04-16 17:49:39 +01:00
Sergey M․
b6b950bf58 [cbs] Remove unused import 2016-04-16 22:47:10 +06:00
Sergey M․
11e60fcad8 [extractor/generic] Improve instagram embeds (Closes #9213) 2016-04-16 22:39:20 +06:00
Sergey M․
c23533a100 [instagram] Add support for iframe embeds 2016-04-16 22:31:05 +06:00
Sergey M․
0dafea02e6 [instagram] Add support for embed URLs 2016-04-16 22:23:08 +06:00
Sergey M․
5d6360c3b7 [mooshare] Remove extractor 2016-04-16 21:31:50 +06:00
Yen Chi Hsuan
5e5c30c3fd [mdr] Fix extraction and update tests
It's strange that the date is changed. Anyway, new data matches what the
webpage says.
2016-04-16 21:57:28 +08:00
Yen Chi Hsuan
9154c87fc4 [huffpost] Fix a typo 2016-04-16 21:41:22 +08:00
Yen Chi Hsuan
ef0e4e7bc0 [generic] Fix test_Generic_2
Now a HEAD request returns 400 Bad Request
2016-04-16 19:44:45 +08:00
Yen Chi Hsuan
67d46a3f90 [ustream] Fix /embed/ URLs and add a test 2016-04-16 19:39:25 +08:00
Yen Chi Hsuan
bec47a0748 [tudou] Improve error detection (closes #9175) 2016-04-16 19:11:25 +08:00
Yen Chi Hsuan
36b7d9dbfa [twitter] Don't check /cards/ URLs
Fixes #9181

In this tweet, there are two cards:
1. https://twitter.com/i/cards/tfw/v1/719944006306701313
   This shows #TeamCap vs. #TeamIronMan
2. https://twitter.com/i/videos/tweet/719944021058060289
   This is the real video and can be handled by TwitterCardIE

In all current test_Twitter* tests, /videos/tweet/ approach works fine.
2016-04-16 18:57:50 +08:00
Yen Chi Hsuan
8c65e4a527 [bbc] Fix a test 2016-04-16 18:00:19 +08:00
Yen Chi Hsuan
6ad2ef8b7c [audiomack] Update the test
The original test raises 404
2016-04-16 17:54:39 +08:00
Yen Chi Hsuan
00b426d66d [varzesh3] Add md5 to the test 2016-04-16 17:41:56 +08:00
Yen Chi Hsuan
0de968b584 [newgrounds] Support videos (closes #9138) 2016-04-16 17:41:56 +08:00
remitamine
0841d5013c [cbs] do not catch Exceptions raised by _extract_theplatform_smil 2016-04-16 10:25:59 +01:00
remitamine
a71fca8577 [theplatform] remove _sort_formats from _extract_theplatform_smil 2016-04-16 10:23:56 +01:00
Yen Chi Hsuan
ee94e7e66d [varzesh3] Fix metadata extraction (closes #9197) 2016-04-16 17:13:22 +08:00
Yen Chi Hsuan
759e37c9e6 [gazeta] Relax _VALID_URL and update tests
Closes #9196
2016-04-16 16:48:47 +08:00
Yen Chi Hsuan
ae65567102 [eagleplatform] Fix error handling 2016-04-16 16:47:16 +08:00
Yen Chi Hsuan
c394b4f4cb [puls4] Fix error detection (#9194) 2016-04-16 16:22:44 +08:00
Yen Chi Hsuan
260c7036ba [sportbox] Fix SportBoxEmbedIE
Also fixes test_Generic_29 (http://www.vestifinance.ru/articles/25753)
2016-04-16 16:13:14 +08:00
remitamine
f74197a074 [cbs] extract rtmp formats 2016-04-15 22:38:37 +01:00
remitamine
f3a58d46bf [youtube:user] check if the url didn't match only the other youtube extractors 2016-04-15 19:06:13 +01:00
Sergey M․
b6612c9b11 [karaoketv] Fix extraction 2016-04-15 21:26:54 +06:00
Yen Chi Hsuan
7e176effb2 [iqiyi] Also support pps.tv URLs
PPS was acquired by Baidu and merged with iQiyi in 2013 [1]. Now they
have the same page layouts.

[1] http://www.chinanews.com/it/2013/05-07/4792526.shtml
2016-04-15 22:39:18 +08:00
Yen Chi Hsuan
4a252cc2d2 [karaoketv] Update and mark as not _WORKING 2016-04-15 21:49:17 +08:00
Yen Chi Hsuan
f0ec61b525 [huffpost] Fix extraction 2016-04-15 20:55:56 +08:00
Yen Chi Hsuan
66d40ae3a5 Merge pull request #9041 from kasper93/master
[generic] Add support for LiveLeak embeds
2016-04-15 17:23:55 +08:00
Yen Chi Hsuan
e6da9240d4 [mixcloud:stream] Add new extractor
Closes #7633
2016-04-15 17:14:17 +08:00
Yen Chi Hsuan
dd91dfcd67 [mixcloud] Fix extraction by decrypting play info
Fixes #7521
2016-04-15 15:48:22 +08:00
Yen Chi Hsuan
c773082692 Merge branch 'Phaeilo-mixcloud' 2016-04-15 14:33:04 +08:00
Yen Chi Hsuan
9c250931f5 [mixcloud] Improve and simplify mixcloud:user and mixcloud:playlist 2016-04-15 14:32:02 +08:00
Yen Chi Hsuan
56f1750049 [tdslifeway] Use the new Brightcove API
Thanks for @remitamine's suggestion.
2016-04-15 04:28:54 +08:00
Yen Chi Hsuan
f2159c9815 [wayofthemaster] Remove extractor
Now it's using YouTube embeds.
2016-04-15 04:02:23 +08:00
Yen Chi Hsuan
b0cf2e7c1b [ubu] Remove extractor
1. Videos on ubu.com are now hosted on Vimeo
2. The duration is far from correct, and may not exist on other videos
   (For example http://ubu.com/film/hammons_king.html)
2016-04-15 03:48:23 +08:00
Yen Chi Hsuan
74b47d00c3 [xboxclips] Use http:// URL
xboxclips has misconfigured certificates
2016-04-15 03:30:38 +08:00
Yen Chi Hsuan
8cb57bab8e [ministrygrid] Fix extraction and modernize 2016-04-15 02:48:12 +08:00
Yen Chi Hsuan
e1bf277e19 [tdslifeway] Add TDSLifewayIE
Used by MinistryGridIE
2016-04-15 02:48:12 +08:00
remitamine
ce599d5a7e [downloader/external] enable piping for FFmpegFD(closes #2124) 2016-04-14 18:49:02 +01:00
Sergey M․
9e28538726 [arte:creative] Improve _VALID_URL 2016-04-14 21:54:41 +06:00
Sergey M․
404284132c [arte:info] Add extractor (Closes #9182) 2016-04-14 21:52:05 +06:00
remitamine
5565be9dd9 [aol] relax _VALID_URL regex 2016-04-14 08:47:55 +01:00
Yen Chi Hsuan
b3a9474ad1 Merge branch 'mixcloud' of https://github.com/Phaeilo/youtube-dl into Phaeilo-mixcloud 2016-04-14 15:31:58 +08:00
Yen Chi Hsuan
86475d59b1 [metacritic] Add a new valid test case 2016-04-14 15:12:59 +08:00
Yen Chi Hsuan
73d93f948e [lecture2go] Fix extraction
RTSP stream fails to download. Seems it's a mpv bug as direct playback
works well:

$ mpv --ytdl-format rtsp https://lecture2go.uni-hamburg.de/veranstaltungen/-/v/17473
2016-04-14 15:08:01 +08:00
Yen Chi Hsuan
f5d8743e0a [downloader/rtsp] Print the command 2016-04-14 15:07:31 +08:00
Yen Chi Hsuan
d1c4e4ba15 [laola1tv] Improve error detection and skip an invalid test 2016-04-14 14:11:28 +08:00
Yen Chi Hsuan
f141fefab7 [karrierevideos] Fix extraction
The server sends a malformed header, "Content Type: text/xml", for the XML
request (it should be Content-Type, not Content Type). Python 3.x, which
uses email.feedparser, rejects such headers. As a result, the
Content-Encoding header is not parsed, the returned content is left
compressed, and XML parsing fails.
2016-04-14 14:06:05 +08:00
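The header problem described in this commit can be reproduced with the standard library alone. This is a sketch with made-up header values, assuming a recent Python 3; it feeds a header block containing the malformed "Content Type" line to `email.parser.Parser`, which stops header parsing at that line.

```python
from email.parser import Parser

RAW_HEADERS = (
    'Server: example\n'
    'Content Type: text/xml\n'    # malformed: space instead of hyphen
    'Content-Encoding: gzip\n'
    '\n'
)

msg = Parser().parsestr(RAW_HEADERS)

# Headers before the malformed line are parsed as usual.
server = msg['Server']
# Everything from the malformed line onward is treated as body text,
# so Content-Encoding is never seen as a header.
encoding = msg['Content-Encoding']
```

Because `encoding` comes back as `None`, a client relying on it would not know the body needs decompressing.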
aystroganov@gmail.com
8334637f4a Make tbr field 'int' rather than 'tuple'
Closes #9180.
2016-04-13 14:29:34 +02:00
Philipp Hagemeister
b0ba11cc64 release 2016.04.13 2016-04-13 08:02:03 +02:00
Kacper Michajłow
b8f67449ec [generic] Add support for LiveLeak embeds 2016-04-13 01:54:19 +02:00
Yen Chi Hsuan
75af5d59ae [netease] Skip all tests: completely georestricted 2016-04-13 04:52:07 +08:00
Sergey M․
b969d12490 Credit @Phaeilo for presstv (#7113) 2016-04-13 01:52:50 +06:00
Philip Huppert
6d67169509 [mixcloud] improved extraction of user description 2016-04-12 21:18:13 +02:00
Philip Huppert
dcaf00fb3e [mixcloud] support older urllib versions 2016-04-12 21:18:13 +02:00
Philip Huppert
f896e1ccef [mixcloud] fixed some tests 2016-04-12 21:18:13 +02:00
Philip Huppert
c96eca426b [mixcloud] Added support for user uploads, playlists, favorites and listens.
Fixes #3750 and #5272
2016-04-12 21:18:13 +02:00
Sergey M․
466a614537 [youtube:playlist] Recognize popular uploads playlist as mix (Closes #9170) 2016-04-12 21:38:31 +06:00
Sergey M․
ffa2cecf72 [ard] Change subtitles extension to ttml (Closes #9169)
ttml is now served instead of srt
2016-04-12 21:20:31 +06:00
Yen Chi Hsuan
a837416025 [jadorecettepub] Remove extractor: website gone 2016-04-12 18:30:53 +08:00
Yen Chi Hsuan
c9d448876f [izlesene] Fix extraction
description may be absent
2016-04-12 18:29:28 +08:00
Yen Chi Hsuan
8865b8abfd [howstuffworks] Skip a broken test case 2016-04-12 17:30:14 +08:00
Yen Chi Hsuan
c77a0c01cb [groupon] Fix extraction 2016-04-12 17:26:09 +08:00
Yen Chi Hsuan
12355ac473 [goshgay] Fix extraction
isFamilyFriendly no longer exists in the webpage and I can't find
another indicator.
2016-04-12 17:23:00 +08:00
Sergey M․
49f523ca50 [mixcloud] Capture error message (#9156) 2016-04-11 20:45:58 +06:00
remitamine
4a903b93a9 Revert "[openclassroom] Add new extractor(closes #9147)"
This reverts commit 13267a2be3.
2016-04-11 14:44:35 +01:00
remitamine
13267a2be3 [openclassroom] Add new extractor(closes #9147) 2016-04-11 14:24:08 +01:00
Yen Chi Hsuan
134c207e3f [arte.tv:embed] Extended support (#2620) 2016-04-11 19:32:27 +08:00
Yen Chi Hsuan
0f56bd2178 Merge branch 'Phaeilo-presstv' 2016-04-11 16:17:05 +08:00
Yen Chi Hsuan
dfbc7f7f3f [presstv] Improve and simplify 2016-04-11 16:14:07 +08:00
Yen Chi Hsuan
7d58ea7c5b Merge branch 'presstv' of https://github.com/Phaeilo/youtube-dl into Phaeilo-presstv 2016-04-11 15:48:10 +08:00
Sergey M․
452908b257 [telebruxelles] Fix extraction (Closes #9142) 2016-04-11 00:06:05 +06:00
Sergey M․
5899e988d5 [glide] Improve extraction and extract upload info 2016-04-10 23:56:23 +06:00
Sergey M․
4a121d29bb [glide] Fix extraction (Closes #9141) 2016-04-10 23:45:17 +06:00
Sergey M․
7ebc36900d [jwplatform:base] Improve subtitles extraction 2016-04-10 22:55:07 +06:00
Sergey M․
d7eb052fa2 [screencastomatic] Add duration to test 2016-04-10 22:48:04 +06:00
Sergey M․
a6d6722c8f [jwplatform:base] Extract duration 2016-04-10 22:47:38 +06:00
Sergey M․
66fa495868 [screencastomatic] Fix extraction (Closes #9136) 2016-04-10 22:37:14 +06:00
Sergey M․
443285aabe [ebaumsworlds] Update _VALID_URL (Closes #9135) 2016-04-10 22:15:11 +06:00
Philip Huppert
de728757ad [presstv] Refactored extractor. 2016-04-10 16:36:44 +02:00
Sergey M․
f44c276842 [extractor/extractors] Remove non-existent imports 2016-04-10 19:21:58 +06:00
Sergey M․
a1fa60a934 [cliprs] Add extractor (Closes #9099) 2016-04-10 18:43:40 +06:00
Sergey M․
49caf3307f [extractor/common] Remove irrelevant comment 2016-04-10 17:10:27 +06:00
Jaime Marquínez Ferrándiz
6a801f4470 [test/InfoExtractors] add test for _download_json 2016-04-09 23:18:41 +02:00
Sergey M․
61dd350a04 [1tv] Fix extraction (Closes #9103) 2016-04-10 03:02:35 +06:00
Jaime Marquínez Ferrándiz
eb9c3edd5e [test/utils] Add test for date_from_str 2016-04-09 22:40:05 +02:00
Philip Huppert
95153a960d [presstv] updated extractor and tests to work with current PressTV website 2016-04-09 16:14:05 +02:00
Yen Chi Hsuan
6c4c7539f2 [test/helper] Check got values to be strings for md5: fields
Seen in PBSIE tests
2016-04-09 22:04:48 +08:00
Yen Chi Hsuan
c991106706 [videodetective] Adapt to InternetVideoArchiveIE 2016-04-09 21:47:35 +08:00
Yen Chi Hsuan
dae2a058de [rottentomatoes] Adapt to InternetVideoArchiveIE 2016-04-09 21:47:12 +08:00
Yen Chi Hsuan
c05025fdd7 [internetvideoarchive] Fix extraction and support json URLs 2016-04-09 21:46:51 +08:00
Philip Huppert
bfe96d7bea [presstv] Added extractor PressTV.
Fixes #7060
2016-04-09 14:55:54 +02:00
Yen Chi Hsuan
ab481b48e5 [funnyordie] Relax M3U8 URL matching
Also, m3u8_url extraction should be fatal as all formats depend
directly or indirectly on it.

This change fixes test_Generic_26 and TestFunnyOrDieSubtitles
2016-04-09 20:17:35 +08:00
Sergey M․
92c7f3157a [aol] Add coding cookie 2016-04-09 17:32:23 +06:00
Yen Chi Hsuan
cacd996662 [utils] Don't touch URLs if not necessary
Fix test_Generic_15 (Google redirect)
2016-04-09 19:27:54 +08:00
remitamine
bffb245a48 [aol] add support for videos with vidible IDs(closes #9124) 2016-04-09 10:51:23 +01:00
Yen Chi Hsuan
680efb6723 Merge pull request #8497 from jaimeMF/lazy-load
Add experimental lazy loading of info extractors
2016-04-09 14:08:13 +08:00
Jaime Marquínez Ferrándiz
5a9858bfa9 setup.py: add command for building the lazy_extractors module 2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
8a5dc1c1e1 lazy extractors: Initialize the real info extractor
According to the docs '__init__' is only called automatically if '__new__' returns an instance of the original class.
2016-04-08 21:50:54 +02:00
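The rule cited in this commit is standard Python behavior and can be checked in isolation. The class names below are hypothetical and unrelated to the youtube-dl code; the sketch only contrasts a `__new__` that returns a foreign instance with one that also initializes it explicitly.

```python
class Real(object):
    def __init__(self):
        self.ready = True


class Proxy(object):
    def __new__(cls):
        # Returns an instance of a different class, so Python does not
        # call __init__ on the result automatically.
        return object.__new__(Real)


class InitializingProxy(object):
    def __new__(cls):
        instance = object.__new__(Real)
        instance.__init__()  # explicit initialization of the real object
        return instance
```

`Proxy()` yields a `Real` object without the `ready` attribute, while `InitializingProxy()` yields a fully initialized one.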
Jaime Marquínez Ferrándiz
e0986e31cf lazy extractors: Output if it's enabled in the verbose log 2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
6b97ca96fc lazy extractors: Style fixes
* Sort extractors alphabetically
* Add newlines when needed (youtube_dl/extractors/lazy_extractors.py passes the flake8 test now)
2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
c1ce6acdd7 lazy extractors: Fix building with python2.6 2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
0d778b1db9 lazy extractors: specify the encoding
When building with python3 the unicode characters are not escaped, python2 needs to know the encoding.
2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
779822d945 Add experimental support for lazy loading the info extractors
'make lazy-extractors' creates the youtube_dl/extractor/lazy_extractors.py (imported by youtube_dl/extractor/__init__.py), which contains simplified classes that only have the 'suitable' class method and that load the appropriate class with the '__new__' method when an instance is created.
2016-04-08 21:50:07 +02:00
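The lazy-loading idea described in this commit can be sketched roughly as follows. This is not the generated `lazy_extractors.py`; the module name `real_extractors`, the class `ExampleIE`, and the URL pattern are all assumptions, and the "real" module is built in memory so the sketch runs without youtube-dl installed.

```python
import importlib
import re
import sys
import types

# Stand-in for the module that holds the full extractor (hypothetical).
_real = types.ModuleType('real_extractors')


class _RealExampleIE(object):
    def __init__(self, downloader=None):
        self.downloader = downloader


_real.ExampleIE = _RealExampleIE
sys.modules['real_extractors'] = _real


class LazyLoadExtractor(object):
    _module = None
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # Only the URL pattern is needed to decide whether the class applies.
        return re.match(cls._VALID_URL, url) is not None

    def __new__(cls, *args, **kwargs):
        # Import the real module only when an instance is requested.
        module = importlib.import_module(cls._module)
        real_cls = getattr(module, cls.__name__)
        return real_cls(*args, **kwargs)


class ExampleIE(LazyLoadExtractor):
    _module = 'real_extractors'
    _VALID_URL = r'https?://example\.com/video/\d+'
```

URL matching through `ExampleIE.suitable(...)` never touches the real module; only `ExampleIE(...)` triggers the import and hands back an instance of the full class.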
Jaime Marquínez Ferrándiz
1b3d5e05a8 Move the extractors import to youtube_dl/extractor/extractors.py 2016-04-08 21:47:51 +02:00
Jaime Marquínez Ferrándiz
e52d7f85f2 Delay initialization of InfoExtractors until they are needed 2016-04-08 21:43:24 +02:00
Sergey M․
568d2f78d6 [tnaflix] Fix metadata extraction 2016-04-09 00:27:24 +06:00
Sergey M․
2f2fcf1a33 [tnaflix] Fix extraction (Closes #9074) 2016-04-08 23:34:59 +06:00
Sergey M․
bacec0397f [extractor/common] Relax _hidden_inputs 2016-04-08 23:33:45 +06:00
Sergey M․
3c6c7e7d7e [gdcvault] Fix extraction (Closes #9107, closes #9114) 2016-04-08 23:16:02 +06:00
Sergey M․
fb38aa8b53 [extractor/common] Support arbitrary format strings for template based identifiers in mpd manifests (Closes #9119, closes #9120) 2016-04-08 22:48:08 +06:00
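For context, DASH segment templates may carry a printf-style width on an identifier, for example `$Number%05d$`. The sketch below is not the code from this commit; `expand_template` is a hypothetical helper that shows one way such identifiers could be expanded, and it ignores the `$$` escape for brevity.

```python
import re


def expand_template(template, **values):
    # Hypothetical helper: replaces $Name$ or $Name%05d$ style identifiers.
    def replace(match):
        name, fmt = match.group(1), match.group(2)
        if name not in values:
            return match.group(0)  # leave unknown identifiers untouched
        if fmt:
            return ('%' + fmt) % values[name]
        return str(values[name])

    return re.sub(r'\$(\w+)(?:%([^$]+))?\$', replace, template)
```

For example, `expand_template('video-$RepresentationID$-$Number%05d$.m4s', RepresentationID='hd', Number=7)` gives `'video-hd-00007.m4s'`.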
Sergey M․
18da24634c [democracynow] Improve extraction 2016-04-08 22:27:27 +06:00
Sergey M․
a134426d61 [democracynow] Fix tests 2016-04-08 22:21:14 +06:00
Sergey M․
a64c0c9b06 [democracynow] Make description optional (Closes #9115) 2016-04-08 22:15:36 +06:00
Sergey M․
56019444cb [novamov] Improve _VALID_URL template (Closes #9116) 2016-04-08 21:26:42 +06:00
remitamine
a1ff3cd5f9 [acast] fix channel extraction(closes #9117) 2016-04-08 15:15:34 +01:00
remitamine
9a32e80477 [acast] fix extraction(#9117) 2016-04-08 14:51:00 +01:00
Sergey M․
536a55dabd [YoutubeDL] Sanitize single thumbnail URL 2016-04-08 00:17:47 +06:00
Sergey M․
ed6fb8b804 [vrt] Add support for direct hls playlists and YouTube (Closes #9108) 2016-04-07 23:22:43 +06:00
Sergey M․
3afef2e3fc [beeg] Improve extraction 2016-04-07 22:40:35 +06:00
Sergey M․
e90d175436 [yandexmusic] Extract music album metafields (Closes #7354) 2016-04-07 02:56:13 +06:00
Sergey M․
7a93ab5f3f [extractor/common] Introduce music album metafields 2016-04-07 02:53:53 +06:00
Philipp Hagemeister
c41cf65d4a release 2016.04.06 2016-04-06 15:13:08 +02:00
Jaime Marquínez Ferrándiz
ec4a4c6fcc Makefile: remove ISSUE_TEMPLATE.md from the 'all' target (fixes #9088)
It isn't included in the tar file, causing build failures.
Since it's only used for GitHub, I think we don't need to store it in the tar file.
2016-04-06 14:16:05 +02:00
Jaime Marquínez Ferrándiz
be0c7009fb Makefile: use full path for the ISSUE_TEMPLATE.md file 2016-04-06 14:09:31 +02:00
Yen Chi Hsuan
92d5477d84 [compat] Handle tuples properly in urlencode()
Fixes #9055
2016-04-06 18:29:54 +08:00
Yen Chi Hsuan
8790249c68 [iqiyi] Improve error detection for VIP-only videos
Closes #9071
2016-04-06 16:12:16 +08:00
151 changed files with 4575 additions and 2966 deletions


@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.05*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.05**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.05.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.05.01**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.05
[debug] youtube-dl version 2016.05.01
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

1
.gitignore vendored

@@ -13,6 +13,7 @@ README.txt
youtube-dl.1
youtube-dl.bash-completion
youtube-dl.fish
youtube_dl/extractor/lazy_extractors.py
youtube-dl
youtube-dl.exe
youtube-dl.tar.gz


@@ -167,3 +167,5 @@ Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi
Philip Huppert
blahgeek


@@ -140,14 +140,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
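
Step 8 above can be sketched as follows. This is a minimal illustration, not code from the repository: `build_info` and the keys inside `meta` (other than `summary`, which the guide itself uses) are hypothetical names chosen for the example.

```python
def build_info(video_id, meta):
    """Build an info dict from a hypothetical intermediate ``meta`` dict."""
    return {
        # Mandatory fields: direct indexing is acceptable here, because
        # extraction makes no sense without them and should fail loudly.
        'id': video_id,
        'title': meta['title'],
        'url': meta['url'],
        # Optional field: .get() returns None instead of raising KeyError
        # when the source stops providing a summary.
        'description': meta.get('summary'),
    }


info = build_info('42', {'title': 'Some title', 'url': 'http://example.com/v.mp4'})
```

With this shape, a missing `summary` only leaves `description` empty and does not break extraction of the mandatory fields.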


@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md ISSUE_TEMPLATE.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@@ -59,7 +59,7 @@ README.md: youtube_dl/*.py youtube_dl/*/*.py
CONTRIBUTING.md: README.md
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
ISSUE_TEMPLATE.md:
.github/ISSUE_TEMPLATE.md: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md youtube_dl/version.py
$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
supportedsites:
@@ -88,6 +88,12 @@ youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \


@@ -176,7 +176,9 @@ which means you can modify it, redistribute it or use it however you like.
--xattr-set-filesize Set file xattribute ytdl.filesize with
expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of
ffmpeg (experimental)
ffmpeg
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
downloader
--hls-use-mpegts Use the mpegts container for HLS videos,
allowing to play the video while
downloading (some players may not be able
@@ -515,6 +517,18 @@ Available for the video that is an episode of some series or programme:
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode
Available for the media that is a track or a part of a music album:
- `track`: Title of the track
- `track_number`: Number of the track within an album or a disc
- `track_id`: Id of the track
- `artist`: Artist(s) of the track
- `genre`: Genre(s) of the track
- `album`: Title of the album the track belongs to
- `album_type`: Type of the album
- `album_artist`: List of all artists appeared on the album
- `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
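
The substitution described above, including the `NA` fallback for missing sequences, can be approximated with plain Python `%` formatting. This is a simplified sketch under the assumption that a mapping with a default value is used; `render_template` is a hypothetical helper, not youtube-dl's actual function, and filename sanitizing is left out.

```python
import collections


def render_template(template, info):
    """Expand %(name)s sequences, replacing unknown names with 'NA'."""
    # defaultdict supplies 'NA' for any key the extractor did not provide
    fields = collections.defaultdict(lambda: 'NA', info)
    return template % fields


filename = render_template(
    '%(title)s-%(id)s.%(ext)s',
    {'title': 'youtube-dl test video', 'id': 'BaW_jenozKcj', 'ext': 'mp4'})
```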
@@ -683,6 +697,10 @@ YouTube changed their playlist format in March 2014 and later on, so you'll need
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
Make sure you are not using `-o` with any of these options `-t`, `--title`, `--id`, `-A` or `--auto-number` set in command line or in a configuration file. Remove the latter if any.
### Do I always have to pass `-citw`?
By default, youtube-dl intends to have the best options (incidentally, if you have a convincing case that these should be different, [please file an issue where you explain that](https://yt-dl.org/bug)). Therefore, it is unnecessary and sometimes harmful to copy long option strings from webpages. In particular, the only option out of `-citw` that is regularly useful is `-i`.
@@ -703,7 +721,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
@@ -889,14 +907,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor


@@ -0,0 +1,19 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
class LazyLoadExtractor(object):
_module = None
@classmethod
def ie_key(cls):
return cls.__name__[:-2]
def __new__(cls, *args, **kwargs):
mod = __import__(cls._module, fromlist=(cls.__name__,))
real_cls = getattr(mod, cls.__name__)
instance = real_cls.__new__(real_cls)
instance.__init__(*args, **kwargs)
return instance
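
The `__new__` trick in this template can be tried in isolation. The sketch below copies the same pattern but, as an assumption for the sake of a self-contained demo, points the placeholder class at the standard library's `collections.OrderedDict` instead of a real extractor; `LazyLoader` is a hypothetical name.

```python
class LazyLoader(object):
    _module = None

    def __new__(cls, *args, **kwargs):
        # The real module is imported only when an instance is requested
        mod = __import__(cls._module, fromlist=(cls.__name__,))
        real_cls = getattr(mod, cls.__name__)
        # Build an instance of the real class, not of the placeholder
        instance = real_cls.__new__(real_cls)
        instance.__init__(*args, **kwargs)
        return instance


class OrderedDict(LazyLoader):
    # The placeholder shares its name with the class it stands in for
    _module = 'collections'


d = OrderedDict([('a', 1)])
```

Because `__new__` returns an object that is not an instance of the placeholder, Python does not call the placeholder's `__init__` afterwards, so the explicit `instance.__init__(...)` call is the only initialization that runs.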


@@ -0,0 +1,63 @@
from __future__ import unicode_literals, print_function
from inspect import getsource
import os
from os.path import dirname as dirn
import sys
print('WARNING: Lazy loading extractors is an experimental feature that may not always work', file=sys.stderr)
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
lazy_extractors_filename = sys.argv[1]
if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename)
from youtube_dl.extractor import _ALL_CLASSES
from youtube_dl.extractor.common import InfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read()
module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
ie_template = '''
class {name}(LazyLoadExtractor):
_VALID_URL = {valid_url!r}
_module = '{module}'
'''
make_valid_template = '''
@classmethod
def _make_valid_url(cls):
return {valid_url!r}
'''
def build_lazy_ie(ie, name):
valid_url = getattr(ie, '_VALID_URL', None)
s = ie_template.format(
name=name,
valid_url=valid_url,
module=ie.__module__)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
s += '\n' + getsource(ie.suitable)
if hasattr(ie, '_make_valid_url'):
# search extractors
s += make_valid_template.format(valid_url=ie._make_valid_url())
return s
names = []
for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
name = ie.ie_key() + 'IE'
src = build_lazy_ie(ie, name)
module_contents.append(src)
names.append(name)
module_contents.append(
'_ALL_CLASSES = [{0}]'.format(', '.join(names)))
module_src = '\n'.join(module_contents) + '\n'
with open(lazy_extractors_filename, 'wt') as f:
f.write(module_src)
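
To see what the script emits for a single extractor, the class template can be formatted by hand. The sketch below assumes a made-up extractor: `ExampleIE`, its module path and its URL pattern are hypothetical, and a dummy base class replaces `LazyLoadExtractor` so the generated source can be executed on its own.

```python
ie_template = '''
class {name}(LazyLoadExtractor):
    _VALID_URL = {valid_url!r}
    _module = '{module}'
'''

pattern = r'https?://example\.com/videos/(?P<id>\d+)'
src = ie_template.format(
    name='ExampleIE',
    valid_url=pattern,
    module='youtube_dl.extractor.example')

# Execute the generated source with a stand-in base class
namespace = {'LazyLoadExtractor': object}
exec(src, namespace)
generated = namespace['ExampleIE']
```

The `!r` conversion writes the pattern as a Python string literal with its backslashes escaped, so the regex survives the round trip through the generated source unchanged.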


@@ -46,7 +46,7 @@ fi
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md ISSUE_TEMPLATE.md supportedsites
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version"


@@ -50,6 +50,7 @@
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
- **AtresPlayer**
- **ATTTechChannel**
@@ -115,6 +116,7 @@
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
- **ClipRs**
- **Clipsyndicate**
- **cloudtime**: CloudTime
- **Cloudy**
@@ -161,6 +163,7 @@
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **Dotsub**
@@ -172,7 +175,6 @@
- **Dropbox**
- **DrTuber**
- **DRTV**
- **Dump**
- **Dumpert**
- **dvtv**: http://video.aktualne.cz/
- **dw**
@@ -286,7 +288,6 @@
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Izlesene**
- **JadoreCettePub**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
@@ -337,26 +338,28 @@
- **mailru**: Видео@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **Malemotion**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **metacafe**
- **Metacritic**
- **Mgoon**
- **MGTV**: 芒果TV
- **Minhateca**
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **MiTele**: mitele.es
- **mixcloud**
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net
- **mooshare**: Mooshare.biz
- **Morningstar**: morningstar.com
- **Motherless**
- **Motorsport**: motorsport.com
@@ -371,8 +374,8 @@
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
- **muzu.tv**
- **Mwave**
- **MwaveMeetGreet**
- **MySpace**
- **MySpace:album**
- **MySpass**
@@ -393,7 +396,6 @@
- **ndr:embed:base**
- **NDTV**
- **NerdCubedFeed**
- **Nerdist**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
@@ -411,7 +413,8 @@
- **nfl.com**
- **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**: NHL videocenter category
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
@@ -459,13 +462,13 @@
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **People**
- **Periscope**: Periscope
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
- **Pinkbike**
- **Pladform**
- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
@@ -484,6 +487,7 @@
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
- **PressTV**
- **PrimeShareTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
@@ -494,7 +498,6 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuickVid**
- **R7**
- **radio.de**
- **radiobremen**
@@ -550,7 +553,6 @@
- **SenateISVP**
- **ServingSys**
- **Sexu**
- **SexyKarma**: Sexy Karma and Watch Indian Porn
- **Shahid**
- **Shared**: shared.sx and vivo.sx
- **ShareSix**
@@ -563,8 +565,6 @@
- **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **SnagFilms**
- **SnagFilmsEmbed**
- **Snotr**
- **Sohu**
- **soundcloud**
@@ -606,8 +606,10 @@
- **Syfy**
- **SztvHu**
- **Tagesschau**
- **tagesschau:player**
- **Tapely**
- **Tass**
- **TDSLifeway**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
@@ -624,7 +626,6 @@
- **TeleTask**
- **TF1**
- **TheIntercept**
- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -683,7 +684,6 @@
- **twitter**
- **twitter:amplify**
- **twitter:card**
- **Ubu**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
@@ -722,6 +722,8 @@
- **Vidzi**
- **vier**
- **vier:videos**
- **ViewLift**
- **ViewLiftEmbed**
- **Viewster**
- **Viidea**
- **viki**
@@ -753,7 +755,7 @@
- **Walla**
- **WashingtonPost**
- **wat.tv**
- **WayOfTheMaster**
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus
@@ -773,6 +775,10 @@
- **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
- **XHamster**
- **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **xiami:song**: 虾米音乐
- **XMinus**
- **XNXX**
- **Xstream**


@@ -2,5 +2,5 @@
universal = True
[flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/make_issue_template.py,setup.py,build,.git
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
ignore = E402,E501,E731


@@ -8,11 +8,12 @@ import warnings
import sys
try:
from setuptools import setup
from setuptools import setup, Command
setuptools_available = True
except ImportError:
from distutils.core import setup
from distutils.core import setup, Command
setuptools_available = False
from distutils.spawn import spawn
try:
# This will create an exe that needs Microsoft Visual C++ 2008
@@ -70,6 +71,22 @@ else:
else:
params['scripts'] = ['bin/youtube-dl']
class build_lazy_extractors(Command):
description = "Build the extractor lazy loading module"
user_options = []
def initialize_options(self):
pass
def finalize_options(self):
pass
def run(self):
spawn(
[sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'],
dry_run=self.dry_run,
)
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
@@ -107,5 +124,6 @@ setup(
"Programming Language :: Python :: 3.4",
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},
**params
)


@@ -143,6 +143,9 @@ def expect_value(self, got, expected, field):
expect_value(self, item_got, item_expected, field)
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertTrue(
isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
self.assertTrue(


@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
class TestIE(InfoExtractor):
@@ -66,5 +67,14 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._html_search_meta('e', html), '5')
self.assertEqual(ie._html_search_meta('f', html), '6')
def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
uri = encode_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
self.assertEqual(self.ie._download_json(uri, None, transform_source=strip_jsonp), {'foo': 'blah'})
uri = encode_data_uri(b'{"foo": invalid}', 'application/json')
self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
if __name__ == '__main__':
unittest.main()


@@ -76,6 +76,10 @@ class TestCompat(unittest.TestCase):
self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', b'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', b'def')]), 'abc=def')
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])


@@ -20,6 +20,7 @@ from youtube_dl.utils import (
args_to_str,
encode_base_n,
clean_html,
date_from_str,
DateRange,
detect_exe_version,
determine_ext,
@@ -234,6 +235,13 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
def test_date_from_str(self):
self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
self.assertEqual(date_from_str('now+7day'), date_from_str('now+1week'))
self.assertEqual(date_from_str('now+14day'), date_from_str('now+2week'))
self.assertEqual(date_from_str('now+365day'), date_from_str('now+1year'))
self.assertEqual(date_from_str('now+30day'), date_from_str('now+1month'))
def test_daterange(self):
_20century = DateRange("19000101", "20000101")
self.assertFalse("17890714" in _20century)
@@ -405,6 +413,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('01:02:03:04'), 93784)
self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
self.assertEqual(parse_duration('87 Min.'), 5220)
self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
def test_fix_xml_ampersands(self):
self.assertEqual(


@@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 20)
self.assertTrue(len(entries) >= 50)
original_video = entries[0]
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')


@@ -82,7 +82,7 @@ from .utils import (
YoutubeDLHandler,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractors
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version
from .postprocessor import (
@@ -260,7 +260,9 @@ class YoutubeDL(object):
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
None or unset for standard (built-in) downloader.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see youtube_dl/downloader/common.py):
@@ -378,8 +380,9 @@ class YoutubeDL(object):
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
if not isinstance(ie, type):
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
def get_info_extractor(self, ie_key):
"""
@@ -397,7 +400,7 @@ class YoutubeDL(object):
"""
Add the InfoExtractors returned by gen_extractors to the end of the list
"""
for ie in gen_extractors():
for ie in gen_extractor_classes():
self.add_info_extractor(ie)
def add_post_processor(self, pp):
@@ -661,6 +664,7 @@ class YoutubeDL(object):
if not ie.suitable(url):
continue
ie = self.get_info_extractor(ie.ie_key())
if not ie.working():
self.report_warning('The program functionality for this site has been marked as broken, '
'and will probably not work.')
@@ -1240,7 +1244,10 @@ class YoutubeDL(object):
self.list_thumbnails(info_dict)
return
if thumbnails and 'thumbnail' not in info_dict:
thumbnail = info_dict.get('thumbnail')
if thumbnail:
info_dict['thumbnail'] = sanitize_url(thumbnail)
elif thumbnails:
info_dict['thumbnail'] = thumbnails[-1]['url']
if 'display_id' not in info_dict and 'id' in info_dict:
@@ -1954,6 +1961,8 @@ class YoutubeDL(object):
write_string(encoding_str, encoding=None)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
if _LAZY_LOADER:
self._write_string('[debug] Lazy loading extractors enabled' + '\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],


@@ -181,7 +181,8 @@ except ImportError: # Python 2
if isinstance(e, dict):
e = encode_dict(e)
elif isinstance(e, (list, tuple,)):
e = encode_list(e)
list_e = encode_list(e)
e = tuple(list_e) if isinstance(e, tuple) else list_e
elif isinstance(e, compat_str):
e = e.encode(encoding)
return e
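
The hunk above makes the recursive encoder return a tuple when it was given a tuple, instead of always returning a list. A standalone sketch of the same idea, written for Python 3 with `str` standing in for `compat_str` (the function name `encode_elem` mirrors the surrounding code but this version is only illustrative):

```python
def encode_elem(e, encoding='utf-8'):
    """Recursively encode text to bytes while keeping container types intact."""
    if isinstance(e, dict):
        return dict(
            (encode_elem(k, encoding), encode_elem(v, encoding))
            for k, v in e.items())
    if isinstance(e, (list, tuple)):
        encoded = [encode_elem(el, encoding) for el in e]
        # Preserve the input container type: tuple in, tuple out.
        return tuple(encoded) if isinstance(e, tuple) else encoded
    if isinstance(e, str):
        return e.encode(encoding)
    return e
```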


@@ -41,9 +41,12 @@ def get_suitable_downloader(info_dict, params={}):
if ed.can_download(info_dict):
return ed
if protocol == 'm3u8' and params.get('hls_prefer_native'):
if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
return HlsFD
if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
return FFmpegFD
return PROTOCOL_MAP.get(protocol, HttpFD)
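
With this change `hls_prefer_native` is effectively tri-state: `True` forces the native HLS downloader, `False` forces ffmpeg/avconv, and `None` leaves the extractor's suggested protocol in charge. A small sketch of that selection logic, using strings in place of the downloader classes; the contents of `PROTOCOL_MAP` here are an assumed stand-in, not copied from the diff.

```python
# Assumed stand-in for the real protocol map; values are class names as strings.
PROTOCOL_MAP = {
    'm3u8': 'FFmpegFD',
    'm3u8_native': 'HlsFD',
    'rtmp': 'RtmpFD',
}


def pick_downloader(protocol, params):
    prefer_native = params.get('hls_prefer_native')
    # Identity checks keep None ("no preference") distinct from False.
    if protocol == 'm3u8' and prefer_native is True:
        return 'HlsFD'
    if protocol == 'm3u8_native' and prefer_native is False:
        return 'FFmpegFD'
    return PROTOCOL_MAP.get(protocol, 'HttpFD')
```

Using `is True` and `is False` rather than truthiness is what lets an unset option fall through to the map.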


@@ -225,7 +225,7 @@ class FFmpegFD(ExternalFD):
args += ['-i', url, '-c', 'copy']
if protocol == 'm3u8':
if self.params.get('hls_use_mpegts', False):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']


@@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
return False
self._debug_cmd(args)
retval = subprocess.call(args)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))

File diff suppressed because it is too large


@@ -2,10 +2,14 @@
from __future__ import unicode_literals
import re
import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
from ..utils import (
int_or_none,
OnDemandPagedList,
)
class ACastIE(InfoExtractor):
@@ -26,13 +30,8 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
embed_page = self._download_webpage(
re.sub('(?:www\.)?acast\.com', 'embedcdn.acast.com', url), display_id)
cast_data = self._parse_json(self._search_regex(
r'window\[\'acast/queries\'\]\s*=\s*([^;]+);', embed_page, 'acast data'),
display_id)['GetAcast/%s/%s' % (channel, display_id)]
cast_data = self._download_json(
'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
return {
'id': compat_str(cast_data['id']),
'display_id': display_id,
@@ -58,15 +57,26 @@ class ACastChannelIE(InfoExtractor):
'playlist_mincount': 20,
}
_API_BASE_URL = 'https://www.acast.com/api/'
_PAGE_SIZE = 10
@classmethod
def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
channel_data = self._download_json(self._API_BASE_URL + 'channels/%s' % display_id, display_id)
casts = self._download_json(self._API_BASE_URL + 'channels/%s/acasts' % display_id, display_id)
entries = [self.url_result('https://www.acast.com/%s/%s' % (display_id, cast['url']), 'ACast') for cast in casts]
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
return self.playlist_result(entries, compat_str(channel_data['id']), channel_data['name'], channel_data.get('description'))
def _real_extract(self, url):
channel_slug = self._match_id(url)
channel_data = self._download_json(
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
entries = OnDemandPagedList(functools.partial(
self._fetch_page, channel_slug), self._PAGE_SIZE)
return self.playlist_result(entries, compat_str(
channel_data['id']), channel_data['name'], channel_data.get('description'))
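
The channel extractor above switches from one large request to page-by-page fetching through `OnDemandPagedList` and `functools.partial`. The sketch below illustrates the lazy-paging idea only; `iter_paged` is a hypothetical simplification and not the real `OnDemandPagedList`, `FAKE_API` replaces the HTTP call, and zero-based page numbers are an assumption.

```python
import functools
import itertools

# Hypothetical data: page number -> episode slugs returned by the API.
FAKE_API = {0: ['ep-1', 'ep-2'], 1: ['ep-3']}
fetched_pages = []  # records which pages were actually requested


def fetch_page(channel_slug, page):
    fetched_pages.append(page)
    for slug in FAKE_API.get(page, []):
        yield 'https://www.acast.com/%s/%s' % (channel_slug, slug)


def iter_paged(pagefunc, page_size):
    """Yield entries page by page, stopping at the first short page."""
    for page in itertools.count():
        results = list(pagefunc(page))
        for result in results:
            yield result
        if len(results) < page_size:
            break


def channel_entries(channel_slug, page_size=2):
    # partial() binds the channel so the pager only has to supply the page number.
    return iter_paged(functools.partial(fetch_page, channel_slug), page_size)
```

Consuming only the first entry of `channel_entries('somechannel')` requests only the first page; later pages are fetched as iteration proceeds.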


@@ -1,26 +1,107 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'
_TESTS = [{
# video with 5min ID
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
'md5': '18ef68f48740e86ae94b98da815eec42',
'info_dict': {
'id': '518167793',
'ext': 'mp4',
'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
'description': 'A major phone scam has cost thousands of taxpayers more than $1 million, with less than a month until income tax returns are due to the IRS.',
'timestamp': 1395405060,
'upload_date': '20140321',
'uploader': 'Newsy Studio',
},
'add_ie': ['FiveMin'],
'params': {
# m3u8 download
'skip_download': True,
}
}, {
# video with vidible ID
'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
'info_dict': {
'id': '5707d6b8e4b090497b04f706',
'ext': 'mp4',
'title': 'Netflix is Raising Rates',
'description': 'Netflix is rewarding millions of its long-standing members with an increase in cost. Veuers Carly Figueroa has more.',
'upload_date': '20160408',
'timestamp': 1460123280,
'uploader': 'Veuer',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
'only_matching': True,
}, {
'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
video_id)['response']
if response['statusText'] != 'Ok':
raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusText']), expected=True)
video_data = response['data']
formats = []
m3u8_url = video_data.get('videoMasterPlaylist')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for rendition in video_data.get('renditions', []):
video_url = rendition.get('url')
if not video_url:
continue
ext = rendition.get('format')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
f = {
'url': video_url,
'format_id': rendition.get('quality'),
}
mobj = re.search(r'(\d+)x(\d+)', video_url)
if mobj:
f.update({
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
})
formats.append(f)
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': video_data['title'],
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('publishDate')),
'view_count': int_or_none(video_data.get('views')),
'description': video_data.get('description'),
'uploader': video_data.get('videoOwner'),
'formats': formats,
}
class AolFeaturesIE(InfoExtractor):


@@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'srt',
'ext': 'ttml',
'url': subtitle_url,
}]


@@ -63,7 +63,7 @@ class ArteTvIE(InfoExtractor):
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&+])'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
@classmethod
def _extract_url_info(cls, url):
@@ -210,7 +210,7 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@@ -229,9 +229,27 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
'upload_date': '20140805',
}
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}]
class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
@@ -337,7 +355,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/playerv2/embed\.php\?json_url=
/(?:playerv2/embed|arte_vp/index)\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*


@@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor):
# audiomack wrapper around soundcloud song
{
'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
'info_dict': {
'id': '172419696',
'id': '258901379',
'ext': 'mp3',
'description': 'md5:1fc3272ed7a635cce5be1568c2822997',
'title': 'Young Thug ft Lil Wayne - Take Kare',
'uploader': 'Young Thug World',
'upload_date': '20141016',
'description': 'mamba day freestyle for the legend Kobe Bryant ',
'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'ILOVEMAKONNEN',
'upload_date': '20160414',
}
},
]


@@ -671,6 +671,7 @@ class BBCIE(BBCCoUkIE):
'info_dict': {
'id': '34475836',
'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
},
'playlist_count': 3,
}, {


@@ -33,8 +33,33 @@ class BeegIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1',
webpage, 'cpl', default=None, group='url')
beeg_version, beeg_salt = [None] * 2
if cpl_url:
cpl = self._download_webpage(
self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False)
if cpl:
beeg_version = self._search_regex(
r'beeg_version\s*=\s*(\d+)', cpl,
'beeg version', default=None) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg beeg_salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '1750'
beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr'
video = self._download_json(
'https://api.beeg.com/api/v6/1738/video/%s' % video_id, video_id)
'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id)
def split(o, e):
def cut(s, x):
@@ -51,7 +76,7 @@ class BeegIE(InfoExtractor):
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1738.js
a = 'GUuyodcfS8FW8gQp4OKLMsZBcX0T7B'
a = beeg_salt
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
@@ -101,5 +126,5 @@ class BeegIE(InfoExtractor):
'duration': duration,
'tags': tags,
'formats': formats,
'age_limit': 18,
'age_limit': self._rta_search(webpage),
}


@@ -340,7 +340,7 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv'
if ext is None:
ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000),
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
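
The one-character fix above removes a trailing comma that turned `tbr` into a one-element tuple. A short illustration of why that matters for the `format_id` expression; `int_or_none` here is a simplified stand-in for the utils helper, and `make_format_id` is a hypothetical wrapper around the expression shown in the hunk.

```python
def int_or_none(v, scale=1):
    """Simplified stand-in for the utils helper (assumption, not the real code)."""
    return None if v is None else int(v) // scale


def make_format_id(tbr):
    # Same expression as in the hunk, wrapped in a function for testing.
    return 'http%s' % ('-%s' % tbr if tbr else '')


rend = {}  # a rendition without 'encodingRate'

# Trailing comma: the right-hand side becomes the tuple (None,), which is truthy.
tbr_with_comma = int_or_none(rend.get('encodingRate'), 1000),
# Without the comma the value is simply None.
tbr_fixed = int_or_none(rend.get('encodingRate'), 1000)
```

A tuple-valued `tbr` is always truthy, so the "no bitrate" case can no longer be detected, and a tuple in the format dictionary is not a usable bitrate.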


@@ -33,6 +33,7 @@ class CBCIE(InfoExtractor):
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download


@@ -5,7 +5,6 @@ from ..utils import (
xpath_text,
xpath_element,
int_or_none,
ExtractorError,
find_xpath_attr,
)
@@ -64,7 +63,7 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true'
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url):
display_id = self._match_id(url)
@@ -84,11 +83,11 @@ class CBSIE(CBSBaseIE):
pid = xpath_text(item, 'pid')
if not pid:
continue
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid)
except ExtractorError:
continue
tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
tp_release_url += '&manifest=m3u'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)


@@ -1,13 +1,9 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
qualities,
unified_strdate,
parse_iso8601,
)
@@ -19,14 +15,14 @@ class CCCIE(InfoExtractor):
'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video',
'md5': '3a1eda8f3a29515d27f5adb967d7e740',
'info_dict': {
'id': '30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor',
'id': '1839',
'ext': 'mp4',
'title': 'Introduction to Processor Design',
'description': 'md5:80be298773966f66d56cb11260b879af',
'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac',
'thumbnail': 're:^https?://.*\.jpg$',
'view_count': int,
'upload_date': '20131228',
'duration': 3660,
'timestamp': 1388188800,
'duration': 3710,
}
}, {
'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download',
@@ -34,79 +30,48 @@ class CCCIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
event_id = self._search_regex("data-id='(\d+)'", webpage, 'event id')
event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id)
if self._downloader.params.get('prefer_free_formats'):
preference = qualities(['mp3', 'opus', 'mp4-lq', 'webm-lq', 'h264-sd', 'mp4-sd', 'webm-sd', 'mp4', 'webm', 'mp4-hd', 'h264-hd', 'webm-hd'])
else:
preference = qualities(['opus', 'mp3', 'webm-lq', 'mp4-lq', 'webm-sd', 'h264-sd', 'mp4-sd', 'webm', 'mp4', 'webm-hd', 'mp4-hd', 'h264-hd'])
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r'(?s)<h3>About</h3>(.+?)<h3>',
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",
webpage, 'upload date', fatal=False))
view_count = int_or_none(self._html_search_regex(
r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
webpage, 'view count', fatal=False))
duration = parse_duration(self._html_search_regex(
r'(?s)<span[^>]+class=(["\']).*?fa-clock-o.*?\1[^>]*></span>(?P<duration>.+?)</li',
webpage, 'duration', fatal=False, group='duration'))
matches = re.finditer(r'''(?xs)
<(?:span|div)\s+class='label\s+filetype'>(?P<format>[^<]*)</(?:span|div)>\s*
<(?:span|div)\s+class='label\s+filetype'>(?P<lang>[^<]*)</(?:span|div)>\s*
<a\s+download\s+href='(?P<http_url>[^']+)'>\s*
(?:
.*?
<a\s+(?:download\s+)?href='(?P<torrent_url>[^']+\.torrent)'
)?''', webpage)
formats = []
for m in matches:
format = m.group('format')
format_id = self._search_regex(
r'.*/([a-z0-9_-]+)/[^/]*$',
m.group('http_url'), 'format id', default=None)
if format_id:
format_id = m.group('lang') + '-' + format_id
vcodec = 'h264' if 'h264' in format_id else (
'none' if format_id in ('mp3', 'opus') else None
for recording in event_data.get('recordings', []):
recording_url = recording.get('recording_url')
if not recording_url:
continue
language = recording.get('language')
folder = recording.get('folder')
format_id = None
if language:
format_id = language
if folder:
if language:
format_id += '-' + folder
else:
format_id = folder
vcodec = 'h264' if 'h264' in folder else (
'none' if folder in ('mp3', 'opus') else None
)
formats.append({
'format_id': format_id,
'format': format,
'language': m.group('lang'),
'url': m.group('http_url'),
'url': recording_url,
'width': int_or_none(recording.get('width')),
'height': int_or_none(recording.get('height')),
'filesize': int_or_none(recording.get('size'), invscale=1024 * 1024),
'language': language,
'vcodec': vcodec,
'preference': preference(format_id),
})
if m.group('torrent_url'):
formats.append({
'format_id': 'torrent-%s' % (format if format_id is None else format_id),
'format': '%s (torrent)' % format,
'proto': 'torrent',
'format_note': '(unsupported; will just download the .torrent file)',
'vcodec': vcodec,
'preference': -100 + preference(format_id),
'url': m.group('torrent_url'),
})
self._sort_formats(formats)
thumbnail = self._html_search_regex(
r"<video.*?poster='([^']+)'", webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'upload_date': upload_date,
'duration': duration,
'id': event_id,
'display_id': display_id,
'title': event_data['title'],
'description': event_data.get('description'),
'thumbnail': event_data.get('thumb_url'),
'timestamp': parse_iso8601(event_data.get('date')),
'duration': int_or_none(event_data.get('length')),
'tags': event_data.get('tags'),
'formats': formats,
}
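
In the rewritten extractor the `format_id` is assembled from the recording's `language` and `folder` fields, either of which may be missing. The sketch below is an equivalent, more compact formulation of that branching (joining whichever parts are present), not the code from the diff; `build_format_id` is a hypothetical name.

```python
def build_format_id(language, folder):
    """Join the available parts with '-'; return None when both are missing."""
    parts = [part for part in (language, folder) if part]
    return '-'.join(parts) if parts else None
```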


@@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
class ClipRsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
'info_dict': {
'id': '1488842.1399140381',
'ext': 'mp4',
'title': 'PREMIJERA Frajle predstavljaju novi spot za pesmu Moli me, moli',
'description': 'md5:56ce2c3b4ab31c5a2e0b17cb9a453026',
'duration': 229,
'timestamp': 1459850243,
'upload_date': '20160405',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
error = response.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}


@@ -19,7 +19,7 @@ from ..utils import (
class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec and videoraj.ch'
_VALID_URL = r'''(?x)
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.ch)/
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.(?:ch|to))/
(?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+)
'''
@@ -37,7 +37,7 @@ class CloudyIE(InfoExtractor):
}
},
{
'url': 'http://www.videoraj.ch/v/47f399fd8bb60',
'url': 'http://www.videoraj.to/v/47f399fd8bb60',
'md5': '7d0f8799d91efd4eda26587421c3c3b0',
'info_dict': {
'id': '47f399fd8bb60',


@@ -232,6 +232,24 @@ class InfoExtractor(object):
episode_number: Number of the video episode within a season, as an integer.
episode_id: Id of the video episode, as a unicode string.
The following fields should only be used when the media is a track or a part of
a music album:
track: Title of the track.
track_number: Number of the track within an album or a disc, as an integer.
track_id: Id of the track (useful in case of custom indexing, e.g. 6.iii),
as a unicode string.
artist: Artist(s) of the track.
genre: Genre(s) of the track.
album: Title of the album the track belongs to.
album_type: Type of the album (e.g. "Demo", "Full-length", "Split", "Compilation", etc).
album_artist: List of all artists that appeared on the album (e.g.
"Ash Borer / Fell Voices" or "Various Artists", useful for splits
and compilations).
disc_number: Number of the disc or other physical medium the track belongs to,
as an integer.
release_year: Year (YYYY) when the album was released.
Unless mentioned otherwise, the fields should be Unicode strings.
Unless mentioned otherwise, None is equivalent to absence of information.
@@ -358,14 +376,13 @@ class InfoExtractor(object):
self.to_screen('%s' % (note,))
else:
self.to_screen('%s: %s' % (video_id, note))
# data, headers and query params will be ignored for `Request` objects
if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query)
else:
if query:
url_or_request = update_url_query(url_or_request, query)
if data or headers:
if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers)
try:
return self._downloader.urlopen(url_or_request)
@@ -825,7 +842,7 @@ class InfoExtractor(object):
for input in re.findall(r'(?i)<input([^>]+)>', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
continue
name = re.search(r'name=(["\'])(?P<value>.+?)\1', input)
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
if not name:
continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
@@ -989,6 +1006,13 @@ class InfoExtractor(object):
def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True):
# currently youtube-dl cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
if akamai_pv is not None and ';' in akamai_pv.text:
playerVerificationChallenge = akamai_pv.text.split(';')[0]
if playerVerificationChallenge.strip() != '':
return []
formats = []
manifest_version = '1.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
@@ -1037,7 +1061,7 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True):
fatal=True, live=False):
formats = [{
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
@@ -1115,7 +1139,11 @@ class InfoExtractor(object):
if m3u8_id:
format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
# Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided
# format_id intact.
if not live:
format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
f = {
'format_id': '-'.join(format_id),
'url': format_url(line.strip()),
@@ -1516,7 +1544,7 @@ class InfoExtractor(object):
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%(\d+)\$', r'%(\1)\2d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template)
media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [
media_template % {
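
The last hunk above changes how DASH `$Number%05d$`-style identifiers are converted: the format tag after `%` is now copied through as written instead of being restricted to digits with a `d` appended. A standalone sketch of the conversion follows. `expand_media_template` is a hypothetical helper name, and unlike the context line shown in the hunk, this version assigns the result of the final `replace` call, since `str.replace` returns a new string.

```python
import re


def expand_media_template(media_template, representation_id):
    """Turn a DASH $...$ template into a Python %-format string (illustrative)."""
    template = media_template.replace('$RepresentationID$', representation_id)
    # Plain identifiers become integer placeholders: $Number$ -> %(Number)d
    template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', template)
    # Identifiers with a format tag keep that tag: $Number%05d$ -> %(Number)05d
    template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', template)
    # '$$' is the escape for a literal dollar sign.
    return template.replace('$$', '$')


segment_url = expand_media_template(
    'video/$RepresentationID$/seg-$Number%05d$.m4s', 'v1') % {
        'Number': 7,
        'Bandwidth': 250000,
}
```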


@@ -11,7 +11,6 @@ from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
@@ -27,6 +26,7 @@ from ..utils import (
unified_strdate,
urlencode_postdata,
xpath_text,
extract_attributes,
)
from ..aes import (
aes_cbc_decrypt,
@@ -306,28 +306,36 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
'video_uploader', fatal=False)
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, 'stream_id')
video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, 'thumbnail', fatal=False)
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
video_encode_ids = []
formats = []
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
% (video_id, stream_format, stream_quality),
compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata = self._download_xml(
streamdata_req, video_id,
note='Downloading media info for %s' % video_format)
stream_info = streamdata.find('./{default}preload/stream_info')
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_url = xpath_text(stream_info, './host')
video_play_path = xpath_text(stream_info, './file')
if not video_url or not video_play_path:
@@ -359,6 +367,14 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats)
metadata = self._download_xml(
'http://www.crunchyroll.com/xml', video_id,
note='Downloading media info', query={
'req': 'RpcApiVideoPlayer_GetMediaMetadata',
'media_id': video_id,
})
subtitles = self.extract_subtitles(video_id, webpage)
@@ -366,9 +382,12 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'id': video_id,
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'thumbnail': xpath_text(metadata, 'episode_image_url'),
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': xpath_text(metadata, 'series_title'),
'episode': xpath_text(metadata, 'episode_title'),
'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
'subtitles': subtitles,
'formats': formats,
}


@@ -9,7 +9,7 @@ from ..utils import (
class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?(?:[^/]+/){2}\?.*\bplay=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': {
@@ -48,6 +48,9 @@ class CWTVIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -17,37 +17,53 @@ class DemocracynowIE(InfoExtractor):
IE_NAME = 'democracynow'
_TESTS = [{
'url': 'http://www.democracynow.org/shows/2015/7/3',
'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
'md5': '3757c182d3d84da68f5c8f506c18c196',
'info_dict': {
'id': '2015-0703-001',
'ext': 'mp4',
'title': 'July 03, 2015 - Democracy Now!',
'description': 'A daily independent global news hour with Amy Goodman & Juan González "What to the Slave is 4th of July?": James Earl Jones Reads Frederick Douglass\u2019 Historic Speech : "This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag : "We Shall Overcome": Remembering Folk Icon, Activist Pete Seeger in His Own Words & Songs',
'title': 'Daily Show',
},
}, {
'url': 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree',
'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
'info_dict': {
'id': '2015-0703-001',
'ext': 'mp4',
'title': '"This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag',
'description': 'md5:4d2bc4f0d29f5553c2210a4bc7761a21',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
description = self._og_search_description(webpage)
json_data = self._parse_json(self._search_regex(
r'<script[^>]+type="text/json"[^>]*>\s*({[^>]+})', webpage, 'json'),
display_id)
video_id = None
title = json_data['title']
formats = []
default_lang = 'en'
video_id = None
for key in ('file', 'audio', 'video', 'high_res_video'):
media_url = json_data.get(key, '')
if not media_url:
continue
media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
formats.append({
'url': media_url,
'vcodec': 'none' if key == 'audio' else None,
})
self._sort_formats(formats)
default_lang = 'en'
subtitles = {}
def add_subtitle_item(lang, info_dict):
@@ -67,22 +83,13 @@ class DemocracynowIE(InfoExtractor):
'url': compat_urlparse.urljoin(url, subtitle_item['url']),
})
for key in ('file', 'audio', 'video'):
media_url = json_data.get(key, '')
if not media_url:
continue
media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
formats.append({
'url': media_url,
})
self._sort_formats(formats)
description = self._og_search_description(webpage, default=None)
return {
'id': video_id or display_id,
'title': json_data['title'],
'title': title,
'description': description,
'thumbnail': json_data.get('image'),
'subtitles': subtitles,
'formats': formats,
}
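The id derivation in the hunk above (join the media URL against the page URL, drop the query string, take the basename without its extension, strip a leading `dn`) can be sketched in isolation. This is a minimal standalone sketch using only the standard library, not the extractor's own helpers; the function name `media_id_from_url` and the media path in the usage note are hypothetical.

```python
import os.path
import re
from urllib.parse import urljoin, urlparse


def media_id_from_url(page_url, media_url, prefix='dn'):
    """Derive a video id from a media URL (hypothetical helper)."""
    # Resolve relative URLs against the page and drop the query string.
    absolute = re.sub(r'\?.*', '', urljoin(page_url, media_url))
    # Take the last path component without its extension.
    basename = os.path.basename(urlparse(absolute).path)
    stem = os.path.splitext(basename)[0]
    # Strip the leading prefix, if present.
    return stem[len(prefix):] if stem.startswith(prefix) else stem
```

For example, calling it with the page URL `http://www.democracynow.org/shows/2015/7/3` and an assumed media path `/media/dn2015-0703-001.mp4?start=10` yields an id of the same shape as the one expected by the test case.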

@@ -12,39 +12,46 @@ class DFBIE(InfoExtractor):
_TEST = {
'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/',
# The md5 is different each time
'md5': 'ac0f98a52a330f700b4b3034ad240649',
'info_dict': {
'id': '11633',
'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland',
'ext': 'flv',
'ext': 'mp4',
'title': 'U 19-EM: Stimmen zum Spiel gegen Russland',
'upload_date': '20150714',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
display_id, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
display_id)
video_info = player_info.find('video')
stream_access_url = self._proto_relative_url(video_info.find('url').text.strip())
f4m_info = self._download_xml(
self._proto_relative_url(video_info.find('url').text.strip()), display_id)
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
formats = []
# see http://tv.dfb.de/player/js/ajax.js for the method to extract m3u8 formats
for sa_url in (stream_access_url, stream_access_url + '&area=&format=iphone'):
stream_access_info = self._download_xml(sa_url, display_id)
token_el = stream_access_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth']
if '.f4m' in manifest_url:
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.2.0',
display_id, f4m_id='hds', fatal=False))
else:
formats.extend(self._extract_m3u8_formats(
manifest_url, display_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': video_info.find('title').text,
'thumbnail': self._og_search_thumbnail(webpage),
'thumbnail': 'http://tv.dfb.de/images/%s_640x360.jpg' % video_id,
'upload_date': unified_strdate(video_info.find('time_date').text),
'formats': formats,
}
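The new loop builds a manifest URL from the `token` element of each stream-access response and then picks HDS or HLS depending on whether the URL contains `.f4m`. A minimal sketch of that decision, assuming the response has a `token` child with `url` and `auth` attributes as in the hunk; the function name and the sample XML in the usage note are illustrative only.

```python
import xml.etree.ElementTree as ET


def manifest_from_stream_access(xml_text):
    """Return (kind, manifest_url) for one stream-access XML document."""
    token = ET.fromstring(xml_text).find('token')
    manifest_url = token.attrib['url'] + '?hdnea=' + token.attrib['auth']
    if '.f4m' in manifest_url:
        # HDS manifests additionally need the hdcore parameter.
        return 'hds', manifest_url + '&hdcore=3.2.0'
    return 'hls', manifest_url
```

Given `<data><token url="http://example.com/a.m3u8" auth="abc"/></data>`, the sketch returns the pair `('hls', 'http://example.com/a.m3u8?hdnea=abc')`.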

@@ -33,6 +33,7 @@ class DiscoveryIE(InfoExtractor):
'duration': 156,
'timestamp': 1302032462,
'upload_date': '20110405',
'uploader_id': '103207',
},
'params': {
'skip_download': True, # requires ffmpeg
@@ -54,7 +55,11 @@ class DiscoveryIE(InfoExtractor):
'upload_date': '20140725',
'timestamp': 1406246400,
'duration': 116,
'uploader_id': '103207',
},
'params': {
'skip_download': True, # requires ffmpeg
}
}]
def _real_extract(self, url):
@@ -66,13 +71,19 @@ class DiscoveryIE(InfoExtractor):
entries = []
for idx, video_info in enumerate(info['playlist']):
formats = self._extract_m3u8_formats(
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Download m3u8 information for video %d' % (idx + 1))
self._sort_formats(formats)
subtitles = {}
caption_url = video_info.get('captionsUrl')
if caption_url:
subtitles = {
'en': [{
'url': caption_url,
}]
}
entries.append({
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/103207/default_default/index.html?videoId=ref:%s' % video_info['referenceId'],
'id': compat_str(video_info['id']),
'formats': formats,
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
@@ -80,6 +91,7 @@ class DiscoveryIE(InfoExtractor):
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
'subtitles': subtitles,
})
return self.playlist_result(entries, display_id, video_title)

@@ -0,0 +1,114 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
remove_end,
xpath_element,
xpath_text,
)
class DigitallySpeakingIE(InfoExtractor):
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_TESTS = [{
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '840376_BQRC',
'ext': 'mp4',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
}, {
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
video_formats = []
video_root = None
mp4_video = xpath_text(metadata, './mp4video', default=None)
if mp4_video is not None:
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
video_root = mobj.group('root')
if video_root is None:
http_host = xpath_text(metadata, 'httpHost', default=None)
if http_host:
video_root = 'http://%s/' % http_host
if video_root is None:
# Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
# Works for GPUTechConf, too
video_root = 'http://s3-2u.digitallyspeaking.com/'
formats = metadata.findall('./MBRVideos/MBRVideo')
if not formats:
return None
for a_format in formats:
stream_name = xpath_text(a_format, 'streamName', fatal=True)
video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
url = video_root + video_path
vbr = xpath_text(a_format, 'bitrate')
video_formats.append({
'url': url,
'vbr': int_or_none(vbr),
})
return video_formats
def _parse_flv(self, metadata):
formats = []
akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
audios = metadata.findall('./audios/audio')
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
xml_description = self._download_xml(url, video_id)
metadata = xpath_element(xml_description, 'metadata')
video_formats = self._parse_mp4(metadata)
if video_formats is None:
video_formats = self._parse_flv(metadata)
return {
'id': video_id,
'formats': video_formats,
'title': xpath_text(metadata, 'title', fatal=True),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
}
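The `_parse_mp4` method above resolves the video root through a three-step fallback (the `mp4video` URL, then `httpHost`, then a hard-coded host) and joins it with each `streamName` path. The following is a rough standalone sketch of that logic using `xml.etree.ElementTree` directly instead of the `xpath_text` helper; the function names and the sample XML used in testing are assumptions, and the fallback host is the one quoted in the hunk.

```python
import re
import xml.etree.ElementTree as ET

# Hard-coded fallback host taken from the hunk above.
FALLBACK_ROOT = 'http://s3-2u.digitallyspeaking.com/'


def pick_video_root(metadata):
    """Choose the URL root: mp4video, then httpHost, then the fallback."""
    mp4_video = metadata.findtext('./mp4video')
    if mp4_video:
        mobj = re.match(r'(?P<root>https?://.*?/)', mp4_video)
        if mobj:
            return mobj.group('root')
    http_host = metadata.findtext('./httpHost')
    if http_host:
        return 'http://%s/' % http_host
    return FALLBACK_ROOT


def parse_mp4_formats(metadata):
    """Build format dicts from MBRVideo entries, or None if there are none."""
    root = pick_video_root(metadata)
    formats = []
    for el in metadata.findall('./MBRVideos/MBRVideo'):
        mobj = re.match(r'mp4:(?P<path>.*)', el.findtext('streamName') or '')
        if not mobj:
            continue
        bitrate = el.findtext('bitrate')
        formats.append({
            'url': root + mobj.group('path'),
            'vbr': int(bitrate) if bitrate and bitrate.isdigit() else None,
        })
    return formats or None
```

Returning `None` when no MP4 entries are found mirrors the way `_real_extract` falls back to `_parse_flv`.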

@@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': 'iseven',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
@@ -43,7 +43,7 @@ class DouyuTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'Romm not found',
'skip': 'Room not found',
}, {
'url': 'http://www.douyutv.com/17732',
'info_dict': {
@@ -51,7 +51,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': '17732',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
@@ -75,13 +75,28 @@ class DouyuTVIE(InfoExtractor):
room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id')
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))
config = None
# Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache"
# Retry with different parameters - same parameters cause same errors
for i in range(5):
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
config = self._download_json(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id)
config_page = self._download_webpage(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id)
try:
config = self._parse_json(config_page, video_id, fatal=False)
except ExtractorError:
# Wait some time before retrying to get a different time() value
self._sleep(1, video_id, msg_template='%(video_id)s: Error occurs. '
'Waiting for %(timeout)s seconds before retrying')
continue
else:
break
if config is None:
raise ExtractorError('Unable to fetch API result')
data = config['data']
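The retry idea in this hunk (rebuild the signed request with a fresh `time` value on each attempt, and retry when the response is not valid JSON) can be sketched outside the extractor as follows. This is only an illustration under stated assumptions: `download`, `clock` and `sleep` are injected callables invented for the sketch, and `RuntimeError` stands in for `ExtractorError`.

```python
import hashlib
import json
import time


def build_signed_path(room_id, now):
    """Build the API path and its md5 signature as done in the hunk."""
    prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (room_id, now)
    auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
    return '%s&auth=%s' % (prefix, auth)


def fetch_config(room_id, download, attempts=5, delay=1.0,
                 clock=time.time, sleep=time.sleep):
    """Retry with a freshly signed path until the body parses as JSON."""
    for _ in range(attempts):
        body = download(build_signed_path(room_id, int(clock())))
        try:
            return json.loads(body)
        except ValueError:
            # Wait so that the next attempt gets a different time value.
            sleep(delay)
    raise RuntimeError('Unable to fetch API result')
```

Injecting the clock and the sleep function keeps the sketch testable without real delays or network access.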

@@ -6,13 +6,18 @@ import re
import time
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
update_url_query,
)
class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
# geo restricted, via direct unsigned hls URL
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': {
'id': '1255600',
@@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor):
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': {
'id': '3172',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'flv',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650,
@@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor):
'age_limit': 0,
},
}, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': {
'id': '70816',
'display_id': 'season-6-episode-12',
'ext': 'flv',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563,
'timestamp': 1429696800,
'upload_date': '20150422',
'creator': 'Kanal 4',
'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor',
'season_number': 6,
'episode_number': 12,
'age_limit': 0,
},
}, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}]
@@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor):
def extract_formats(protocol, manifest_url):
if protocol == 'hls':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False))
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
formats.extend(m3u8_formats)
elif protocol == 'hds':
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk'):
if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS:
# Providing dsc-geo allows to bypass geo restriction in some cases
self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({
@@ -113,13 +128,24 @@ class DPlayIE(InfoExtractor):
'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol])
else:
# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS:
if info.get(protocol):
extract_formats(protocol, info[protocol])
self._sort_formats(formats)
subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
@@ -133,4 +159,5 @@ class DPlayIE(InfoExtractor):
'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
'subtitles': subtitles,
}
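The HLS fix in this diff copies the query parameters of the master manifest URL onto each variant URL, since the variant URLs are sometimes unsigned. A minimal standard-library sketch of that step is shown below; `update_url_query` here is a simplified stand-in written for this example, not the helper from `..utils`, and the URLs and the `token` parameter in the usage note are made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_url_query(url, query):
    """Merge the given query dict into the URL's existing query string."""
    parsed = urlparse(url)
    merged = parse_qs(parsed.query)
    merged.update(query)
    return urlunparse(parsed._replace(query=urlencode(merged, doseq=True)))


def sign_variant_urls(manifest_url, variant_urls):
    """Copy the manifest's query parameters onto every variant URL."""
    query = parse_qs(urlparse(manifest_url).query)
    return [update_url_query(u, query) for u in variant_urls]
```

With a manifest URL of `http://example.com/master.m3u8?token=abc`, a variant `http://example.com/v/1.m3u8` becomes `http://example.com/v/1.m3u8?token=abc`.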

@@ -1,39 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DumpIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
_TEST = {
'url': 'http://www.dump.com/oneus/',
'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
'info_dict': {
'id': 'oneus',
'ext': 'flv',
'title': "He's one of us.",
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
}

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
url_basename,
)
@@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
_TESTS = [{
# http://lenta.ru/news/2015/03/06/navalny/
'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
'md5': '70f5187fb620f2c1d503b3b22fd4efe3',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '227304',
'ext': 'mp4',
@@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor):
# http://muz-tv.ru/play/7129/
# http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
'url': 'eagleplatform:media.clipyou.ru:12820',
'md5': '90b26344ba442c8e44aa4cf8f301164a',
'md5': '358597369cf8ba56675c1df15e7af624',
'info_dict': {
'id': '12820',
'ext': 'mp4',
@@ -55,8 +57,13 @@ class EaglePlatformIE(InfoExtractor):
raise ExtractorError(' '.join(response['errors']), expected=True)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
self._handle_error(response)
try:
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
self._handle_error(response)
raise
return response
def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
@@ -84,17 +91,33 @@ class EaglePlatformIE(InfoExtractor):
secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')
formats = []
m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
formats = self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id,
'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
formats.extend(m3u8_formats)
mp4_url = self._get_video_url(
# Secure mp4 URL is constructed according to Player.prototype.mp4 from
# http://lentaru.media.eagleplatform.com/player/player.js
re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
video_id, 'Downloading mp4 JSON')
formats.append({'url': mp4_url, 'format_id': 'mp4'})
mp4_url_basename = url_basename(mp4_url)
for m3u8_format in m3u8_formats:
mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url'])
if mobj:
http_format = m3u8_format.copy()
video_url = mp4_url.replace(mp4_url_basename, mobj.group(1))
if not self._is_valid_url(video_url, video_id):
continue
http_format.update({
'url': video_url,
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(http_format)
self._sort_formats(formats)
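The last hunk derives a direct HTTP URL for each HLS variant by taking the directory name that precedes `index.m3u8` and substituting it for the basename of the secure mp4 URL. A small sketch of that substitution, assuming URL shapes like the ones in the usage note (they are invented for illustration); `url_basename` is re-implemented here in simplified form.

```python
import re
from urllib.parse import urlparse


def url_basename(url):
    """Return the last path component of a URL (simplified stand-in)."""
    return urlparse(url).path.strip('/').split('/')[-1]


def http_url_from_hls(mp4_url, m3u8_variant_url):
    """Map an HLS variant URL to the matching direct HTTP URL, or None."""
    mobj = re.search(r'/([^/]+)/index\.m3u8', m3u8_variant_url)
    if not mobj:
        return None
    return mp4_url.replace(url_basename(mp4_url), mobj.group(1))
```

For `http://cdn.example.com/v/12820.mp4?sid=1` and a variant at `http://cdn.example.com/hls/12820_720.mp4/index.m3u8`, the result is `http://cdn.example.com/v/12820_720.mp4?sid=1`. The extractor additionally checks each candidate with `_is_valid_url` before adding it, which this sketch leaves out.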

@@ -4,10 +4,10 @@ from .common import InfoExtractor
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.ebaumsworld.com/video/watch/83367677/',
'url': 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/',
'info_dict': {
'id': '83367677',
'ext': 'mp4',

File diff suppressed because it is too large

@@ -2,78 +2,133 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_xpath
from ..utils import (
int_or_none,
qualities,
unified_strdate,
xpath_attr,
xpath_element,
xpath_text,
xpath_with_ns,
)
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390',
'md5': '777f525feeec4806130f4f764bc18a4f',
'info_dict': {
'id': '73390',
'ext': 'mp4',
'title': 'Олимпийские канатные дороги',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'duration': 149,
'like_count': int,
'dislike_count': int,
},
'skip': 'Only works from Russia',
}, {
# single format via video_materials.json API
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
'md5': '82a2777648acae812d58b3f5bd42882b',
'info_dict': {
'id': '35930',
'ext': 'mp4',
'title': 'Наедине со всеми. Людмила Сенчина',
'description': 'md5:89553aed1d641416001fe8d450f06cb9',
'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
'description': 'md5:357933adeede13b202c7c21f91b871b2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20150212',
'duration': 2694,
},
'skip': 'Only works from Russia',
}, {
# multiple formats via video_materials.json API
'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
'info_dict': {
'id': '113641',
'ext': 'mp4',
'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20160407',
'duration': 179,
'formats': 'mincount:3',
},
'params': {
'skip_download': True,
},
}, {
# single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
'md5': '519d306c5b5669761fd8906c39dbee23',
'info_dict': {
'id': '47038',
'ext': 'mp4',
'title': '"Побег". Второй сезон. 3 серия',
'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20120516',
'duration': 3080,
},
}, {
'url': 'http://www.1tv.ru/videoarchive/9967',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, 'Downloading page')
# Videos with multiple formats only available via this API
video = self._download_json(
'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
video_id, fatal=False)
video_url = self._html_search_regex(
r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
webpage, 'video URL')
description, thumbnail, upload_date, duration = [None] * 4
title = self._html_search_regex(
[r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"], webpage, 'title')
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
if video:
item = video[0]
title = item['title']
quality = qualities(('ld', 'sd', 'hd', ))
formats = [{
'url': f['src'],
'format_id': f.get('name'),
'quality': quality(f.get('name')),
} for f in item['mbr'] if f.get('src')]
thumbnail = item.get('poster')
else:
# Some videos are not available via video_materials.json
video = self._download_xml(
'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
video_id)
NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
item = xpath_element(video, './channel/item', fatal=True)
title = xpath_text(item, './title', fatal=True)
formats = [{
'url': content.attrib['url'],
} for content in item.findall(
compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
thumbnail = xpath_attr(
item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
self._sort_formats(formats)
webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
if webpage:
title = self._html_search_regex(
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"),
webpage, 'title', default=None) or title
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
duration = self._og_search_property(
'video:duration', webpage,
'video duration', fatal=False)
like_count = self._html_search_regex(
r'title="Понравилось".*?/></label> \[(\d+)\]',
webpage, 'like count', default=None)
dislike_count = self._html_search_regex(
r'title="Не понравилось".*?/></label> \[(\d+)\]',
webpage, 'dislike count', default=None)
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'video duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
return {
'id': video_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'upload_date': upload_date,
'duration': int_or_none(duration),
'like_count': int_or_none(like_count),
'dislike_count': int_or_none(dislike_count),
'formats': formats
}
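The JSON branch ranks formats with `qualities(('ld', 'sd', 'hd', ))`, so that a format's position in the tuple becomes its `quality` value. A sketch of how such a ranking helper can work is given below; it is written for this example and is only assumed to behave like the `qualities` utility imported in the diff, and `build_formats` with its plain `sorted` call is a simplification of the extractor's own sorting.

```python
def qualities(quality_ids):
    """Return a function mapping a quality id to its rank (-1 if unknown)."""
    def rank(quality_id):
        try:
            return quality_ids.index(quality_id)
        except ValueError:
            return -1
    return rank


def build_formats(mbr):
    """Build format dicts from 'mbr' entries and order them worst to best."""
    quality = qualities(('ld', 'sd', 'hd'))
    formats = [{
        'url': f['src'],
        'format_id': f.get('name'),
        'quality': quality(f.get('name')),
    } for f in mbr if f.get('src')]
    return sorted(formats, key=lambda f: f['quality'])
```

Entries without a `src` are dropped, matching the `if f.get('src')` filter in the hunk.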

@@ -2,6 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote_plus,
)
from ..utils import (
clean_html,
determine_ext,
@@ -27,6 +31,7 @@ class FunimationIE(InfoExtractor):
'description': 'md5:1769f43cd5fc130ace8fd87232207892',
'thumbnail': 're:https?://.*\.jpg',
},
'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
}, {
'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play',
'info_dict': {
@@ -37,6 +42,7 @@ class FunimationIE(InfoExtractor):
'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
'thumbnail': 're:https?://.*\.jpg',
},
'skip': 'Access without user interaction is forbidden by CloudFlare',
}, {
'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview',
'info_dict': {
@@ -47,8 +53,36 @@ class FunimationIE(InfoExtractor):
'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
'thumbnail': 're:https?://.*\.(?:jpg|png)',
},
'skip': 'Access without user interaction is forbidden by CloudFlare',
}]
_LOGIN_URL = 'http://www.funimation.com/login'
def _download_webpage(self, *args, **kwargs):
try:
return super(FunimationIE, self)._download_webpage(*args, **kwargs)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
response = ee.cause.read()
if b'>Please complete the security check to access<' in response:
raise ExtractorError(
'Access to funimation.com is blocked by CloudFlare. '
'Please browse to http://www.funimation.com/, solve '
'the reCAPTCHA, export browser cookies to a text file,'
' and then try again with --cookies YOUR_COOKIE_FILE.',
expected=True)
raise
def _extract_cloudflare_session_ua(self, url):
ci_session_cookie = self._get_cookies(url).get('ci_session')
if ci_session_cookie:
ci_session = compat_urllib_parse_unquote_plus(ci_session_cookie.value)
# ci_session is a string serialized by PHP function serialize()
# This case is simple enough to use regular expressions only
return self._search_regex(
r'"user_agent";s:\d+:"([^"]+)"', ci_session, 'user agent',
default=None)
def _login(self):
(username, password) = self._get_login_info()
if username is None:
@@ -57,8 +91,11 @@ class FunimationIE(InfoExtractor):
'email_field': username,
'password_field': password,
})
login_request = sanitized_Request('http://www.funimation.com/login', data, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0',
user_agent = self._extract_cloudflare_session_ua(self._LOGIN_URL)
if not user_agent:
user_agent = 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0'
login_request = sanitized_Request(self._LOGIN_URL, data, headers={
'User-Agent': user_agent,
'Content-Type': 'application/x-www-form-urlencoded'
})
login_page = self._download_webpage(
@@ -103,11 +140,16 @@ class FunimationIE(InfoExtractor):
('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'),
)
user_agent = self._extract_cloudflare_session_ua(url)
if user_agent:
USER_AGENTS = ((None, user_agent),)
for kind, user_agent in USER_AGENTS:
request = sanitized_Request(url)
request.add_header('User-Agent', user_agent)
webpage = self._download_webpage(
request, display_id, 'Downloading %s webpage' % kind)
request, display_id,
'Downloading %s webpage' % kind if kind else 'Downloading webpage')
playlist = self._parse_json(
self._search_regex(
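`_extract_cloudflare_session_ua` reads the user agent out of the `ci_session` cookie, which the code comments describe as a PHP `serialize()` string. A minimal sketch of that extraction on a raw cookie value, assuming the value is URL-encoded as the use of `compat_urllib_parse_unquote_plus` suggests; the function name and the sample cookie values are hypothetical.

```python
import re
from urllib.parse import unquote_plus


def extract_session_user_agent(cookie_value):
    """Pull the user_agent field out of a URL-encoded PHP-serialized cookie."""
    session = unquote_plus(cookie_value)
    # A regex is enough here; no full PHP unserializer is attempted.
    mobj = re.search(r'"user_agent";s:\d+:"([^"]+)"', session)
    return mobj.group(1) if mobj else None
```

Reusing the user agent stored in the session matters because the diff sends it back in the `User-Agent` header for both the login request and the page download.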

@@ -46,8 +46,8 @@ class FunnyOrDieIE(InfoExtractor):
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1',
webpage, 'm3u8 url', default=None, group='url')
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', group='url')
formats = []

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class GazetaIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_TESTS = [{
'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
'description': 'md5:38617526050bd17b234728e7f9620a71',
'thumbnail': 're:^https?://.*\.jpg',
},
'skip': 'video not found',
}, {
'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
'only_matching': True,
}, {
'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
'md5': '37f19f78355eb2f4256ee1688359f24c',
'info_dict': {
'id': '252048',
'ext': 'mp4',
'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
},
'add_ie': ['EaglePlatform'],
}]
def _real_extract(self, url):
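The effect of the `_VALID_URL` change can be checked directly with `re.match`: the old pattern allows a single optional `main/` or date segment, while the new one allows repeated `main/` segments followed by an optional date. The sketch below copies both patterns from the hunk and applies them to the URLs from the test cases; the constant and function names are chosen for this example.

```python
import re

# Both patterns are copied from the diff above.
OLD_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
NEW_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'


def match_id(pattern, url):
    """Return the captured id, or None when the URL does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None
```

The URL with `main/main/` plus a date from the new test case is rejected by the old pattern and accepted by the new one, while the older test URLs match both.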

@@ -4,7 +4,6 @@ import re
from .common import InfoExtractor
from ..utils import (
remove_end,
HEADRequest,
sanitized_Request,
urlencode_postdata,
@@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor):
{
'url': 'http://gdcvault.com/play/1020791/',
'only_matching': True,
}
},
{
# Hard-coded hostname
'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '1023460',
'ext': 'mp4',
'display_id': 'Tenacious-Design-and-The-Interface',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
},
{
# Multiple audios
'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
'info_dict': {
'id': '1014631',
'ext': 'flv',
'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
},
'params': {
'skip_download': True, # Requires rtmpdump
'format': 'jp', # The japanese audio
}
},
]
def _parse_mp4(self, xml_description):
video_formats = []
mp4_video = xml_description.find('./metadata/mp4video')
if mp4_video is None:
return None
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
video_root = mobj.group('root')
formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
for format in formats:
mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
url = video_root + mobj.group('path')
vbr = format.find('bitrate').text
video_formats.append({
'url': url,
'vbr': int(vbr),
})
return video_formats
def _parse_flv(self, xml_description):
formats = []
akamai_url = xml_description.find('./metadata/akamaiHost').text
audios = xml_description.find('./metadata/audios')
if audios is not None:
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xml_description.find('./metadata/slideVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xml_description.find('./metadata/speakerVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info()
if username is None or password is None:
@@ -159,9 +128,10 @@ class GDCVaultIE(InfoExtractor):
'title': title,
}
PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/player.*?\.html.*?".*?</iframe>'
xml_root = self._html_search_regex(
r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>',
start_page, 'xml root', default=None)
PLAYER_REGEX, start_page, 'xml root', default=None)
if xml_root is None:
# Probably need to authenticate
login_res = self._login(webpage_url, display_id)
@@ -171,27 +141,21 @@ class GDCVaultIE(InfoExtractor):
start_page = login_res
# Grab the url from the authenticated page
xml_root = self._html_search_regex(
r'<iframe src="(.*?)player.html.*?".*?</iframe>',
start_page, 'xml root')
PLAYER_REGEX, start_page, 'xml root')
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename', default=None)
if xml_name is None:
# Fallback to the older format
xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')
xml_description_url = xml_root + 'xml/' + xml_name
xml_description = self._download_xml(xml_description_url, display_id)
video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description)
if video_formats is None:
video_formats = self._parse_flv(xml_description)
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename')
return {
'_type': 'url_transparent',
'id': video_id,
'display_id': display_id,
'title': video_title,
'formats': video_formats,
'url': '%s/xml/%s' % (xml_root, xml_name),
'ie_key': 'DigitallySpeaking',
}


@@ -51,7 +51,7 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE
from .onionstudios import OnionStudiosIE
from .snagfilms import SnagFilmsEmbedIE
from .viewlift import ViewLiftEmbedIE
from .screenwavemedia import ScreenwaveMediaIE
from .mtv import MTVServicesEmbeddedIE
from .pladform import PladformIE
@@ -60,6 +60,7 @@ from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
class GenericIE(InfoExtractor):
@@ -104,7 +105,8 @@ class GenericIE(InfoExtractor):
'skip_download': True, # infinite live stream
},
'expected_warnings': [
r'501.*Not Implemented'
r'501.*Not Implemented',
r'400.*Bad Request',
],
},
# Direct link with incorrect MIME type
@@ -235,6 +237,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'car-20120827-manifest',
'formats': 'mincount:9',
'upload_date': '20130904',
},
'params': {
'format': 'bestvideo',
@@ -594,7 +597,11 @@ class GenericIE(InfoExtractor):
'id': 'k2mm4bCdJ6CQ2i7c8o2',
'ext': 'mp4',
'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
'uploader': 'Spi0n',
'uploader_id': 'xgditw',
'upload_date': '20140425',
'timestamp': 1398441542,
},
'add_ie': ['Dailymotion'],
},
@@ -727,8 +734,11 @@ class GenericIE(InfoExtractor):
'id': 'uxjb0lwrcz',
'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
},
},
# Soundcloud embed
@@ -877,6 +887,7 @@ class GenericIE(InfoExtractor):
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '227304',
'ext': 'mp4',
@@ -891,6 +902,7 @@ class GenericIE(InfoExtractor):
# ClipYou (Eagle.Platform) embed (custom URL)
{
'url': 'http://muz-tv.ru/play/7129/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '12820',
'ext': 'mp4',
@@ -979,6 +991,9 @@ class GenericIE(InfoExtractor):
'ext': 'flv',
'title': "PFT Live: New leader in the 'new-look' defense",
'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
'uploader': 'NBCU-SPORTS',
'upload_date': '20140107',
'timestamp': 1389118457,
},
},
# UDN embed
@@ -1031,6 +1046,9 @@ class GenericIE(InfoExtractor):
'title': 'SN Presents: Russell Martin, World Citizen',
'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
'uploader': 'Rogers Sportsnet',
'uploader_id': '1704050871',
'upload_date': '20150525',
'timestamp': 1432570283,
},
},
# Dailymotion Cloud video
@@ -1122,12 +1140,39 @@ class GenericIE(InfoExtractor):
'title': 'The Cardinal Pell Interview',
'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
'uploader': 'GlobeCast Australia - GlobeStream',
'uploader_id': '2733773828001',
'upload_date': '20160304',
'timestamp': 1457083087,
},
'params': {
# m3u8 downloads
'skip_download': True,
},
},
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
'md5': '850bfe45417ddf221288c88a0cffe2e2',
'info_dict': {
'id': '030273-562_PLUS7-F',
'ext': 'mp4',
'title': 'ARTE Reportage - Nulle part, en France',
'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
]
def report_following_redirect(self, new_url):
@@ -1702,7 +1747,7 @@ class GenericIE(InfoExtractor):
# Look for embedded arte.tv player
mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@@ -1879,10 +1924,10 @@ class GenericIE(InfoExtractor):
if onionstudios_url:
return self.url_result(onionstudios_url)
# Look for SnagFilms embeds
snagfilms_url = SnagFilmsEmbedIE._extract_url(webpage)
if snagfilms_url:
return self.url_result(snagfilms_url)
# Look for ViewLift embeds
viewlift_url = ViewLiftEmbedIE._extract_url(webpage)
if viewlift_url:
return self.url_result(viewlift_url)
# Look for JWPlatform embeds
jwplatform_url = JWPlatformIE._extract_url(webpage)
@@ -1930,7 +1975,13 @@ class GenericIE(InfoExtractor):
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
return self.url_result(instagram_embed_url, InstagramIE.ie_key())
return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
def check_video(vurl):
if YoutubeIE.suitable(vurl):
@@ -2013,6 +2064,7 @@ class GenericIE(InfoExtractor):
entries = []
for video_url in found:
video_url = unescapeHTML(video_url)
video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url))


@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
class GlideIE(InfoExtractor):
@@ -15,26 +16,38 @@ class GlideIE(InfoExtractor):
'ext': 'mp4',
'title': 'Damon Timm\'s Glide message',
'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
'uploader': 'Damon Timm',
'upload_date': '20140919',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>(.*?)</title>', webpage, 'title')
video_url = self.http_scheme() + self._search_regex(
r'<source src="(.*?)" type="video/mp4">', webpage, 'video URL')
thumbnail_url = self._search_regex(
r'<img id="video-thumbnail" src="(.*?)"',
webpage, 'thumbnail url', fatal=False)
thumbnail = (
thumbnail_url if thumbnail_url is None
else self.http_scheme() + thumbnail_url)
r'<title>(.+?)</title>', webpage, 'title')
video_url = self._proto_relative_url(self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'video URL', default=None,
group='url')) or self._og_search_video_url(webpage)
thumbnail = self._proto_relative_url(self._search_regex(
r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'thumbnail url', default=None,
group='url')) or self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'<div[^>]+class=["\']info-name["\'][^>]*>([^<]+)',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<div[^>]+class="info-date"[^>]*>([^<]+)',
webpage, 'upload date', fatal=False))
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
}


@@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '027fcc54459dff0feb0bc06a7aeda680',
'md5': '4b6db9a0a333142eb9f15913142b0ed1',
'info_dict': {
'id': '299069',
'ext': 'flv',
'title': 'DIESEL SFW XXX Video',
'thumbnail': 're:^http://.*\.jpg$',
'duration': 79,
'duration': 80,
'age_limit': 18,
}
}
@@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'age_limit': self._family_friendly_search(webpage),
'age_limit': 18,
}


@@ -2,12 +2,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_element,
xpath_text,
int_or_none,
parse_duration,
)
class GPUTechConfIE(InfoExtractor):
@@ -27,29 +21,15 @@ class GPUTechConfIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
root_path = self._search_regex(r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path', 'http://evt.dispeak.com/nvidia/events/gtc15/')
xml_file_id = self._search_regex(r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
doc = self._download_xml('%sxml/%s.xml' % (root_path, xml_file_id), video_id)
metadata = xpath_element(doc, 'metadata')
http_host = xpath_text(metadata, 'httpHost', 'http host', True)
mbr_videos = xpath_element(metadata, 'MBRVideos')
formats = []
for mbr_video in mbr_videos.findall('MBRVideo'):
stream_name = xpath_text(mbr_video, 'streamName')
if stream_name:
formats.append({
'url': 'http://%s/%s' % (http_host, stream_name.replace('mp4:', '')),
'tbr': int_or_none(xpath_text(mbr_video, 'bitrate')),
})
self._sort_formats(formats)
root_path = self._search_regex(
r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
default='http://evt.dispeak.com/nvidia/events/gtc15/')
xml_file_id = self._search_regex(
r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
return {
'_type': 'url_transparent',
'id': video_id,
'title': xpath_text(metadata, 'title'),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
'formats': formats,
'url': '%sxml/%s.xml' % (root_path, xml_file_id),
'ie_key': 'DigitallySpeaking',
}


@@ -16,14 +16,14 @@ class GrouponIE(InfoExtractor):
'playlist': [{
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4',
'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961,
},
}],
'params': {
'skip_download': 'HLS',
'skip_download': 'HDS',
}
}
@@ -32,7 +32,7 @@ class GrouponIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:


@@ -24,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 161,
},
'skip': 'Video broken',
},
{
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',


@@ -4,6 +4,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
unified_strdate,
)
@@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549,
'upload_date': '20140124',
}
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['HTTP Error 404: Not Found'],
}
def _real_extract(self, url):
@@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
description = data.get('description')
thumbnails = []
for url in data['images'].values():
for url in filter(None, data['images'].values()):
m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m:
continue
@@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
'resolution': m.group(1),
})
formats = [{
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data.get('sources', {}).get('live', {}).items()]
formats = []
sources = data.get('sources', {})
live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
for key, url in live_sources:
ext = determine_ext(url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
})
if not formats and data.get('fivemin_id'):
return self.url_result('5min:%s' % data['fivemin_id'])
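
The HuffPost hunk above stops treating every source as a plain MP4 and instead dispatches on the URL's extension. The following is a minimal standalone sketch of that dispatch, assuming nothing from the extractor framework: `guess_ext` and `classify_sources` are hypothetical helper names, and manifest URLs are only bucketed here rather than expanded into formats as the real extractor does.

```python
import os.path
from urllib.parse import urlparse


def guess_ext(url):
    """Return the lowercase extension of the URL path, or None if absent."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lstrip('.').lower()
    return ext or None


def classify_sources(sources):
    """Bucket (key, url) pairs from 'live' and 'live_again' by stream type."""
    buckets = {'hls': [], 'hds': [], 'direct': []}
    merged = (list(sources.get('live', {}).items()) +
              list(sources.get('live_again', {}).items()))
    for key, url in merged:
        ext = guess_ext(url)
        if ext == 'm3u8':
            # HLS manifest: a real extractor would expand this into formats
            buckets['hls'].append(url)
        elif ext == 'f4m':
            # HDS manifest
            buckets['hds'].append(url)
        else:
            # Plain HTTP source, described the same way as in the diff
            buckets['direct'].append({
                'format_id': key.replace('/', '.'),
                'url': url,
                'vcodec': 'none' if key.startswith('audio/') else None,
            })
    return buckets
```

Ignoring the query string when guessing the extension matters here, since manifest URLs often carry parameters after `.m3u8` or `.f4m`.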


@@ -12,7 +12,7 @@ from ..utils import (
class InstagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
@@ -38,10 +38,19 @@ class InstagramIE(InfoExtractor):
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}]
@staticmethod
def _extract_embed_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
webpage)
if mobj:
return mobj.group('url')
blockquote_el = get_element_by_attribute(
'class', 'instagram-media', webpage)
if blockquote_el is None:
@@ -53,7 +62,9 @@ class InstagramIE(InfoExtractor):
return mobj.group('link')
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = mobj.group('url')
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',


@@ -1,93 +1,91 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urlparse,
compat_urllib_parse_urlencode,
)
from ..utils import (
xpath_with_ns,
determine_ext,
int_or_none,
xpath_text,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_VALID_URL = r'https?://video\.internetvideoarchive\.net/(?:player|flash/players)/.*?\?.*?publishedid.*?'
_TEST = {
'url': 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?customerid=69249&publishedid=194487&reporttag=vdbetatitle&playerid=641&autolist=0&domain=www.videodetective.com&maxrate=high&minrate=low&socialplayer=false',
'info_dict': {
'id': '452693',
'id': '194487',
'ext': 'mp4',
'title': 'SKYFALL',
'description': 'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
'duration': 152,
'title': 'KICK-ASS 2',
'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _build_json_url(query):
return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse_urlencode(cleaned_dic)
def _build_xml_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
query_dic = compat_parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration = self._download_xml(url, video_id,
'Downloading flash configuration')
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info = self._download_xml(file_url, video_id,
'Downloading video info')
item = info.find('channel/item')
if '/player/' in url:
configuration = self._download_json(url, video_id)
def _bp(p):
return xpath_with_ns(
p,
{
'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats',
}
)
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
width = int(attr['width'])
bitrate = int(attr['bitrate'])
format_id = '%d-%dk' % (width, bitrate)
formats.append({
'format_id': format_id,
'url': f_url,
'width': width,
'tbr': bitrate,
})
# There are multiple videos in the playlist while only the first one
# matches the video played in browsers
video_info = configuration['playlist'][0]
self._sort_formats(formats)
formats = []
for source in video_info['sources']:
file_url = source['file']
if determine_ext(file_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, ext='mp4', m3u8_id='hls'))
else:
a_format = {
'url': file_url,
}
if source.get('label') and source['label'][-4:] == ' kbs':
tbr = int_or_none(source['label'][:-4])
a_format.update({
'tbr': tbr,
'format_id': 'http-%d' % tbr,
})
formats.append(a_format)
self._sort_formats(formats)
title = video_info['title']
description = video_info.get('description')
thumbnail = video_info.get('image')
else:
configuration = self._download_xml(url, video_id)
formats = [{
'url': xpath_text(configuration, './file', 'file URL', fatal=True),
}]
thumbnail = xpath_text(configuration, './image', 'thumbnail')
title = 'InternetVideoArchive video %s' % video_id
description = None
return {
'id': video_id,
'title': item.find('title').text,
'title': title,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
'thumbnail': thumbnail,
'description': description,
}
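
The JSON branch above derives `tbr` from labels that end in ` kbs`. As a hedged sketch of that step in isolation, the helpers below (`parse_kbs_label` and `build_http_format`, both hypothetical names) only set `tbr` and `format_id` when the numeric part actually parses, which is a slightly more defensive variant than the hunk itself.

```python
def parse_kbs_label(label):
    """Return the integer bitrate from a label such as '2500 kbs', else None."""
    suffix = ' kbs'
    if not label or not label.endswith(suffix):
        return None
    try:
        return int(label[:-len(suffix)])
    except ValueError:
        # Label had the suffix but no usable number in front of it
        return None


def build_http_format(source):
    """Build a format dict for a non-manifest source entry."""
    fmt = {'url': source['file']}
    tbr = parse_kbs_label(source.get('label'))
    if tbr is not None:
        fmt.update({
            'tbr': tbr,
            'format_id': 'http-%d' % tbr,
        })
    return fmt
```

Skipping the update when parsing fails avoids formatting `None` with `%d`, which would raise a `TypeError`.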


@@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺'
_VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'
_NETRC_MACHINE = 'iqiyi'
@@ -273,6 +273,9 @@ class IqiyiIE(InfoExtractor):
'title': '灌篮高手 国语版',
},
'playlist_count': 101,
}, {
'url': 'http://www.pps.tv/w_19rrbav0ph.html',
'only_matching': True,
}]
_FORMATS_MAP = [
@@ -284,6 +287,13 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'),
]
AUTH_API_ERRORS = {
# No preview available (不允许试看鉴权失败)
'Q00505': 'This video requires a VIP account',
# End of preview time (试看结束鉴权失败)
'Q00506': 'Needs a VIP account for full video',
}
def _real_initialize(self):
self._login()
@@ -368,12 +378,19 @@ class IqiyiIE(InfoExtractor):
auth_req, video_id,
note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00506': # requires a VIP account
if do_report_warning:
self.report_warning('Needs a VIP account for full video')
return False
return auth_result
code = auth_result.get('code')
msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
if code == 'Q00506':
if do_report_warning:
self.report_warning(msg)
return False
if 'data' not in auth_result:
if msg is not None:
raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
raise ExtractorError('Unexpected error from Iqiyi auth API')
return auth_result['data']
def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y):
@@ -449,11 +466,11 @@ class IqiyiIE(InfoExtractor):
need_vip_warning_report = False
break
param.update({
't': auth_result['data']['t'],
't': auth_result['t'],
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
'cid': 'afbe8fd3d73448c9',
'vid': video_id,
'QY00001': auth_result['data']['u'],
'QY00001': auth_result['u'],
})
api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse_urlencode(param)
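
The Iqiyi hunks above move error handling into the auth helper: known codes map to readable messages, `Q00506` yields a warning and `False`, and any response without `data` becomes an error. A rough standalone sketch of that control flow follows; `AuthError`, `check_auth_result`, and the `warn` callback are stand-ins invented for illustration, not names from the extractor.

```python
class AuthError(Exception):
    """Stand-in for the extractor's error type (hypothetical)."""


AUTH_API_ERRORS = {
    # No preview available
    'Q00505': 'This video requires a VIP account',
    # End of preview time
    'Q00506': 'Needs a VIP account for full video',
}


def check_auth_result(auth_result, warn=print):
    """Return the 'data' payload, False for preview-only, or raise AuthError."""
    code = auth_result.get('code')
    # Prefer the local message, then the server's message, then the raw code
    msg = AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
    if code == 'Q00506':
        warn(msg)
        return False
    if 'data' not in auth_result:
        if msg is not None:
            raise AuthError('iqiyi said: %s' % msg)
        raise AuthError('Unexpected error from auth API')
    return auth_result['data']
```

Because the helper returns the `data` payload directly, callers can read `auth_result['t']` and `auth_result['u']` without the extra `['data']` lookup, which matches the last hunk.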


@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
'ext': 'mp4',
'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
'description': 'md5:253753e2655dde93f59f74b572454f6d',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https?://.*\.jpg',
'uploader_id': 'pelikzzle',
'timestamp': int,
'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
'id': '17997',
'ext': 'mp4',
'title': 'Tarkan Dortmund 2006 Konseri',
'description': 'Tarkan Dortmund 2006 Konseri',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https://.*\.jpg',
'uploader_id': 'parlayankiz',
'timestamp': int,
'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')


@@ -1,47 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)
return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}


@@ -4,16 +4,15 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
float_or_none,
int_or_none,
)
class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
video_data = jwplayer_data['playlist'][0]
subtitles = {}
for track in video_data['tracks']:
if track['kind'] == 'captions':
subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
formats = []
for source in video_data['sources']:
@@ -35,12 +34,22 @@ class JWPlatformBaseIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
return {
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
}
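
The JWPlatform change above makes subtitle collection tolerant of a missing or malformed `tracks` entry and groups several caption files under one label. A minimal sketch of that logic outside the extractor class is shown below; `collect_caption_tracks` is a hypothetical name, and prefixing `http:` to protocol-relative URLs is an assumption standing in for the framework's `_proto_relative_url` helper.

```python
def collect_caption_tracks(video_data, default_lang='en'):
    """Group caption track URLs by label, ignoring malformed entries."""
    subtitles = {}
    tracks = video_data.get('tracks')
    if tracks and isinstance(tracks, list):
        for track in tracks:
            if track.get('file') and track.get('kind') == 'captions':
                url = track['file']
                if url.startswith('//'):
                    # Assumed scheme for protocol-relative URLs
                    url = 'http:' + url
                label = track.get('label') or default_lang
                subtitles.setdefault(label, []).append({'url': url})
    return subtitles
```

Using `setdefault(...).append(...)` keeps every caption file for a label, whereas plain assignment would silently keep only the last one.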


@@ -2,39 +2,63 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
js_to_json,
)
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
_VALID_URL = r'http://www\.karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568',
'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': {
'id': '171568',
'ext': 'mp4',
'title': 'אל העולם שלך - רותם כהן - שרים קריוקי',
'id': '58356',
'ext': 'flv',
'title': 'קריוקי של איזון',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
api_page_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
webpage, 'API play URL', group='url')
page_video_url = self._og_search_video_url(webpage, video_id)
config_json = compat_urllib_parse_unquote_plus(self._search_regex(
r'config=(.*)', page_video_url, 'configuration'))
api_page = self._download_webpage(api_page_url, video_id)
video_cdn_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
api_page, 'video cdn URL', group='url')
urls_info_json = self._download_json(
config_json, video_id, 'Downloading configuration',
transform_source=js_to_json)
video_cdn = self._download_webpage(video_cdn_url, video_id)
play_path = self._parse_json(
self._search_regex(
r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
video_id)['clip']['url']
url = urls_info_json['playlist'][0]['url']
settings = self._parse_json(
self._search_regex(
r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
video_id, fatal=False) or {}
servers = settings.get('servers')
if not servers or not isinstance(servers, list):
servers = ('wowzail.video-cdn.com:80/vodcdn', )
formats = [{
'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
'play_path': play_path,
'app': 'vodcdn',
'page_url': video_cdn_url,
'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
'rtmp_real_time': True,
'ext': 'flv',
} for server in servers]
return {
'id': video_id,
'title': self._og_search_title(webpage),
'url': url,
'formats': formats,
}


@@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):
video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id')
# Server returns malformed headers
# Force Accept-Encoding: * to prevent gzipped results
playlist = self._download_xml(
'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
video_id, transform_source=fix_xml_ampersands)
video_id, transform_source=fix_xml_ampersands,
headers={'Accept-Encoding': '*'})
NS_MAP = {
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'


@@ -81,7 +81,7 @@ class KuwoIE(KuwoBaseIE):
'id': '6446136',
'ext': 'mp3',
'title': '',
'description': 'md5:b2ab6295d014005bfc607525bfc1e38a',
'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
'creator': 'IU',
'upload_date': '20150518',
},
@@ -102,10 +102,10 @@ class KuwoIE(KuwoBaseIE):
raise ExtractorError('this song has been offline because of copyright issues', expected=True)
song_name = self._html_search_regex(
r'(?s)class="(?:[^"\s]+\s+)*title(?:\s+[^"\s]+)*".*?<h1[^>]+title="([^"]+)"', webpage, 'song name')
singer_name = self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
webpage, 'singer name', fatal=False)
r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
singer_name = remove_start(self._html_search_regex(
r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
webpage, 'singer name', fatal=False), '歌手')
lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics
lrc_content = None
@@ -114,7 +114,7 @@ class KuwoIE(KuwoBaseIE):
self._sort_formats(formats)
album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False)
publish_time = None
@@ -268,7 +268,7 @@ class KuwoCategoryIE(InfoExtractor):
'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!',
},
'playlist_count': 30,
'playlist_mincount': 24,
}
def _real_extract(self, url):


@@ -63,6 +63,7 @@ class Laola1TvIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'This live stream has already finished.',
}]
def _real_extract(self, url):
@@ -74,6 +75,9 @@ class Laola1TvIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url')


@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
determine_protocol,
parse_duration,
int_or_none,
)
@@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': {
'id': '17473',
'ext': 'flv',
'ext': 'mp4',
'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann',
'duration': 5220,
},
'params': {
# m3u8 download
'skip_download': True,
}
}
@@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
formats = []
for url in set(re.findall(r'"src","([^"]+)"', webpage)):
for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
ext = determine_ext(url)
protocol = determine_protocol({'url': url})
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(url, video_id))
formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id))
formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
else:
if protocol == 'rtmp':
continue # XXX: currently broken
formats.append({
'format_id': protocol,
'url': url,
})


@@ -53,6 +53,14 @@ class LiveLeakIE(InfoExtractor):
}
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
webpage)
if mobj:
return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
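
The new `_extract_url` helper above turns a LiveLeak embed iframe into a canonical view URL that the generic extractor can hand back. The sketch below reuses the regular expression from the hunk in a plain function so it can be tried on sample markup; `extract_liveleak_url` and the sample HTML are illustrative assumptions.

```python
import re

# Pattern copied from the hunk above
EMBED_RE = (
    r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?'
    r'(?:.*?)i=(?P<id>[\w_]+)(?:.*)')


def extract_liveleak_url(webpage):
    """Return the canonical view URL for the first embed found, else None."""
    mobj = re.search(EMBED_RE, webpage)
    if mobj:
        return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
    return None


sample = ('<iframe width="640" height="360" '
          'src="http://www.liveleak.com/ll_embed?f=abc&i=874_1459135191" '
          'frameborder="0"></iframe>')
print(extract_liveleak_url(sample))
```

Returning a normalized URL rather than the raw `src` lets the regular LiveLeak extractor handle the page without knowing about the embed form.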


@@ -1,46 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
class MalemotionIE(InfoExtractor):
_VALID_URL = r'https?://malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
_TEST = {
'url': 'http://malemotion.com/video/bete-de-concours.ltc',
'md5': '3013e53a0afbde2878bc39998c33e8a5',
'info_dict': {
'id': 'ltc',
'ext': 'mp4',
'title': 'Bête de Concours',
'age_limit': 18,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = compat_urllib_parse_unquote(self._search_regex(
r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')
video_thumbnail = self._search_regex(
r'<video .+?poster="(.+?)"', webpage, 'thumbnail', fatal=False)
formats = [{
'url': video_url,
'ext': 'mp4',
'format_id': 'mp4',
'preference': 1,
}]
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': video_title,
'thumbnail': video_thumbnail,
'age_limit': 18,
}


@@ -49,8 +49,8 @@ class MDRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1419047100,
'upload_date': '20141220',
'timestamp': 1450950000,
'upload_date': '20151224',
'duration': 4628,
'uploader': 'KIKA',
},
@@ -71,8 +71,8 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', default=None, group='url').replace('\/', '/')
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', group='url').replace('\/', '/')
doc = self._download_xml(
compat_urlparse.urljoin(url, data_url), video_id)


@@ -81,6 +81,9 @@ class MetacafeIE(InfoExtractor):
'title': 'Open: This is Face the Nation, February 9',
'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
'duration': 96,
'uploader': 'CBSI-NEW',
'upload_date': '20140209',
'timestamp': 1391959800,
},
'params': {
# rtmp download


@@ -11,7 +11,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
'info_dict': {
'id': '3698222',
@@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
'duration': 221,
},
}
'skip': 'Not providing trailers anymore',
}, {
'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
'info_dict': {
'id': '5740315',
'ext': 'mp4',
'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
'duration': 114,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)


@@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TEST = {
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗',
'description': '我是歌手第四季双年巅峰会',
'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # m3u8 download
},
}
_FORMAT_MAP = {
'标清': ('Standard', 0),
'高清': ('High', 1),
'超清': ('SuperHigh', 2),
}
def _real_extract(self, url):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data']
info = api_data['info']
formats = []
for idx, stream in enumerate(api_data['stream']):
format_name = stream.get('name')
format_id, preference = self._FORMAT_MAP.get(format_name, (None, None))
format_info = self._download_json(
stream['url'], video_id,
note='Download video info for format %s' % (format_id or '#%d' % idx))
formats.append({
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4', # These are m3u8 playlists
'preference': preference,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['title'].strip(),
'formats': formats,
'description': info.get('desc'),
'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('thumb'),
}
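The progress note in `_real_extract` combines `%` formatting with an `or` fallback. Because `%` binds more tightly than `or` in Python, the fallback label only takes effect when it is parenthesised. The sketch below compares the two forms. Both helper names are hypothetical and exist only for illustration.

```python
def build_note_unparenthesised(format_id, idx):
    # Parsed as ('... %s' % format_id) or ('#%d' % idx): the left operand is
    # always a non-empty string, so the '#%d' fallback is never used.
    return 'Download video info for format %s' % format_id or '#%d' % idx


def build_note(format_id, idx):
    # Parentheses apply the fallback to the format label itself.
    return 'Download video info for format %s' % (format_id or '#%d' % idx)


print(build_note_unparenthesised(None, 2))
print(build_note(None, 2))
print(build_note('High', 2))
```

This matters for streams whose name is missing from `_FORMAT_MAP`, where `format_id` is `None`.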


@@ -1,8 +1,5 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
'id': '3453494717001',
'ext': 'mp4',
'title': 'The Gospel by Numbers',
'thumbnail': 're:^https?://.*\.jpg',
'upload_date': '20140410',
'description': 'Coming soon from T4G 2014!',
'uploader': 'LifeWay Christian Resources (MG)',
'uploader_id': '2034960640001',
'timestamp': 1397145591,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['TDSLifeway'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
portlets_json = self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list')
portlets = json.loads(portlets_json)
portlets = self._parse_json(self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
video_id)
pl_id = self._search_regex(
r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id')
r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')
for i, portlet in enumerate(portlets):
portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
default=None)
if video_iframe_url:
surl = smuggle_url(
video_iframe_url, {'force_videoid': video_id})
return {
'_type': 'url',
'id': video_id,
'url': surl,
}
return self.url_result(
smuggle_url(video_iframe_url, {'force_videoid': video_id}),
video_id=video_id)
raise ExtractorError('Could not find video iframe in any portlets')


@@ -15,9 +15,9 @@ class MiTeleIE(InfoExtractor):
IE_DESC = 'mitele.es'
_VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
_TESTS = [{
_TEST = {
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
'md5': '0ff1a13aebb35d9bc14081ff633dd324',
# MD5 is unstable
'info_dict': {
'id': '0NF1jJnxS1Wu3pHrmvFyw2',
'display_id': 'programa-144',
@@ -27,7 +27,7 @@ class MiTeleIE(InfoExtractor):
'thumbnail': 're:(?i)^https?://.*\.jpg$',
'duration': 2913,
},
}]
}
def _real_extract(self, url):
display_id = self._match_id(url)


@@ -1,26 +1,35 @@
from __future__ import unicode_literals
import base64
import functools
import itertools
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
HEADRequest,
OnDemandPagedList,
parse_count,
str_to_int,
)
class MixcloudIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)'
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
IE_NAME = 'mixcloud'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
'info_dict': {
'id': 'dholbach-cryptkeeper',
'ext': 'mp3',
'ext': 'm4a',
'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach',
@@ -38,22 +47,22 @@ class MixcloudIE(InfoExtractor):
'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
'uploader': 'Gilles Peterson Worldwide',
'uploader_id': 'gillespeterson',
'thumbnail': 're:https?://.*/images/',
'thumbnail': 're:https?://.*',
'view_count': int,
'like_count': int,
},
}]
def _check_url(self, url, track_id, ext):
try:
# We only want to know whether the request succeeds;
# don't download the whole file
self._request_webpage(
HEADRequest(url), track_id,
'Trying %s URL' % ext)
return True
except ExtractorError:
return False
# See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@staticmethod
def _decrypt_play_info(play_info):
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'
play_info = base64.b64decode(play_info.encode('ascii'))
return ''.join([
compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
for idx, ch in enumerate(play_info)])
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -63,14 +72,19 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id)
preview_url = self._search_regex(
r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url')
song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url)
song_url = song_url.replace('/previews/', '/c/originals/')
if not self._check_url(song_url, track_id, 'mp3'):
song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
if not self._check_url(song_url, track_id, 'm4a'):
raise ExtractorError('Unable to extract track url')
message = self._html_search_regex(
r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
webpage, 'error message', default=None)
encrypted_play_info = self._search_regex(
r'm-play-info="([^"]+)"', webpage, 'play info')
play_info = self._parse_json(
self._decrypt_play_info(encrypted_play_info), track_id)
if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
song_url = play_info['stream_url']
PREFIX = (
r'm-play-on-spacebar[^>]+'
@@ -105,3 +119,201 @@ class MixcloudIE(InfoExtractor):
'view_count': view_count,
'like_count': like_count,
}
class MixcloudPlaylistBaseIE(InfoExtractor):
_PAGE_SIZE = 24
def _find_urls_in_page(self, page):
for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
yield self.url_result(
compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
MixcloudIE.ie_key())
def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
real_page_number = real_page_number or current_page + 1
return self._download_webpage(
'https://www.mixcloud.com/%s/' % path, video_id,
note='Download %s (page %d)' % (page_name, current_page + 1),
errnote='Unable to download %s' % page_name,
query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
headers={'X-Requested-With': 'XMLHttpRequest'})
def _tracks_page_func(self, page, video_id, page_name, current_page):
resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
for item in self._find_urls_in_page(resp):
yield item
def _get_user_description(self, page_content):
return self._html_search_regex(
r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
page_content, 'user description', fatal=False)
class MixcloudUserIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
IE_NAME = 'mixcloud:user'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.mixcloud.com/dholbach/listens/',
'info_dict': {
'id': 'dholbach_listens',
'title': 'Daniel Holbach (listens)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
list_type = mobj.group('type')
# if only a profile URL was supplied, default to downloading all uploads
if list_type is None:
list_type = 'uploads'
video_id = '%s_%s' % (user_id, list_type)
profile = self._download_webpage(
'https://www.mixcloud.com/%s/' % user_id, video_id,
note='Downloading user profile',
errnote='Unable to download user profile')
username = self._og_search_title(profile)
description = self._get_user_description(profile)
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
self._PAGE_SIZE, use_cache=True)
return self.playlist_result(
entries, video_id, '%s (%s)' % (username, list_type), description)
class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
IE_NAME = 'mixcloud:playlist'
_TESTS = [{
'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
'info_dict': {
'id': 'RedBullThre3style_tokyo-finalists-2015',
'title': 'National Champions 2015',
'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
},
'playlist_mincount': 16,
}, {
'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
'info_dict': {
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Jazzcat on Ness Radio',
'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
},
'playlist_mincount': 23
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
playlist_id = mobj.group('playlist')
video_id = '%s_%s' % (user_id, playlist_id)
profile = self._download_webpage(
url, user_id,
note='Downloading playlist page',
errnote='Unable to download playlist page')
description = self._get_user_description(profile)
playlist_title = self._html_search_regex(
r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
profile, 'playlist title')
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
self._PAGE_SIZE)
return self.playlist_result(entries, video_id, playlist_title, description)
class MixcloudStreamIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
IE_NAME = 'mixcloud:stream'
_TEST = {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar',
'title': 'First Ear',
'description': 'Curators of good music\nfirstearmusic.com',
},
'playlist_mincount': 192,
}
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
entries = []
prev_page_url = None
def _handle_page(page):
entries.extend(self._find_urls_in_page(page))
return self._search_regex(
r'm-next-page-url="([^"]+)"', page,
'next page URL', default=None)
next_page_url = _handle_page(webpage)
for idx in itertools.count(0):
if not next_page_url or prev_page_url == next_page_url:
break
prev_page_url = next_page_url
current_page = int(self._search_regex(
r'\?page=(\d+)', next_page_url, 'next page number'))
next_page_url = _handle_page(self._fetch_tracks_page(
'%s/stream' % user_id, user_id, 'stream', idx,
real_page_number=current_page))
username = self._og_search_title(webpage)
description = self._get_user_description(webpage)
return self.playlist_result(entries, user_id, username, description)
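The `_decrypt_play_info` method in this file is a base64 decode followed by a repeating-key XOR. Below is a minimal Python 3 sketch of that round trip using only the standard library. The key is the one shown in the diff. `xor_with_key`, `encrypt_play_info`, and the sample payload are illustrative assumptions. Note also that the extractor builds a text string with `compat_chr`/`compat_ord` to stay compatible with Python 2, whereas this sketch works on bytes and assumes the payload is UTF-8 JSON.

```python
import base64
import json

KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'  # key from the diff


def xor_with_key(data, key=KEY):
    """XOR each byte of `data` with the repeating ASCII key."""
    key_bytes = key.encode('ascii')
    return bytes(b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data))


def decrypt_play_info(play_info):
    """Base64-decode, XOR with the key, and decode (assumed UTF-8) text."""
    return xor_with_key(base64.b64decode(play_info.encode('ascii'))).decode('utf-8')


def encrypt_play_info(plaintext):
    """Hypothetical inverse, used here only to build a sample token."""
    return base64.b64encode(xor_with_key(plaintext.encode('utf-8'))).decode('ascii')


# Made-up payload shaped like the play info the extractor reads.
sample = json.dumps({'stream_url': 'https://example.com/track.m4a'})
token = encrypt_play_info(sample)
print(json.loads(decrypt_play_info(token))['stream_url'])
```

Since XOR with the same key is its own inverse, one helper serves both directions.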


@@ -1,110 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'https?://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
_TESTS = [
{
'url': 'http://mooshare.biz/8dqtk4bjbp8g',
'md5': '4e14f9562928aecd2e42c6f341c8feba',
'info_dict': {
'id': '8dqtk4bjbp8g',
'ext': 'mp4',
'title': 'Comedy Football 2011 - (part 1-2)',
'duration': 893,
},
},
{
'url': 'http://mooshare.biz/aipjtoc4g95j',
'info_dict': {
'id': 'aipjtoc4g95j',
'ext': 'mp4',
'title': 'Orange Caramel Dashing Through the Snow',
'duration': 212,
},
'params': {
# rtmp download
'skip_download': True,
}
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading page')
if re.search(r'>Video Not Found or Deleted<', page) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
download_form = {
'op': 'download1',
'id': video_id,
'hash': hash_key,
}
request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, urlencode_postdata(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._sleep(5, video_id)
video_page = self._download_webpage(request, video_id, 'Downloading video page')
thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
duration = int(duration_str) if duration_str is not None else None
formats = []
# SD video
mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'sd',
'format': 'SD',
})
# HD video
mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'hd',
'format': 'HD',
})
# rtmp video
mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('rtmpurl'),
'play_path': mobj.group('playpath'),
'rtmp_live': False,
'ext': 'mp4',
'format_id': 'rtmp',
'format': 'HD',
})
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}


@@ -1,17 +1,21 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
)
class MusicPlayOnIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)'
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://en.musicplayon.com/play?v=433377',
'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
'info_dict': {
'id': '433377',
'ext': 'mp4',
@@ -20,15 +24,16 @@ class MusicPlayOnIE(InfoExtractor):
'duration': 342,
'uploader': 'ultrafish',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
}, {
'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
'only_matching': True,
}]
_URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
url = self._URL_TEMPLATE % video_id
page = self._download_webpage(url, video_id)
@@ -40,28 +45,14 @@ class MusicPlayOnIE(InfoExtractor):
uploader = self._html_search_regex(
r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
formats = [
{
'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id,
'ext': 'mp4',
}
]
manifest = self._download_webpage(
'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
for entry in manifest.split('#')[1:]:
if entry.startswith('EXT-X-STREAM-INF:'):
meta, url, _ = entry.split('\n')
params = dict(param.split('=') for param in meta.split(',')[1:])
formats.append({
'url': url,
'ext': 'mp4',
'tbr': int(params['BANDWIDTH']),
'width': int(params['RESOLUTION'].split('x')[1]),
'height': int(params['RESOLUTION'].split('x')[-1]),
'format_note': params['NAME'].replace('"', '').strip(),
})
sources = self._parse_json(
self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
video_id, transform_source=js_to_json)
formats = [{
'url': compat_urlparse.urljoin(url, source['src']),
'ext': mimetype2ext(source.get('type')),
'format_note': source.get('data-res'),
} for source in sources]
return {
'id': video_id,


@@ -1,63 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
class MuzuTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.muzu\.tv/(.+?)/(.+?)/(?P<id>\d+)'
IE_NAME = 'muzu.tv'
_TEST = {
'url': 'http://www.muzu.tv/defected/marcashken-featuring-sos-cat-walk-original-mix-music-video/1981454/',
'md5': '98f8b2c7bc50578d6a0364fff2bfb000',
'info_dict': {
'id': '1981454',
'ext': 'mp4',
'title': 'Cat Walk (Original Mix)',
'description': 'md5:90e868994de201b2570e4e5854e19420',
'uploader': 'MarcAshken featuring SOS',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
info_data = compat_urllib_parse_urlencode({
'format': 'json',
'url': url,
})
info = self._download_json(
'http://www.muzu.tv/api/oembed/?%s' % info_data,
video_id, 'Downloading video info')
player_info = self._download_json(
'http://player.muzu.tv/player/playerInit?ai=%s' % video_id,
video_id, 'Downloading player info')
video_info = player_info['videos'][0]
for quality in ['1080', '720', '480', '360']:
if video_info.get('v%s' % quality):
break
data = compat_urllib_parse_urlencode({
'ai': video_id,
# Even if each time you watch a video the hash changes,
# it seems to work for different videos, and it will work
# even if you use any non empty string as a hash
'viewhash': 'VBNff6djeV4HV5TRPW5kOHub2k',
'device': 'web',
'qv': quality,
})
video_url_info = self._download_json(
'http://player.muzu.tv/player/requestVideo?%s' % data,
video_id, 'Downloading video url')
video_url = video_url_info['url']
return {
'id': video_id,
'title': info['title'],
'url': video_url,
'thumbnail': info['thumbnail_url'],
'description': info['description'],
'uploader': info['author_name'],
}


@@ -10,9 +10,10 @@ from ..utils import (
class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
_TEST = {
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
'md5': 'c930e27b7720aaa3c9d0018dfc8ff6cc',
# md5 is unstable
'info_dict': {
'id': '168859',
'ext': 'flv',
@@ -56,3 +57,28 @@ class MwaveIE(InfoExtractor):
'view_count': int_or_none(vod_info.get('hit')),
'formats': formats,
}
class MwaveMeetGreetIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
_TEST = {
'url': 'http://mwave.interest.me/meetgreet/view/256',
'info_dict': {
'id': '173294',
'ext': 'flv',
'title': '[MEET&GREET] Park BoRam',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Mwave',
'duration': 3634,
'view_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
clip_id = self._html_search_regex(
r'<iframe[^>]+src="/mnettv/ifr_clip\.m\?searchVideoDetailVO\.clip_id=(\d+)',
webpage, 'clip ID')
clip_url = MwaveIE._URL_TEMPLATE % clip_id
return self.url_result(clip_url, 'Mwave', clip_id)


@@ -134,6 +134,9 @@ class NBCSportsIE(InfoExtractor):
'ext': 'flv',
'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
'uploader': 'NBCU-SPORTS',
'upload_date': '20150330',
'timestamp': 1427726529,
}
}
@@ -172,7 +175,7 @@ class CSNNEIE(InfoExtractor):
class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today)\.com/
(?:video/.+?/(?P<id>\d+)|
([^/]+/)*(?P<display_id>[^/?]+))
'''
@@ -230,6 +233,18 @@ class NBCNewsIE(ThePlatformIE):
},
'expected_warnings': ['http-6000 is not available']
},
{
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': {
'id': '669831235788',
'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
'upload_date': '20160420',
'timestamp': 1461152093,
},
},
{
'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
'only_matching': True,
@@ -264,7 +279,10 @@ class NBCNewsIE(ThePlatformIE):
info = bootstrap['results'][0]['video']
else:
player_instance_json = self._search_regex(
r'videoObj\s*:\s*({.+})', webpage, 'player instance')
r'videoObj\s*:\s*({.+})', webpage, 'player instance', default=None)
if not player_instance_json:
player_instance_json = self._html_search_regex(
r'data-video="([^"]+)"', webpage, 'video json')
info = self._parse_json(player_instance_json, display_id)
video_id = info['mpxId']
title = info['title']
@@ -295,7 +313,7 @@ class NBCNewsIE(ThePlatformIE):
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
else:
tbr = int_or_none(video_asset.get('bitRate'), 1000)
tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
format_id = 'http%s' % ('-%d' % tbr if tbr else '')
video_url = update_url_query(
video_url, {'format': 'redirect'})
@@ -321,10 +339,9 @@ class NBCNewsIE(ThePlatformIE):
'id': video_id,
'title': title,
'description': info.get('description'),
'thumbnail': info.get('description'),
'thumbnail': info.get('thumbnail'),
'duration': int_or_none(info.get('duration')),
'timestamp': parse_iso8601(info.get('pubDate')),
'timestamp': parse_iso8601(info.get('pubDate') or info.get('pub_date')),
'formats': formats,
'subtitles': subtitles,
}
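The bitrate handling in the NBCNews hunks reads `bitRate`, falls back to `bitrate`, and divides by 1000 before building the format ID. The sketch below illustrates that flow. Its `int_or_none` is a simplified stand-in for the youtube-dl utility, which accepts more parameters. `pick_tbr`, `format_id_for`, and the sample assets are hypothetical.

```python
def int_or_none(v, scale=1, default=None):
    """Simplified stand-in: int(v) divided by scale, or default if unusable."""
    if v is None or v == '':
        return default
    try:
        return int(v) // scale
    except (ValueError, TypeError):
        return default


def pick_tbr(video_asset):
    # Accept either spelling of the key, as the updated extractor does.
    return int_or_none(
        video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)


def format_id_for(video_asset):
    tbr = pick_tbr(video_asset)
    # The suffix is omitted when no bitrate is known.
    return 'http%s' % ('-%d' % tbr if tbr else '')


print(format_id_for({'bitRate': 6000000}))
print(format_id_for({'bitrate': '1200000'}))
print(format_id_for({}))
```

The same `a or b` fallback pattern is used for `pubDate` and `pub_date` in the returned info dict.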


@@ -1,80 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
xpath_text,
)
class NerdistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
'md5': '3698ed582931b90d9e81e02e26e89f23',
'info_dict': {
'display_id': 'exclusive-which-dc-characters-w',
'id': 'RPHpvJyr',
'ext': 'mp4',
'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
'thumbnail': 're:^https?://.*/thumbs/.*\.jpg$',
'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
'uploader': 'Eric Diaz',
'upload_date': '20150202',
'timestamp': 1422892808,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'''(?x)<script\s+(?:type="text/javascript"\s+)?
src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
webpage, 'video ID')
timestamp = parse_iso8601(self._html_search_meta(
'shareaholic:article_published_time', webpage, 'upload date'))
uploader = self._html_search_meta(
'shareaholic:article_author_name', webpage, 'article author')
doc = self._download_xml(
'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
video_info = doc.find('.//item')
title = xpath_text(video_info, './title', fatal=True)
description = xpath_text(video_info, './description')
thumbnail = xpath_text(
video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
formats = []
for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
vurl = source.attrib['file']
ext = determine_ext(vurl)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
preference=0))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
vurl, video_id, fatal=False
))
else:
formats.append({
'format_id': ext,
'url': vurl,
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
'uploader': uploader,
}


@@ -89,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014',
@@ -101,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424',
@@ -112,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20080211',
'timestamp': 1202745600,
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043',
@@ -124,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20100127',
'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)',
}
},
'skip': 'Blocked outside Mainland China',
}]
def _process_lyrics(self, lyrics_info):
@@ -192,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
'title': 'B\'day',
},
'playlist_count': 23,
'skip': 'Blocked outside Mainland China',
}
def _real_extract(self, url):
@@ -223,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '张惠妹 - aMEI;阿密特',
},
'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Singer has translated name.',
'url': 'http://music.163.com/#/artist?id=124098',
@@ -231,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '李昇基 - 이승기',
},
'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}]
def _real_extract(self, url):
@@ -266,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
},
'playlist_count': 99,
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Toplist/Charts sample',
'url': 'http://music.163.com/#/discover/toplist?id=3733003',
@@ -275,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
},
'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}]
def _real_extract(self, url):
@@ -314,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
'creator': '白雅言',
'upload_date': '20150520',
},
'skip': 'Blocked outside Mainland China',
}
def _real_extract(self, url):
@@ -357,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'upload_date': '20150613',
'duration': 900,
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
@@ -366,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
},
'playlist_count': 4,
'skip': 'Blocked outside Mainland China',
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
@@ -379,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
},
'params': {
'noplaylist': True
}
},
'skip': 'Blocked outside Mainland China',
}]
def _real_extract(self, url):
@@ -438,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
'description': 'md5:766220985cbd16fdd552f64c578a6b15'
},
'playlist_mincount': 40,
'skip': 'Blocked outside Mainland China',
}
_PAGE_SIZE = 1000


@@ -7,8 +7,8 @@ from .common import InfoExtractor
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.newgrounds.com/audio/listen/549479',
'md5': 'fe6033d297591288fa1c1f780386f07a',
'info_dict': {
@@ -17,7 +17,16 @@ class NewgroundsIE(InfoExtractor):
'title': 'B7 - BusMode',
'uploader': 'Burn7',
}
}
}, {
'url': 'http://www.newgrounds.com/portal/view/673111',
'md5': '3394735822aab2478c31b1004fe5e5bc',
'info_dict': {
'id': '673111',
'ext': 'mp4',
'title': 'Dancin',
'uploader': 'Squirrelman82',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -25,9 +34,11 @@ class NewgroundsIE(InfoExtractor):
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(
r',"name":"([^"]+)",', webpage, 'music title')
r'<title>([^>]+)</title>', webpage, 'title')
uploader = self._html_search_regex(
r',"artist":"([^"]+)",', webpage, 'music uploader')
[r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
webpage, 'uploader')
music_url_json_string = self._html_search_regex(
r'({"url":"[^"]+"),', webpage, 'music url') + '}'
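The uploader lookup in this hunk now takes a list of patterns, so the audio page layout is tried first and the portal layout second. A minimal sketch of that fallback follows, using plain `re` instead of `_html_search_regex` (so no HTML clean-up is applied). The helper name `find_uploader` and both page fragments are made up for illustration.

```python
import re

# Patterns copied from the hunk above, in the order they are tried.
UPLOADER_PATTERNS = [
    r',"artist":"([^"]+)",',
    r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],',
]


def find_uploader(webpage):
    """Return the first capture from the first pattern that matches (hypothetical helper)."""
    for pattern in UPLOADER_PATTERNS:
        mobj = re.search(pattern, webpage)
        if mobj:
            return mobj.group(1)
    return None


# Made-up fragments imitating the two page layouts.
audio_page = '{"name":"B7 - BusMode","artist":"Burn7","url":"x"}'
portal_page = "var embed = {'owner': 'Squirrelman82', 'id': 673111};"

print(find_uploader(audio_page))
print(find_uploader(portal_page))
```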


@@ -4,24 +4,24 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
)
class NewstubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
_TEST = {
'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
'md5': '801eef0c2a9f4089fa04e4fe3533abdc',
'info_dict': {
'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
'ext': 'flv',
'ext': 'mp4',
'title': 'Телеканал CNN переместил город Славянск в Крым',
'description': 'md5:419a8c9f03442bc0b0a794d689360335',
'duration': 31.05,
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
@@ -62,7 +62,6 @@ class NewstubeIE(InfoExtractor):
server = media_location.find(ns('./Server')).text
app = media_location.find(ns('./App')).text
media_id = stream_info.find(ns('./Id')).text
quality_id = stream_info.find(ns('./QualityId')).text
name = stream_info.find(ns('./Name')).text
width = int(stream_info.find(ns('./Width')).text)
height = int(stream_info.find(ns('./Height')).text)
@@ -74,12 +73,38 @@ class NewstubeIE(InfoExtractor):
'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
'page_url': url,
'ext': 'flv',
'format_id': quality_id,
'format_note': name,
'format_id': 'rtmp' + ('-%s' % name if name else ''),
'width': width,
'height': height,
})
sources_data = self._download_json(
'http://www.newstube.ru/player2/getsources?guid=%s' % video_guid,
video_guid, fatal=False)
if sources_data:
for source in sources_data.get('Sources', []):
source_url = source.get('Src')
if not source_url:
continue
height = int_or_none(source.get('Height'))
f = {
'format_id': 'http' + ('-%dp' % height if height else ''),
'url': source_url,
'width': int_or_none(source.get('Width')),
'height': height,
}
source_type = source.get('Type')
if source_type:
mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
if mobj:
vcodec, acodec = mobj.groups()
f.update({
'vcodec': vcodec,
'acodec': acodec,
})
formats.append(f)
self._check_formats(formats, video_guid)
self._sort_formats(formats)
return {
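The codec parsing added in this hunk can be exercised in isolation. The sketch below applies the same regular expression to a sample `Type` value. The sample string and the helper name `split_codecs` are assumptions for illustration and are not taken from the Newstube API.

```python
import re


def split_codecs(source_type):
    """Return (vcodec, acodec) parsed from a MIME type string, or (None, None).

    Hypothetical helper that reuses the regex from the hunk above.
    """
    if not source_type:
        return None, None
    mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
    if not mobj:
        return None, None
    return mobj.groups()


# Assumed sample value in the usual HTML5 <source type="..."> form.
sample = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'
print(split_codecs(sample))
```

A `Type` value without a `codecs` parameter leaves both fields unset, which mirrors the `if mobj:` guard in the extractor.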


@@ -8,10 +8,15 @@ from .common import InfoExtractor
from ..compat import (
compat_urlparse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse
compat_urllib_parse_urlparse,
compat_str,
)
from ..utils import (
unified_strdate,
determine_ext,
int_or_none,
parse_iso8601,
parse_duration,
)
@@ -70,8 +75,8 @@ class NHLBaseInfoExtractor(InfoExtractor):
return ret
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com'
class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com:videocenter'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)'
_TESTS = [{
@@ -186,8 +191,8 @@ class NHLNewsIE(NHLBaseInfoExtractor):
return self._real_extract_video(video_id)
class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com:videocenter'
class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com:videocenter:category'
IE_DESC = 'NHL videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$'
_TEST = {
@@ -236,3 +241,86 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
'id': cat_id,
'entries': [self._extract_video(v) for v in videos],
}
class NHLIE(InfoExtractor):
IE_NAME = 'nhl.com'
_VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'
_TESTS = [{
# type=video
'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
'md5': '0f7b9a8f986fb4b4eeeece9a56416eaf',
'info_dict': {
'id': '43663503',
'ext': 'mp4',
'title': 'Anisimov cleans up mess',
'description': 'md5:a02354acdfe900e940ce40706939ca63',
'timestamp': 1461288600,
'upload_date': '20160422',
},
}, {
# type=article
'url': 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934',
'md5': '1f39f4ea74c1394dea110699a25b366c',
'info_dict': {
'id': '40784403',
'ext': 'mp4',
'title': 'Wideman suspended by NHL',
'description': 'Flames defenseman Dennis Wideman was banned 20 games for violation of Rule 40 (Physical Abuse of Officials)',
'upload_date': '20160204',
'timestamp': 1454544904,
},
}]
def _real_extract(self, url):
tmp_id = self._match_id(url)
video_data = self._download_json(
'https://nhl.bamcontent.com/nhl/id/v1/%s/details/web-v1.json' % tmp_id,
tmp_id)
if video_data.get('type') == 'article':
video_data = video_data['media']
video_id = compat_str(video_data['id'])
title = video_data['title']
formats = []
for playback in video_data.get('playbacks', []):
playback_url = playback.get('url')
if not playback_url:
continue
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=playback.get('name', 'hls'), fatal=False))
else:
height = int_or_none(playback.get('height'))
formats.append({
'format_id': playback.get('name', 'http' + ('-%dp' % height if height else '')),
'url': playback_url,
'width': int_or_none(playback.get('width')),
'height': height,
})
self._sort_formats(formats, ('preference', 'width', 'height', 'tbr', 'format_id'))
thumbnails = []
for thumbnail_id, thumbnail_data in video_data.get('image', {}).get('cuts', {}).items():
thumbnail_url = thumbnail_data.get('src')
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
'width': int_or_none(thumbnail_data.get('width')),
'height': int_or_none(thumbnail_data.get('height')),
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'timestamp': parse_iso8601(video_data.get('date')),
'duration': parse_duration(video_data.get('duration')),
'thumbnails': thumbnails,
'formats': formats,
}
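As a quick check of the new `NHLIE._VALID_URL`, the sketch below runs the pattern against the two test URLs from this hunk. The helper `match_id` is a simplified stand-in for the extractor's `_match_id`, and the videocenter URL with id `123` is made up. Note that for the article URL the matched number is only a temporary id; the test expects `40784403` because the final id is read from the downloaded JSON.

```python
import re

# Pattern copied from the new NHLIE._VALID_URL.
VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'


def match_id(url):
    """Simplified stand-in for InfoExtractor._match_id."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


video_url = 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503'
article_url = 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934'
# Made-up videocenter URL; it should be left to the videocenter extractors.
videocenter_url = 'http://video.nhl.com/videocenter/console?id=123'

for url in (video_url, article_url, videocenter_url):
    print(url, '->', match_id(url))
```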


@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .screenwavemedia import ScreenwaveMediaIE
from ..utils import (
unified_strdate,
@@ -12,7 +13,6 @@ class NormalbootsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'
_TEST = {
'url': 'http://normalboots.com/video/home-alone-games-jontron/',
'md5': '8bf6de238915dd501105b44ef5f1e0f6',
'info_dict': {
'id': 'home-alone-games-jontron',
'ext': 'mp4',
@@ -22,9 +22,10 @@ class NormalbootsIE(InfoExtractor):
'upload_date': '20140125',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
'add_ie': ['ScreenwaveMedia'],
}
def _real_extract(self, url):
@@ -38,16 +39,15 @@ class NormalbootsIE(InfoExtractor):
r'<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
webpage, 'date', fatal=False))
player_url = self._html_search_regex(
r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"',
webpage, 'player url')
player_page = self._download_webpage(player_url, video_id)
video_url = self._html_search_regex(
r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
screenwavemedia_url = self._html_search_regex(
ScreenwaveMediaIE.EMBED_PATTERN, webpage, 'screenwave URL',
group='url')
return {
'_type': 'url_transparent',
'id': video_id,
'url': video_url,
'url': screenwavemedia_url,
'ie_key': ScreenwaveMediaIE.ie_key(),
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),


@@ -16,7 +16,14 @@ class NovaMovIE(InfoExtractor):
IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL_TEMPLATE = r'http://(?:(?:www\.)?%(host)s/(?:file|video|mobile/#/videos)/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<id>[a-z\d]{13})'
_VALID_URL_TEMPLATE = r'''(?x)
http://
(?:
(?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
(?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
)
(?P<id>[a-z\d]{13})
'''
_VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
_HOST = 'www.novamov.com'
@@ -189,7 +196,7 @@ class AuroraVidIE(NovaMovIE):
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!<'
_TEST = {
_TESTS = [{
'url': 'http://www.auroravid.to/video/4rurhn9x446jj',
'md5': '7205f346a52bbeba427603ba10d4b935',
'info_dict': {
@@ -199,4 +206,7 @@ class AuroraVidIE(NovaMovIE):
'description': 'search engine optimization is used to rank the web page in the google search engine'
},
'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
}
}, {
'url': 'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
'only_matching': True,
}]
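The rewritten `_VALID_URL_TEMPLATE` uses verbose mode, which is why the literal `#` in `mobile/#/videos` now has to be escaped as `\#`. The sketch below fills the template for one host and checks the new `embed/?v=` form. The `auroravid\.to` host value is an assumption about how `AuroraVidIE` instantiates the template, and the mobile URL is made up from the same video id.

```python
import re

# Template copied from the hunk above; whitespace is ignored under (?x).
VALID_URL_TEMPLATE = r'''(?x)
    http://
    (?:
        (?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
        (?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
    )
    (?P<id>[a-z\d]{13})
    '''

# Assumption: the subclass substitutes its own escaped host name.
AURORAVID_URL = VALID_URL_TEMPLATE % {'host': r'auroravid\.to'}

urls = [
    'http://www.auroravid.to/video/4rurhn9x446jj',
    'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
    'http://www.auroravid.to/mobile/#/videos/4rurhn9x446jj',  # made-up example
]
ids = [re.match(AURORAVID_URL, u).group('id') for u in urls]
print(ids)
```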


@@ -23,7 +23,7 @@ class NRKIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.nrk.no/video/PS*150533',
'md5': 'bccd850baebefe23b56d708a113229c2',
# MD5 is unstable
'info_dict': {
'id': '150533',
'ext': 'flv',
@@ -34,7 +34,7 @@ class NRKIE(InfoExtractor):
},
{
'url': 'http://www.nrk.no/video/PS*154915',
'md5': '0b1493ba1aae7d9579a5ad5531bc395a',
# MD5 is unstable
'info_dict': {
'id': '154915',
'ext': 'flv',


@@ -5,8 +5,6 @@ import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
sanitized_Request,
unified_strdate,
)
@@ -20,7 +18,6 @@ class NuvidIE(InfoExtractor):
'ext': 'mp4',
'title': 'Horny babes show their awesome bodeis and',
'duration': 129,
'upload_date': '20140508',
'age_limit': 18,
}
}
@@ -28,28 +25,31 @@ class NuvidIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
formats = []
page_url = 'http://m.nuvid.com/video/%s' % video_id
webpage = self._download_webpage(
page_url, video_id, 'Downloading video page')
# When dwnld_speed exists and has a value larger than the MP4 file's
# bitrate, Nuvid returns the MP4 URL.
# Its unit is 100 bytes/millisecond; see mobile-nuvid-min.js for the algorithm.
self._set_cookie('nuvid.com', 'dwnld_speed', '10.0')
mp4_webpage = self._download_webpage(
page_url, video_id, 'Downloading video page for MP4 format')
for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]:
request = sanitized_Request(
'http://m.nuvid.com/play/%s' % video_id)
request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed)
webpage = self._download_webpage(
request, video_id, 'Downloading %s page' % format_id)
video_url = self._html_search_regex(
r'<a\s+href="([^"]+)"\s+class="b_link">', webpage, '%s video URL' % format_id, fatal=False)
if not video_url:
continue
html5_video_re = r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']',
video_url = self._html_search_regex(html5_video_re, webpage, video_id)
mp4_video_url = self._html_search_regex(html5_video_re, mp4_webpage, video_id)
formats = [{
'url': video_url,
}]
if mp4_video_url != video_url:
formats.append({
'url': video_url,
'format_id': format_id,
'url': mp4_video_url,
})
webpage = self._download_webpage(
'http://m.nuvid.com/video/%s' % video_id, video_id, 'Downloading video page')
title = self._html_search_regex(
[r'<span title="([^"]+)">',
r'<div class="thumb-holder video">\s*<h5[^>]*>([^<]+)</h5>'], webpage, 'title').strip()
r'<div class="thumb-holder video">\s*<h5[^>]*>([^<]+)</h5>',
r'<span[^>]+class="title_thumb">([^<]+)</span>'], webpage, 'title').strip()
thumbnails = [
{
'url': thumb_url,
@@ -57,9 +57,8 @@ class NuvidIE(InfoExtractor):
]
thumbnail = thumbnails[0]['url'] if thumbnails else None
duration = parse_duration(self._html_search_regex(
r'<i class="fa fa-clock-o"></i>\s*(\d{2}:\d{2})', webpage, 'duration', fatal=False))
upload_date = unified_strdate(self._html_search_regex(
r'<i class="fa fa-user"></i>\s*(\d{4}-\d{2}-\d{2})', webpage, 'upload date', fatal=False))
[r'<i class="fa fa-clock-o"></i>\s*(\d{2}:\d{2})',
r'<span[^>]+class="view_time">([^<]+)</span>'], webpage, 'duration', fatal=False))
return {
'id': video_id,
@@ -67,7 +66,6 @@ class NuvidIE(InfoExtractor):
'thumbnails': thumbnails,
'thumbnail': thumbnail,
'duration': duration,
'upload_date': upload_date,
'age_limit': 18,
'formats': formats,
}


@@ -2,7 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..compat import (
compat_parse_qs,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
unified_strdate,
@@ -32,7 +36,7 @@ class OdnoklassnikiIE(InfoExtractor):
'skip': 'Video has been blocked',
}, {
# metadataUrl
'url': 'http://ok.ru/video/63567059965189-0',
'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
'md5': '9676cf86eff5391d35dea675d224e131',
'info_dict': {
'id': '63567059965189-0',
@@ -44,6 +48,7 @@ class OdnoklassnikiIE(InfoExtractor):
'uploader': '☭ Андрей Мещанинов ☭',
'like_count': int,
'age_limit': 0,
'start_time': 5,
},
}, {
# YouTube embed (metadataUrl, provider == USER_YOUTUBE)
@@ -60,6 +65,22 @@ class OdnoklassnikiIE(InfoExtractor):
'uploader': 'Алина П',
'age_limit': 0,
},
}, {
# YouTube embed (metadata, provider == USER_YOUTUBE, no metadata.movie.title field)
'url': 'http://ok.ru/video/62036049272859-0',
'info_dict': {
'id': '62036049272859-0',
'ext': 'mp4',
'title': 'МУЗЫКА ДОЖДЯ .',
'description': 'md5:6f1867132bd96e33bf53eda1091e8ed0',
'upload_date': '20120106',
'uploader_id': '473534735899',
'uploader': 'МARINA D',
'age_limit': 0,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://ok.ru/web-api/video/moviePlayer/20079905452',
'only_matching': True,
@@ -78,6 +99,9 @@ class OdnoklassnikiIE(InfoExtractor):
}]
def _real_extract(self, url):
start_time = int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('fromTime', [None])[0])
video_id = self._match_id(url)
webpage = self._download_webpage(
@@ -106,7 +130,14 @@ class OdnoklassnikiIE(InfoExtractor):
video_id, 'Downloading metadata JSON')
movie = metadata['movie']
title = movie['title']
# Some embedded videos may not contain a title in the movie dict (e.g.
# http://ok.ru/video/62036049272859-0), so we allow a missing title
# here; it will be extracted later by the extractor that processes
# the actual embed.
provider = metadata.get('provider')
title = movie['title'] if provider == 'UPLOADED_ODKL' else movie.get('title')
thumbnail = movie.get('poster')
duration = int_or_none(movie.get('duration'))
@@ -135,9 +166,10 @@ class OdnoklassnikiIE(InfoExtractor):
'uploader_id': uploader_id,
'like_count': like_count,
'age_limit': age_limit,
'start_time': start_time,
}
if metadata.get('provider') == 'USER_YOUTUBE':
if provider == 'USER_YOUTUBE':
info.update({
'_type': 'url_transparent',
'url': movie['contentId'],


@@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
from ..utils import (
determine_ext,
int_or_none,
)
class OnionStudiosIE(InfoExtractor):
@@ -17,7 +20,7 @@ class OnionStudiosIE(InfoExtractor):
'id': '2937',
'ext': 'mp4',
'title': 'Hannibal charges forward, stops for a cocktail',
'description': 'md5:545299bda6abf87e5ec666548c6a9448',
'description': 'md5:e786add7f280b7f0fe237b64cc73df76',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'The A.V. Club',
'uploader_id': 'TheAVClub',
@@ -42,9 +45,19 @@ class OnionStudiosIE(InfoExtractor):
formats = []
for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
if determine_ext(src) != 'm3u8': # m3u8 always results in 403
ext = determine_ext(src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
else:
height = int_or_none(self._search_regex(
r'/(\d+)\.%s' % ext, src, 'height', default=None))
formats.append({
'format_id': ext + ('-%sp' % height if height else ''),
'url': src,
'height': height,
'ext': ext,
'preference': 1,
})
self._sort_formats(formats)
@@ -52,7 +65,7 @@ class OnionStudiosIE(InfoExtractor):
r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
webpage, 'title', group='title')
description = self._search_regex(
r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1',
r'share_description\s*=\s*(["\'])(?P<description>[^\'"]+?)\1',
webpage, 'description', default=None, group='description')
thumbnail = self._search_regex(
r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',


@@ -96,6 +96,8 @@ class OoyalaIE(OoyalaBaseIE):
'description': 'How badly damaged does a drive have to be to defeat Russell and his crew? Apparently, smashed to bits.',
'duration': 853.386,
},
# The video in the original webpage now uses PlayWire
'skip': 'Ooyala said: movie expired',
}, {
# Only available for ipad
'url': 'http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0',


@@ -6,8 +6,10 @@ import re
from .common import InfoExtractor
from ..compat import compat_chr
from ..utils import (
determine_ext,
encode_base_n,
ExtractorError,
mimetype2ext,
)
@@ -29,6 +31,11 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://openload.io/f/ZAn6oz-VZGE/',
'only_matching': True,
}, {
# unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
# for title and ext
'url': 'https://openload.co/embed/Sxz5sADo82g/',
'only_matching': True,
}]
@staticmethod
@@ -96,12 +103,25 @@ class OpenloadIE(InfoExtractor):
r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>',
webpage, 'JS code')
decoded = self.openload_decode(code)
video_url = self._search_regex(
r'return\s+"(https?://[^"]+)"', self.openload_decode(code), 'video URL')
r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
ext = mimetype2ext(self._search_regex(
r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
'mimetype', default=None, group='mimetype')) or determine_ext(
video_url, 'mp4')
return {
'id': video_id,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'ext': ext,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url,
}


@@ -185,6 +185,7 @@ class ORFFM4IE(InfoExtractor):
'timestamp': 1452456073,
'upload_date': '20160110',
},
'skip': 'Live streams on FM4 get deleted soon',
}
def _real_extract(self, url):

Some files were not shown because too many files have changed in this diff.