Compare commits


564 Commits

Author SHA1 Message Date
Philipp Hagemeister
289bbb350e release 2015.07.28 2015-07-28 11:28:33 +02:00
Sergey M․
d247a2c8bf [bbc] Fix regex 2015-07-28 02:06:27 +06:00
Sergey M․
88ed52aec9 [bbc] Add support for direct bbc.co.uk embeds 2015-07-27 22:05:51 +06:00
Jaime Marquínez Ferrándiz
4c6bd5b5b6 [youtube] Use 'vp8' and 'vp9' in lowercase (fixes #6358)
That's how YouTube reports them in their DASH manifest.
2015-07-26 14:14:28 +02:00
tippfeler
aeb7b41d44 [spiegel] Accept iframe urls
Closes #6370.
2015-07-26 14:00:56 +02:00
Sergey M․
5bdec59de1 [comcarcoff] Add support for singleshots (Closes #6366) 2015-07-26 09:51:54 +06:00
Sergey M․
7a89681722 [bbc] Skip DASH until supported 2015-07-25 22:32:54 +06:00
Sergey M․
51da40e621 [bbc] PEP8 2015-07-25 22:19:54 +06:00
Sergey M․
1af330f29f Merge branch 'atomicdryad-pr-bbcnews' 2015-07-25 20:22:13 +06:00
Sergey M․
9afa1770d1 [bbc] Improve playlist extraction, refactor, expand support and document 2015-07-25 20:21:42 +06:00
Yen Chi Hsuan
3ebbcce1c7 Merge branch 'cyb3r-ir90tv' 2015-07-25 18:49:24 +08:00
Yen Chi Hsuan
2c7c721933 [ir90tv] Optional fields should be non-fatal 2015-07-25 18:48:00 +08:00
Yen Chi Hsuan
7523647391 [ir90tv] PEP8 2015-07-25 18:43:07 +08:00
Yen Chi Hsuan
9700cd9097 [ir90tv] Improve title extraction 2015-07-25 18:42:40 +08:00
Yen Chi Hsuan
eab7faa0c1 [ir90tv] Test (?:www\.)? part in _VALID_URL 2015-07-25 18:39:01 +08:00
Yen Chi Hsuan
a56c1e38c7 Merge branch 'ir90tv' of https://github.com/cyb3r/youtube-dl into cyb3r-ir90tv 2015-07-25 18:36:04 +08:00
Sergey M․
40a2d17052 [soundcloud:playlist] Defer download link resolve 2015-07-25 15:48:44 +06:00
Sergey M․
b14fa8e687 [soundcloud:set] Defer download link resolve (Closes #6354) 2015-07-25 15:47:53 +06:00
Sergey M․
678e436f2e [youtube] Handle empty allowed regions (Closes #6351) 2015-07-25 02:09:34 +06:00
Sergey M․
ff81c4c99c Merge branch 'pr-bbcnews' of https://github.com/atomicdryad/youtube-dl into atomicdryad-pr-bbcnews 2015-07-24 20:21:48 +06:00
Sergey M.
420658e6cb Merge pull request #6350 from remitamine/letv
[letv] fix height
2015-07-24 20:03:44 +06:00
remitamine
593ddd851b [letv] fix height 2015-07-24 14:46:45 +01:00
Yen Chi Hsuan
1243402657 [dailymotion:playlist] Detect problematic redirection (fixes #6347) 2015-07-24 21:29:44 +08:00
Sergey M․
2b2ee140c3 [dailymotion:user] Fix _VALID_URL (Closes #6346) 2015-07-24 03:40:24 +06:00
Sergey M․
d97f5cd795 [bbccouk] Make more robust (Closes #6345) 2015-07-24 02:56:54 +06:00
Sergey M․
f3f0b8e403 [canalplus] Modernize 2015-07-24 00:21:13 +06:00
Sergey M․
660f9459da [canalplus] Fix m3u8 videos extension 2015-07-24 00:17:36 +06:00
Sergey M․
10952eb2cf [extractor/common] Consistent URL spelling 2015-07-23 23:37:45 +06:00
Sergey M.
cdad742700 Merge pull request #6306 from jaimeMF/extract_start_time
[youtube] Extract start_time
2015-07-23 23:33:54 +06:00
Sergey M․
a9e8f60ef6 [rtlnl] Update unencrypted streams comment (#6337) 2015-07-23 23:20:24 +06:00
corone17
a8b7b26068 Undo adaptive -> flash workaround
For a couple of days now rtlXL's non-DRM adaptive (m3u8) streams don't work anymore. By undoing the adaptive -> flash workaround youtube-dl returns the DRM streams again. Too bad for foreigners (geoblock).
The progressive streams still work fine.
2015-07-23 23:08:39 +06:00
Sergey M․
ba911137fa [rts] Add support for articles with videos on rhs (Closes #6332) 2015-07-23 22:50:59 +06:00
Sergey M․
d3f007af18 [dailymotion] Adapt to player v5 and modernize (Closes #6151, closes #6250) 2015-07-23 22:04:47 +06:00
Jaime Marquínez Ferrándiz
2929fa0e79 [youtube] Also look into the 'start' field for start_time 2015-07-23 13:21:18 +02:00
Jaime Marquínez Ferrándiz
297a564bee [youtube] Extract end_time 2015-07-23 13:20:21 +02:00
fnord
53b8247cb5 NationalGeographic._VALID_URL: work site-wide
Closes #6343.
2015-07-23 12:21:41 +02:00
Jaime Marquínez Ferrándiz
59db9f8018 [downloader/dash] Improve 'combine_url' (fixes #6341)
In some videos the base_url already ends with '/'.
2015-07-23 12:09:30 +02:00
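A minimal sketch of the URL-joining problem described above, assuming a hypothetical `combine_url(base_url, target_url)` helper (not youtube-dl's actual code): the separator is only added when `base_url` does not already end with '/', and absolute segment URLs pass through unchanged.

```python
import re


def combine_url(base_url, target_url):
    """Join a DASH base URL and a segment URL without doubling the '/'.

    Hypothetical helper for illustration only.
    """
    # Absolute segment URLs are used as-is.
    if re.match(r'^https?://', target_url):
        return target_url
    # Only add a separator when base_url does not already end with '/'.
    separator = '' if base_url.endswith('/') else '/'
    return base_url + separator + target_url
```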
Yen Chi Hsuan
b73b14f72c [viki] Rewrite dict selection codes 2015-07-23 14:02:19 +08:00
Yen Chi Hsuan
41597d9bed [viki] Fix description extraction (closes #6339) 2015-07-23 13:39:19 +08:00
Jaime Marquínez Ferrándiz
b37317d8b0 [generic] Unescape HTML escape sequences in redirect urls (fixes #6311) 2015-07-22 23:33:49 +02:00
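A short sketch of why redirect URLs need unescaping, assuming the redirect comes from a `<meta http-equiv="refresh">` tag whose URL contains HTML entities such as `&amp;`; `meta_refresh_url` is a hypothetical name, and the regex is a simplification rather than the generic extractor's real pattern.

```python
import re
from html import unescape


def meta_refresh_url(page):
    """Return the unescaped target of a meta refresh tag, or None."""
    m = re.search(
        r'<meta[^>]+http-equiv=["\']refresh["\'][^>]+'
        r'content=["\'][^"\']*?url=([^"\']+)["\']', page, re.IGNORECASE)
    # '&amp;' in the attribute value must become '&' before the URL is used.
    return unescape(m.group(1)) if m else None
```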
Jaime Marquínez Ferrándiz
87dc451108 [youtube] Don't use the DASH manifest from 'get_video_info' if 'use_cipher_signature' is True (#5118)
Currently they give a 403 Forbidden error.
2015-07-22 23:27:41 +02:00
Jaime Marquínez Ferrándiz
ca4456eda8 [tagesschau] Remove unused import 2015-07-22 23:23:38 +02:00
Sergey M․
993df6bc22 [prosiebensat1] Modernize 2015-07-23 03:05:16 +06:00
Sergey M․
61be92e26a [prosiebensat1] Recognize DRM protected videos (#6334) 2015-07-23 03:04:23 +06:00
Sergey M․
c59b61c0da [viki] Fix height (Closes #6333) 2015-07-23 02:13:51 +06:00
Sergey M․
3e214851a4 [tagesschau] Improve tests 2015-07-23 01:01:35 +06:00
Sergey M․
a47b602b08 [tagesschau] Add support for audio 2015-07-23 01:01:04 +06:00
Sergey M․
a083b859e4 Merge branch 'rohieb-patch/enhance-tagesschau-regex' 2015-07-23 00:31:09 +06:00
Sergey M․
948199deac [tagesschau] Relax _VALID_URL and simplify 2015-07-23 00:30:48 +06:00
Sergey M․
c356620ec1 Merge branch 'patch/enhance-tagesschau-regex' of https://github.com/rohieb/youtube-dl into rohieb-patch/enhance-tagesschau-regex 2015-07-23 00:12:46 +06:00
Yen Chi Hsuan
f79ebf09a2 Credit @nichdu for Lecture2Go extractor 2015-07-23 01:56:55 +08:00
Sergey M․
c7620992d2 [pbs] No need to escape colon 2015-07-22 23:49:55 +06:00
Sergey M․
ce1bafdce9 [pbs] Clean up title construction rationale 2015-07-22 23:49:08 +06:00
Sergey M.
9872e588c8 Merge pull request #6292 from atomicdryad/pr-fix_pbs_titles
pbs: fix vague 'Full Episode' titles; prepend name of show
2015-07-22 23:42:19 +06:00
Sergey M․
d609edf4f1 [udemy] Handle already-logged-in scenario (Closes #6327) 2015-07-22 22:49:00 +06:00
Yen Chi Hsuan
3a99d321a8 Merge pull request #6303 from jaimeMF/dash_no_live
[YoutubeDL] don't default to 'bestvideo+bestaudio/best' for live videos
2015-07-23 00:39:44 +08:00
Yen Chi Hsuan
4bb3d999ac Merge branch 'nichdu-lecture2go' 2015-07-22 23:33:28 +08:00
Yen Chi Hsuan
40101dc311 [lecture2go] Make optional fields non-fatal 2015-07-22 23:25:32 +08:00
Yen Chi Hsuan
e9c6deffee [lecture2go] Add more metadata fields 2015-07-22 23:22:19 +08:00
Yen Chi Hsuan
9c29bc69f7 [utils] Improve parse_duration
Dots are now parsed, for example in '87 Min.'.
2015-07-22 23:15:22 +08:00
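A simplified sketch of accepting an abbreviated unit followed by a dot, as in '87 Min.'. `parse_minutes_duration` is a hypothetical helper that only handles minutes; the real `parse_duration` in youtube-dl covers many more forms.

```python
import re


def parse_minutes_duration(s):
    """Return a duration in seconds for strings like '87 Min.' or '87 minutes'."""
    # The optional '\.?' after the unit is what lets abbreviations such as 'Min.' match.
    m = re.match(
        r'^\s*(?P<mins>\d+)\s*(?:minutes?|mins?|m)\.?\s*$', s, re.IGNORECASE)
    if m is None:
        return None
    return int(m.group('mins')) * 60
```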
Yen Chi Hsuan
1e12429564 [lecture2go] Update _TEST 2015-07-22 23:05:14 +08:00
Yen Chi Hsuan
795704f0f1 [lecture2go] Support more formats 2015-07-22 22:39:46 +08:00
Yen Chi Hsuan
981b9cdc8c [lecture2go] Improve some regular expressions 2015-07-22 22:09:01 +08:00
Yen Chi Hsuan
3f724339db Merge branch 'lecture2go' of https://github.com/nichdu/youtube-dl into nichdu-lecture2go 2015-07-22 20:10:45 +08:00
Jaime Marquínez Ferrándiz
70c857b728 Credit Zach Bruggeman for the appleconnect extractor (#6190) 2015-07-22 11:49:54 +02:00
Sergey M․
c84683c88b [viewster] Strip titles 2015-07-22 02:08:25 +06:00
Sergey M․
b68a2613f8 [viewster] Rewrite for new API (Closes #6317) 2015-07-22 02:00:21 +06:00
Jaime Marquínez Ferrándiz
28afa6e77a [snagfilms] Don't use '_' as a variable that is used 2015-07-21 20:50:02 +02:00
Sergey M․
496ce6b349 [snagfilms] Improve m3u8 extraction (Closes #6309) 2015-07-21 23:54:31 +06:00
Philipp Hagemeister
ce9512b78b release 2015.07.21 2015-07-21 17:20:54 +02:00
Sergey M․
4eb59a6b1c [options] Fix a typo (#6307) 2015-07-21 09:11:23 +06:00
Sergey M.
80b1ee0a4c Merge pull request #6307 from jwilk/spelling
README: fix a typo
2015-07-21 09:09:55 +06:00
Jakub Wilk
f993afb26d README: fix a typo 2015-07-20 23:10:29 +02:00
Jaime Marquínez Ferrándiz
7c80519cbf [youtube] Extract start_time
From the 't=*' in the url.
Currently youtube-dl doesn't use the value, but it was requested for the mpv plugin.
2015-07-20 21:10:28 +02:00
Jaime Marquínez Ferrándiz
8250c32f49 [YoutubeDL] don't default to 'bestvideo+bestaudio/best' for live videos
Doesn't work currently.
2015-07-20 20:25:53 +02:00
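A sketch of the default-format decision, assuming a hypothetical `default_format_spec` helper that receives the info dict and whether a merger (such as ffmpeg) is available. Live streams fall back to 'best' because merging separate DASH video and audio streams does not work for them.

```python
def default_format_spec(info_dict, can_merge):
    """Pick a default format selector (hypothetical helper)."""
    # Only request separate video+audio streams when they can be merged
    # and the video is not a live stream.
    if can_merge and not info_dict.get('is_live'):
        return 'bestvideo+bestaudio/best'
    return 'best'
```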
Jaime Marquínez Ferrándiz
2fe1ff8582 [youtube] Set 'is_live' 2015-07-20 20:14:20 +02:00
Jaime Marquínez Ferrándiz
17ee98e1a5 [youtube] Fix upload_date in test 2015-07-20 19:48:50 +02:00
Jaime Marquínez Ferrándiz
2ee8f5d80f Merge remote-tracking branch 'yan12125/download-dash-segments' (#5886) 2015-07-20 19:34:24 +02:00
Yen Chi Hsuan
3f302bca8c [tumblr] Improve downloading notes 2015-07-20 18:54:47 +08:00
Yen Chi Hsuan
c909e5820e [tumblr] Delegate to GenericIE for non-tumblr videos
Fixes #6162
2015-07-20 18:51:25 +08:00
Yen Chi Hsuan
a1b85269a4 [extractor/generic] Support vid.me embeds 2015-07-20 18:49:53 +08:00
Sergey M․
faa1f83ab4 [twitch:stream] Improve _VALID_URL (Closes #6294) 2015-07-20 02:16:07 +06:00
Sergey M․
308c505c3d [francetv] Improve _VALID_URL 2015-07-20 01:03:43 +06:00
fnord
0eacd2aaae less clunky if statement 2015-07-19 13:59:12 -05:00
Sergey M․
18ae46ad4b [vice] Modernize 2015-07-20 00:42:25 +06:00
Sergey M․
65c2b21df1 [vice] Make test only matching 2015-07-20 00:40:43 +06:00
Sergey M․
772acaf31f [vice] Do not capture unused groups in _VALID_URL 2015-07-20 00:39:55 +06:00
Sergey M.
f8d0745e27 Merge pull request #6291 from atomicdryad/pr-fixvice
fix/support news.vice.com
2015-07-20 00:39:16 +06:00
Sergey M․
d719c6a5ab [ard:mediathek] Add test for direct mp4 2015-07-20 00:13:39 +06:00
Sergey M․
769efa16af [ard:mediathek] Remove dead test 2015-07-20 00:08:21 +06:00
Sergey M․
86b4e98ac6 [ard:mediathek] Add audio test 2015-07-20 00:05:10 +06:00
Sergey M․
3bf8c316a6 [sportschau] Reimplement in terms of ard extractor 2015-07-20 00:01:22 +06:00
Sergey M․
e37c92ec6d [ard] Extract all formats 2015-07-19 23:59:50 +06:00
fnord
a5dd9a0c5d pbs: fix vague 'Full Episode' titles; prepend name of show
Many videos are given the title 'Full Episode' or 'Episode x', etc.;
in the info JSON, 'title' is the episode title (for one-shots this is
'Full Episode'), while the program name is in 'program': {'title': ...}.
This changes the title from '(episode title)' to
 '(program name) - (episode title)'

The following urls demonstrate useless titles

    http://video.pbs.org/video/2365367186/
    http://video.pbs.org/video/2365519307/
    http://video.pbs.org/video/2365527039/
    http://video.pbs.org/video/2365530605/

Before change

    Full Episode
    Episode 5 | Preview
    Season 3 | Episode 4 Preview
    July 17, 2015

After change

    To Catch A Comet - Full Episode
    The Crimson Field - Episode 5 | Preview
    Last Tango in Halifax - Season 3 | Episode 4 Preview
    Charlie Rose The Week - July 17, 2015
2015-07-19 11:47:58 -05:00
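A minimal sketch of the title construction described in the commit, assuming the info JSON is already parsed into a dict with 'title' and an optional 'program': {'title': ...}; `pbs_title` is a hypothetical name.

```python
def pbs_title(info):
    """Prefix the episode title with the program name when one is present."""
    episode_title = info['title']
    program_title = (info.get('program') or {}).get('title')
    if program_title:
        return '%s - %s' % (program_title, episode_title)
    # No program name available: keep the episode title unchanged.
    return episode_title
```

Using the examples from the commit, `{'title': 'Full Episode', 'program': {'title': 'To Catch A Comet'}}` becomes 'To Catch A Comet - Full Episode'.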
fnord
7a4a945f13 fix/support news.vice.com 2015-07-19 11:33:02 -05:00
Sergey M․
1d18e26eca [francetv] Remove unused import 2015-07-19 21:54:12 +06:00
Sergey M․
ac4b8df5e4 [francetv] Fix embed test 2015-07-19 21:53:54 +06:00
Sergey M․
3bc9fb5889 [francetv] Update jt test 2015-07-19 21:53:09 +06:00
Sergey M․
632cbb8efa [francetv] Fix f4m extraction completely 2015-07-19 21:51:06 +06:00
Sergey M․
789a12aaaf [francetv] Restore support for jt videos 2015-07-19 21:50:25 +06:00
Sergey M․
ecdbe09e10 [francetv] Fix f4m extraction 2015-07-19 21:45:49 +06:00
Zach Bruggeman
1dc31c2786 [appleconnect] Add new extractor (fixes #6189)
Closes #6190.
2015-07-19 13:00:30 +02:00
Jaime Marquínez Ferrándiz
32470bf619 [sportschau] Improve title extraction
The html '<title>' ends with '- sportschau.de', which shouldn't be part of the title.
2015-07-19 11:27:12 +02:00
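A sketch of stripping the site suffix from the HTML `<title>`, assuming it always appears as a trailing '- sportschau.de'; `strip_site_suffix` is a hypothetical helper and the sample titles are illustrative.

```python
import re


def strip_site_suffix(html_title):
    """Drop a trailing '- sportschau.de' from an HTML <title> value."""
    return re.sub(r'\s*-\s*sportschau\.de\s*$', '', html_title).strip()
```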
Jaime Marquínez Ferrándiz
8b61bfd638 [sportschau] skip download in test 2015-07-19 11:27:12 +02:00
slangangular
8a7a208905 [sportschau] Add support for sportschau.de
Closes #6199.
2015-07-19 11:25:51 +02:00
Jaime Marquínez Ferrándiz
0215103e92 [francetv] Remove unused import 2015-07-19 11:13:27 +02:00
Yen Chi Hsuan
c2d1be8981 [iqiyi] Add skip_download to reduce network traffic
MD5 checksums are commented out in
36068ae019, and actual downloads are not
necessary anymore.
2015-07-19 11:20:05 +08:00
Yen Chi Hsuan
4951c9f821 Credit @sceext2 for fixing iQiyi extractor (#6266) 2015-07-19 11:16:29 +08:00
Roland Hieber
726adc43ec [tagesschau] set description=None for empty descriptions 2015-07-19 05:09:29 +02:00
Roland Hieber
3c6ae8b59e [tagesschau] add support for Bericht aus Berlin videos 2015-07-19 04:28:00 +02:00
Sergey M.
605be3f7f8 Merge pull request #6283 from rohieb/patch/fix-tagesschau-tests
[tagesschau] fix test which links to nonexisting video
2015-07-19 08:01:22 +06:00
Roland Hieber
c51bc70e0f [tagesschau] fix test which links to nonexisting video 2015-07-19 03:52:58 +02:00
Roland Hieber
e89d7e3029 [tagesschau] add support for more video types
I found that currently only tagesschau videos are played. There are some
more shows hosted on tagesschau.de (see [0] for example) which are
easily playable by adjusting the regex. So this patch adds support for:
* tagesthemen
* tagesschau vor 20 Jahren
* tagesschau (mit Gebärdensprache)
* nachtmagazin

Note that some videos don't provide a description, so in order for the
tests to succeed, an ExtractorError needs to get caught.

[0]: http://www.tagesschau.de/multimedia/video/videoarchiv2~_date-20150714.html
2015-07-19 03:44:16 +02:00
Yen Chi Hsuan
4b0f45f667 Merge branch 'sceext2-fix-iqiyi-2015-07-17' 2015-07-19 03:57:36 +08:00
Yen Chi Hsuan
36068ae019 [iqiyi] Comment out some MD5 sums
The value is different on Travis CI server
2015-07-19 03:54:59 +08:00
Yen Chi Hsuan
761ee0d827 [iqiyi] Remove unused imports 2015-07-19 02:28:43 +08:00
Yen Chi Hsuan
fb8bc3f818 Merge branch 'fix-iqiyi-2015-07-17' of https://github.com/sceext2/youtube_dl into sceext2-fix-iqiyi-2015-07-17 2015-07-19 02:27:59 +08:00
Sergey M․
826a7da808 [rtlnl] Avoid episodes completely (Closes #6275) 2015-07-19 00:27:23 +06:00
Sergey M․
cbd55ade68 [extractor/generic] Add support for francetv embeds 2015-07-18 22:56:00 +06:00
Sergey M․
5705ee6ef8 [francetv] Fix duration extraction 2015-07-18 22:43:42 +06:00
Sergey M․
3f5c6d0c1b [francetv] Add support for embeds and clean up _VALID_URL 2015-07-18 22:41:34 +06:00
Sergey M․
e58066e244 [vk] Add age restricted video test for reference 2015-07-18 19:25:06 +06:00
Sergey M․
ee48b6a88f [vk] Capture error message 2015-07-18 19:15:20 +06:00
Sergey M․
9ac09ed4de [bliptv] Remove unused import 2015-07-18 19:13:10 +06:00
Sergey M․
22603348aa [compat] Fix _asciire 2015-07-18 18:32:52 +06:00
Sergey M․
fec73daaa3 [vk:uservideos] Revert orderedSet 2015-07-18 18:23:35 +06:00
Sergey M․
c6b68648f4 [bilibili] Show georestriction error 2015-07-18 17:36:46 +06:00
Sergey M․
1ecb5d1d83 [vk] Clarify extractor names 2015-07-18 17:23:33 +06:00
Sergey M․
dc786d3db5 [vk:uservideos] Improve extraction 2015-07-18 17:22:25 +06:00
Sergey M․
74fe23ec35 [extractor/common] Style 2015-07-18 16:35:28 +06:00
Philipp Hagemeister
b0bff54b08 release 2015.07.18 2015-07-18 11:59:45 +02:00
sceext
1b541d8d6e [iqiyi] fix iqiyi (2015-07-17), update the md5 salt (enc_key) to iqiyi latest (2015-07-17) flash player. 2015-07-18 12:34:26 +08:00
Sergey M․
f29ac588dd [nationalgeographics] Fix extraction (Closes #6262) 2015-07-18 06:33:50 +06:00
Sergey M․
0696667734 [twitch:stream] Clarify channel_id reassignment rationale (#6263) 2015-07-18 06:27:45 +06:00
Sergey M․
1793d71db6 [twitch:stream] Fix channel_id in different case (Closes #6263) 2015-07-18 06:18:03 +06:00
Sergey M․
4211e1941b [videomega] Add shortcut to _VALID_URL 2015-07-18 04:27:09 +06:00
Sergey M․
4bdfef5a18 [videomega] Add tests 2015-07-18 04:25:30 +06:00
Sergey M․
8a37f53685 [videomega] Revert iframe URL 2015-07-18 04:25:10 +06:00
Sergey M․
4e1ad6e9a8 [videomega] Extend _VALID_URL and improve extraction (Closes #6260) 2015-07-18 04:13:45 +06:00
Sergey M․
fb10e1aa57 Merge branch 'seamusphelan-sbs_website_changes' 2015-07-18 02:43:44 +06:00
Sergey M․
3c283a381e [sbs] Simplify 2015-07-18 02:43:18 +06:00
Sergey M․
dac4d5be12 Merge branch 'sbs_website_changes' of https://github.com/seamusphelan/youtube-dl into seamusphelan-sbs_website_changes 2015-07-18 02:13:55 +06:00
Sergey M․
530857182d [bliptv] Add test with missing duration 2015-07-18 01:45:54 +06:00
Sergey M․
9441f77faa [bliptv] Use xpath for extraction and fix test 2015-07-18 01:45:36 +06:00
Sergey M․
3cc8b4c327 [compat] Fix missing _asciire on python 2.6 2015-07-18 00:24:39 +06:00
Sergey M․
6b19647d57 [veehd] Use compat_urllib_parse_unquote 2015-07-18 00:04:25 +06:00
Sergey M․
7bd42d0d96 [openfilm] Fix compat_urllib_parse_unquote_plus import 2015-07-17 23:56:27 +06:00
Sergey M․
c60e8cfaf7 [ehow] Simplify 2015-07-17 23:54:38 +06:00
Sergey M․
7fd002c006 [youtube] Use compat_urllib_parse_unquote and compat_urllib_parse_unquote_plus 2015-07-17 23:51:57 +06:00
Sergey M․
db6c50f109 [ynet] Use compat_urllib_parse_unquote_plus 2015-07-17 23:50:52 +06:00
Sergey M․
aa4789d632 [xvideos] Use compat_urllib_parse_unquote 2015-07-17 23:50:25 +06:00
Sergey M․
ee8de13e14 [xtube] Use compat_urllib_parse_unquote 2015-07-17 23:50:02 +06:00
Sergey M․
7dde5f6a8d [xnxx] Use compat_urllib_parse_unquote 2015-07-17 23:49:38 +06:00
Sergey M․
736f003f2e [xbef] Use compat_urllib_parse_unquote 2015-07-17 23:49:14 +06:00
Sergey M․
47af21e8f1 [spankwire] Use compat_urllib_parse_unquote 2015-07-17 23:47:51 +06:00
Sergey M․
605cbef653 [pornhub] Use compat_urllib_parse_unquote and compat_urllib_parse_unquote_plus 2015-07-17 23:47:23 +06:00
Sergey M․
388ad0c05c [playvid] Use compat_urllib_parse_unquote and compat_urllib_parse_unquote_plus 2015-07-17 23:46:33 +06:00
Sergey M․
2ebbb6f1f7 [photobucket] Use compat_urllib_parse_unquote 2015-07-17 23:45:55 +06:00
Sergey M․
d54f1c7477 [openfilm] Use compat_urllib_parse_unquote_plus 2015-07-17 23:45:26 +06:00
Sergey M․
b78f5ec4c3 [odnoklassniki] Use compat_urllib_parse_unquote 2015-07-17 23:45:00 +06:00
Sergey M․
9fd3bf04b7 [myvideo] Use compat_urllib_parse_unquote 2015-07-17 23:44:38 +06:00
Sergey M․
e97bb3de83 [mofosex] Use compat_urllib_parse_unquote 2015-07-17 23:43:36 +06:00
Sergey M․
c2daf8dfa4 [mixcloud] Use compat_urllib_parse_unquote 2015-07-17 23:42:43 +06:00
Sergey M․
09b718c439 [mitele] Use compat_urllib_parse_unquote 2015-07-17 23:42:11 +06:00
Sergey M․
c177bb3a50 [metacafe] Use compat_urllib_parse_unquote 2015-07-17 23:41:47 +06:00
Sergey M․
977a247a06 [malemotion] Use compat_urllib_parse_unquote 2015-07-17 23:41:14 +06:00
Sergey M․
899a3e2f13 [karaoketv] Use compat_urllib_parse_unquote_plus 2015-07-17 23:40:35 +06:00
Sergey M․
8ee4ecb48d [infoq] Use compat_urllib_parse_unquote 2015-07-17 23:39:54 +06:00
Sergey M․
f7e6f7fa23 [extractor/generic] Use compat_urllib_parse_unquote 2015-07-17 23:39:32 +06:00
Sergey M․
1f80e360fc [gamespot] Use compat_urllib_parse_unquote 2015-07-17 23:38:30 +06:00
Sergey M․
d7011316d0 [facebook] Use compat_urllib_parse_unquote 2015-07-17 23:37:56 +06:00
Sergey M․
d3671b344f [ehow] Use compat_urllib_parse_unquote 2015-07-17 23:37:23 +06:00
Sergey M․
a60cccbf9f [crunchyroll] Use compat_urllib_parse_unquote 2015-07-17 23:36:58 +06:00
Sergey M․
3e72f5f10e [ceskatelevize] Use compat_urllib_parse_unquote 2015-07-17 23:36:11 +06:00
Sergey M․
b94b78971c [bet] Use compat_urllib_parse_unquote 2015-07-17 23:35:14 +06:00
Sergey M․
4d08161ac2 [compat] Mention unquote_plus 2015-07-17 23:32:43 +06:00
Sergey M․
8954e48140 [test_compat] Add tests for compat_urllib_parse_unquote_plus 2015-07-17 23:31:23 +06:00
Sergey M․
aa99aa4e85 [compat] Add compat_urllib_parse_unquote_plus 2015-07-17 23:28:34 +06:00
Sergey M․
d79febcd06 [test_compat] Remove redundant test 2015-07-17 23:09:56 +06:00
Sergey M․
13fc7f3a05 Merge branch 'atomicdryad-pr-crashfix_compat_urllib_unquote' 2015-07-17 22:59:17 +06:00
Sergey M․
14309e1ddc [test_compat] Make tests more idiomatic 2015-07-17 22:58:39 +06:00
Sergey M․
5513967926 [compat] Simplify and use latest cpython 3 code 2015-07-17 22:58:13 +06:00
Sergey M.
eacd875f3b Merge pull request #6259 from remitamine/howstuffworks
[howstuffworks] fix _VALID_URL regex
2015-07-17 22:08:40 +06:00
remitamine
c4fe07c7af match only the test url 2015-07-17 17:07:55 +01:00
remitamine
1186e3f91a do not capture a group 2015-07-17 16:45:53 +01:00
remitamine
f354385bf5 Add test for urls without a number 2015-07-17 16:43:27 +01:00
remitamine
cabe001590 [howstuffworks] fix _VALID_URL regex 2015-07-17 16:18:33 +01:00
Sergey M․
89f691e141 Merge branch 'pr-crashfix_compat_urllib_unquote' of https://github.com/atomicdryad/youtube-dl into atomicdryad-pr-crashfix_compat_urllib_unquote 2015-07-17 21:15:55 +06:00
fnord
4a63291144 Add tests for compat_urllib_parse_unquote 2015-07-17 09:46:08 -05:00
fnord
593b77064c Don't forget trailing '%' 2015-07-17 09:45:49 -05:00
fnord
9fefc88656 fix TestCompat test_all_present 2015-07-17 07:24:07 -05:00
fnord
a3bfddfa5e bbc.py: correct syntax 2015-07-17 02:47:02 -05:00
fnord
36da48798a handle titles and captions set to '' 2015-07-17 02:27:50 -05:00
fnord
a0f28f90fa remove kebab 2015-07-17 01:50:43 -05:00
fnord
851229a01f remove debugprint 2015-07-17 01:49:55 -05:00
fnord
c9c854cea7 replace old compat_urllib_parse_unquote with backport from python3's function
* required unquote_to_bytes function ported as well
  (uses .decode('hex') instead of the dynamically populated _hextobyte global)
* required implicit conversion to bytes and/or unicode in places due to
  differing type assumptions in Python 3
2015-07-17 01:31:29 -05:00
Yen Chi Hsuan
a38436e889 [extractor/common] Add 'transform_source' parameter to _extract_f4m_formats() 2015-07-17 12:02:49 +08:00
Sergey M․
23fc384f2c [npo] Compound title 2015-07-17 01:28:52 +06:00
Sergey M․
1540119723 [npo] Remove unused imports 2015-07-17 00:54:05 +06:00
Sergey M․
574f42d79a [vpro] Improve playlist extraction 2015-07-17 00:53:12 +06:00
Sergey M․
536b0700b0 [npo] Allow missing description 2015-07-17 00:40:04 +06:00
Sergey M․
5ba761eb85 [npo] Prefer aflevering_titel over titel 2015-07-17 00:39:22 +06:00
Sergey M․
611ac379bb [vpro] Fix extraction and add support for vpro playlists 2015-07-17 00:34:24 +06:00
Sergey M․
03f32a7ead [wnl] Add extractor for omroepwnl playlists 2015-07-17 00:14:38 +06:00
Sergey M․
50ea2bb20d [npo] Update test 2015-07-16 23:56:57 +06:00
Sergey M․
525daedd5a [npo] Add support for omroepwnl fragments 2015-07-16 23:54:43 +06:00
Sergey M․
e118031ef8 [npo] Extend _VALID_URL to support ntr.nl (Closes #6248) 2015-07-16 23:21:04 +06:00
fnord
45eedbe58c Generic: use compat_urllib_parse_unquote to prevent utf8 mangling
of the entire page in python 2.

Requires the fixed compat_urllib_parse_unquote.

Example: the following command saves the file with a mangled playlist title
 instead of the kanji for 'tsunami'. This also affects all UTF-8-encoded
 URLs.

youtube-dl -f18 -o '%(playlist_title)s-%(title)s.%(ext)s' \
  61c14c1e3a/tsunami.html
2015-07-15 15:30:47 -05:00
fnord
e37c932fca compat_urllib_parse_unquote: crash fix: only decode valid hex
On Python 2 this crashed 100% of the time on input containing '%" ',
as it tried to parse '" ' as hex.
2015-07-15 15:28:50 -05:00
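A minimal sketch of lenient percent-decoding in the spirit of these fixes: only well-formed `%XX` escapes are decoded, while a stray or trailing '%' is left untouched. `lenient_unquote` is a hypothetical name and this is not the backported CPython code itself.

```python
import re

# Matches a '%' followed by exactly two hex digits, on bytes.
_HEX_ESCAPE = re.compile(rb'%([0-9A-Fa-f]{2})')


def lenient_unquote(s, encoding='utf-8', errors='replace'):
    """Decode valid %XX escapes; leave invalid ones such as '%" ' as-is."""
    raw = s.encode(encoding)
    decoded = _HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1).decode('ascii')), raw)
    # Decode the collected bytes once, so multi-byte UTF-8 sequences survive.
    return decoded.decode(encoding, errors)
```

For example, '%E6%B4%A5%E6%B3%A2' decodes to '津波' ('tsunami'), while '%" ' and '100%' come back unchanged.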
Sergey M.
5eb778bf4d Merge pull request #6242 from dstftw/f4m-improvements
[f4m] Improvements
2015-07-16 02:23:26 +06:00
Sergey M․
ab9b890b52 [prosiebensat1] Clarify test purpose 2015-07-16 02:23:07 +06:00
Sergey M․
31c746e5dc [extractor/common] Keep going if some media_url is missing 2015-07-16 01:25:33 +06:00
Sergey M․
f01f731107 [prosiebensat1] Use generic f4m manifest extraction 2015-07-16 01:15:47 +06:00
Sergey M․
70f0f5a8ca [extractor/common] Recursively extract child f4m manifests 2015-07-16 01:15:15 +06:00
Sergey M․
cc357c4db8 [extractor/common] Properly handle full URLs 2015-07-16 01:14:52 +06:00
Sergey M․
97f4aecfc1 [extractor/common] Handle malformed f4m manifests 2015-07-16 01:14:08 +06:00
Sergey M․
2af0f87c8b [prosiebensat1] Fix extraction (Closes #6215) 2015-07-15 23:32:52 +06:00
Sergey M.
b062d94eef [README.md] Clarify authentication with .netrc file 2015-07-15 03:08:36 +06:00
Sergey M․
6c1b0c0ed2 [dfb] Extract formats 2015-07-15 00:01:41 +06:00
Sergey M․
ddcdc684e2 [dfb] Extract display_id and modernize 2015-07-14 23:59:21 +06:00
Sergey M․
eae89f92e6 [dfb] Update test 2015-07-14 23:54:13 +06:00
Jaime Marquínez Ferrándiz
01d115b06b [jeuxvideo] Relax _VALID_URL (fixes #6230) 2015-07-14 19:07:40 +02:00
Sergey M․
79057965a8 [udemy] Remove superfluous field 2015-07-14 22:46:26 +06:00
Sergey M․
dcd4d95c8e [udemy] Fix authentication (Closes #6224) 2015-07-14 22:39:41 +06:00
Sergey M․
cf61d96df0 [extractor/common] Add _form_hidden_inputs 2015-07-14 22:38:10 +06:00
Sergey M․
f8da79f828 [extractor/common] Improve _form_hidden_inputs and rename to _hidden_inputs 2015-07-14 22:36:30 +06:00
Jaime Marquínez Ferrándiz
9750e7d70e [postprocessor/ffmpeg] Don't use '[youtube] ...' in messages
Because it can be used for other extractors.
2015-07-14 12:56:32 +02:00
cazulu
50aa2bb6b9 [dailymotion] Extract duration (closes #6221) 2015-07-14 12:50:34 +02:00
Sergey M․
1d1dd597ed [dramafever] Extract srt (Closes #6207) 2015-07-14 01:35:44 +06:00
Yen Chi Hsuan
cfe5537ee5 [myspass] Fix extraction (closes #6206) 2015-07-14 00:23:50 +08:00
Sergey M․
7869eb3fc4 Credit Bernhard Minks for nowtv patch 2015-07-13 22:21:35 +06:00
Sergey M․
6dfa0602f0 [nowtv] Fix extraction (Closes #6169) 2015-07-13 22:11:05 +06:00
Seamus Phelan
75a40b2251 [SBS] fixes due to website changes 2015-07-13 22:35:45 +10:00
Sergey M․
28fb109ed0 [rds] Improve _VALID_URL 2015-07-12 23:45:47 +06:00
Sergey M․
48607afac5 Merge branch 'fstirlitz-rdsca' 2015-07-12 23:36:32 +06:00
Sergey M․
b6ea9ef21a [rds] Improve 2015-07-12 23:35:56 +06:00
Sergey M․
b8dd44baa9 Merge branch 'rdsca' of https://github.com/fstirlitz/youtube-dl into fstirlitz-rdsca 2015-07-12 21:34:32 +06:00
Sergey M․
c4f1fde75b [nrk:tv] Add format id prefixes 2015-07-12 20:42:18 +06:00
Sergey M․
667170e2c7 [nrk:tv] Add support for radio URLs (Closes #6200) 2015-07-12 20:40:00 +06:00
Sergey M․
53429e6551 [onionstudios] Fix extraction 2015-07-12 12:53:15 +06:00
felix
ac8f97f2b3 [rdsca] New extractor 2015-07-11 19:05:17 +02:00
Sergey M․
41c0d2f8cb Credit @aurium for avi recode and postprocessor args (#5942) 2015-07-11 23:00:19 +06:00
Sergey M․
1f3a43dbe6 Merge branch 'aurium-master' 2015-07-11 22:59:18 +06:00
Sergey M․
369e195a44 Handle postprocessor_args similarly to external_downloader_args 2015-07-11 22:43:02 +06:00
Sergey M․
15006fedb9 [postprocessor/ffmpeg] Spread postprocessor args usage on all ffmpeg extractors 2015-07-11 22:42:03 +06:00
Sergey M․
e35b23f54d [postprocessor/common] Improve postprocessor args fetching and clarify doc 2015-07-11 22:41:33 +06:00
Sergey M․
f72b0a6032 Revert xvid to avi and make docs to be similar to existing external downloader option 2015-07-11 22:15:16 +06:00
Sergey M․
ac9ed061ec Merge branch 'master' of https://github.com/aurium/youtube-dl into aurium-master 2015-07-11 21:58:12 +06:00
Sergey M․
d919fa3344 [vk] Handle access denied error 2015-07-11 21:26:03 +06:00
Sergey M․
79913fde35 [vk] Add list id to info_url 2015-07-11 21:23:49 +06:00
Sergey M․
da634d0a8b Merge branch 'dufferzafar-webofstories' 2015-07-11 04:43:55 +06:00
Sergey M․
fac54cb426 [webofstories:playlist] Improve and add test 2015-07-11 04:43:29 +06:00
Sergey M․
3f19b9b7c1 Merge branch 'webofstories' of https://github.com/dufferzafar/youtube-dl into dufferzafar-webofstories 2015-07-11 04:17:54 +06:00
Jaime Marquínez Ferrándiz
86f2541695 Remove unused 're' imports 2015-07-10 22:22:33 +02:00
Yen Chi Hsuan
181c4ccaaa [qqmusic] Add more localized names 2015-07-11 01:21:36 +08:00
Yen Chi Hsuan
ed848087d5 [neteasemusic] Add more localized names 2015-07-11 01:21:18 +08:00
Yen Chi Hsuan
edd66be5be [kuwo] Add more localized names 2015-07-11 01:21:04 +08:00
Yen Chi Hsuan
246995dbc8 [youku] Add localized name 2015-07-11 00:58:13 +08:00
Yen Chi Hsuan
b931fbe5ab [yinyuetai] Add localized name 2015-07-11 00:57:59 +08:00
Yen Chi Hsuan
e014ff015d [yam] Add localized name 2015-07-11 00:53:08 +08:00
Yen Chi Hsuan
4fa5f40232 [xuite] Add localized name 2015-07-11 00:52:56 +08:00
Yen Chi Hsuan
9b15be97aa [udn] Add localized name 2015-07-11 00:52:19 +08:00
Yen Chi Hsuan
a7ada46bd9 [qqmusic] Add localized name 2015-07-11 00:52:07 +08:00
Yen Chi Hsuan
9d16788ad9 [nextmedia] Add localized name 2015-07-11 00:51:55 +08:00
Yen Chi Hsuan
6ce89aecc3 [neteasemusic] Add localized name 2015-07-11 00:51:35 +08:00
Yen Chi Hsuan
963d0ce7e3 [letv] Add localized name 2015-07-11 00:51:26 +08:00
Yen Chi Hsuan
0f08d7f851 [kuwo] Add localized name 2015-07-11 00:51:14 +08:00
Yen Chi Hsuan
44c514eb9c [iqiyi] Add localized name 2015-07-11 00:51:03 +08:00
Yen Chi Hsuan
513cbdda93 [douyutv] Add localized name 2015-07-11 00:50:45 +08:00
Yen Chi Hsuan
e1ba152352 [ctsnews] Add localized name 2015-07-11 00:50:31 +08:00
Yen Chi Hsuan
446e764500 [baidu] Add localized name 2015-07-11 00:49:59 +08:00
Yen Chi Hsuan
901d00caa6 Merge branch 'ping-kuwo' 2015-07-11 00:11:41 +08:00
Yen Chi Hsuan
094790d2c9 [kuwo:song] Give warnings for unavailable optional fields 2015-07-11 00:03:49 +08:00
Sergey M․
1c0163a5cc [gorillavid] Use _form_hidden_inputs 2015-07-10 21:58:49 +06:00
Sergey M․
8fa7e5817a [hostingbulk] Use _form_hidden_inputs 2015-07-10 21:58:24 +06:00
Sergey M․
01b89d5682 [played] Use _form_hidden_inputs 2015-07-10 21:57:29 +06:00
Yen Chi Hsuan
9f01c1a803 [kuwo] Use \d instead of [0-9] 2015-07-10 23:56:51 +08:00
Sergey M․
46f0f50016 [primesharetv] Use _form_hidden_inputs 2015-07-10 21:56:41 +06:00
Sergey M․
b8070dbbd7 [promptfile] Use _form_hidden_inputs 2015-07-10 21:56:14 +06:00
Sergey M․
3b16d803c9 [shared] Use _form_hidden_inputs 2015-07-10 21:55:22 +06:00
Sergey M․
de195c23a6 [vimeo] Use _form_hidden_inputs 2015-07-10 21:54:39 +06:00
Yen Chi Hsuan
d3b8908886 [kuwo] Simpler calls to url_result() 2015-07-10 23:53:48 +08:00
Sergey M․
2688176c77 [vodlocker] Use _form_hidden_inputs 2015-07-10 21:53:38 +06:00
Sergey M․
a5839317aa [vk] Use _form_hidden_inputs when logging in 2015-07-10 21:52:03 +06:00
Sergey M․
a0aab26a41 [twitch] Use _form_hidden_inputs when logging in 2015-07-10 21:49:39 +06:00
Sergey M․
27713812a0 [extractor/common] Add method for extracting form hidden input fields as dict 2015-07-10 21:49:09 +06:00
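A simplified sketch of collecting a form's hidden input fields into a dict, which login code can then extend with credentials before POSTing. `hidden_inputs` is a hypothetical stand-in for the `_form_hidden_inputs` method named in these commits, and its regexes assume double- or single-quoted attributes.

```python
import re


def hidden_inputs(html):
    """Return {name: value} for every <input type="hidden"> in html."""
    fields = {}
    for tag in re.findall(r'<input[^>]+>', html):
        if not re.search(r'type=(["\'])hidden\1', tag):
            continue
        name = re.search(r'name=(["\'])(?P<v>.+?)\1', tag)
        value = re.search(r'value=(["\'])(?P<v>.*?)\1', tag)
        if name and value:
            fields[name.group('v')] = value.group('v')
    return fields
```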
Yen Chi Hsuan
cf2c5fda4f [kuwo] Use single quotes 2015-07-10 23:48:48 +08:00
Yen Chi Hsuan
a9684c0dbf [kuwo] Add KuwoBaseIE 2015-07-10 23:46:44 +08:00
Sergey M․
c0bf5e1c4d [twitch] Fix non-ASCII logins/passwords on python 2 2015-07-10 21:40:21 +06:00
Yen Chi Hsuan
a31e3e7dcb [kuwo] Regular expression improvements
1. Prevent .+ and .*
2. Use [^>]+ instead of spaces for HTML tags
3. Remove unnecessary trailing parts
2015-07-10 23:38:54 +08:00
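A small illustration of point 2 above, using made-up HTML: a pattern that hard-codes the exact text between the tag name and the attribute breaks on extra attributes or spacing, while `[^>]+` and `[^<]+` tolerate them and avoid greedy `.+`.

```python
import re

html = '<p class="name"  data-id="1">Song</p>'

# Brittle: assumes the tag contains exactly ' class="name"'.
brittle = re.search(r'<p class="name">(.+)</p>', html)

# More robust: any attributes around class="name", and no greedy '.+'.
robust = re.search(r'<p[^>]+class="name"[^>]*>([^<]+)</p>', html)
```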
Sergey M․
17b41a3337 [twitch] Show reset password request 2015-07-10 21:16:42 +06:00
Sergey M․
89a683ae74 [twitch] Fix error message regex 2015-07-10 21:15:09 +06:00
Sergey M․
008661069b Merge branch 'master' of github.com:rg3/youtube-dl 2015-07-10 20:56:08 +06:00
Sergey M․
9296e92e1c [twitch] Fix login (Closes #6186) 2015-07-10 20:55:53 +06:00
Yen Chi Hsuan
a34af8d066 [kuwo] PEP8 2015-07-10 19:13:52 +08:00
Yen Chi Hsuan
8726e04629 Merge branch 'kuwo' of https://github.com/ping/youtube-dl into ping-kuwo 2015-07-10 19:11:58 +08:00
Yen Chi Hsuan
2a01c940ec Merge branch 'ping-neteasemusic' 2015-07-10 18:59:58 +08:00
Yen Chi Hsuan
4eab60cbd2 [netease:djradio] Use compat_itertools_count 2015-07-10 18:59:12 +08:00
Yen Chi Hsuan
a0e060ac1e [compat] Add compat_itertools_count
The 'step' parameter was added in Python 2.7.
2015-07-10 18:58:12 +08:00
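A sketch of such a compat shim, assuming the only difference to paper over is the missing `step` argument on Python 2.6: probe `itertools.count` once and fall back to a generator if it raises `TypeError`.

```python
import itertools

try:
    # Python 2.7+ and 3.x accept 'start' and 'step' keywords.
    itertools.count(start=0, step=1)
    compat_itertools_count = itertools.count
except TypeError:
    # Python 2.6: count() has no 'step', so fall back to a generator.
    def compat_itertools_count(start=0, step=1):
        n = start
        while True:
            yield n
            n += step
```

Usage: `itertools.islice(compat_itertools_count(0, 2), 3)` yields 0, 2, 4 on either code path.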
Yen Chi Hsuan
397a8ea96e [neteasemusic] Encoding fixes for Python 2.6 and 3.x 2015-07-10 18:43:38 +08:00
Yen Chi Hsuan
15830339ef [neteasemusic] PEP8 2015-07-10 18:30:23 +08:00
Yen Chi Hsuan
b29280285e Merge branch 'neteasemusic' of https://github.com/ping/youtube-dl into ping-neteasemusic 2015-07-10 18:25:12 +08:00
ping
1633491bff [kuwo] Merge KuwoSingerMusicIE into KuwoSingerIE (missed kuwo.py) 2015-07-10 15:19:07 +08:00
ping
2b0fa1f7dd [kuwo] Merge KuwoSingerMusicIE into KuwoSingerIE 2015-07-10 15:09:12 +08:00
ping
02b386f80a [neteasemusic] Changes after review 2015-07-10 13:29:57 +08:00
Sergey M․
bf20b9c540 [extractor/generic] Add test for myvi embed 2015-07-10 01:15:55 +06:00
Sergey M․
06a12933f3 [pbs] Add support for subtitles (Closes #6184) 2015-07-10 00:58:01 +06:00
Sergey M․
6dd94d3a79 [myvi:embed] Rename to myvi 2015-07-10 00:27:44 +06:00
Sergey M․
f2f89c762a [myvi:embed] Improve _VALID_URL 2015-07-10 00:27:02 +06:00
Sergey M․
e6c2d9ad29 [extractor/generic:myvi] Add support for myvi embeds 2015-07-10 00:25:36 +06:00
Sergey M․
83423254cc [myvi:embed] Extend _VALID_URL 2015-07-10 00:07:26 +06:00
Sergey M․
1c20ddc966 [myvi:embed] Add extractor (Closes #6167) 2015-07-09 23:53:50 +06:00
Sergey M․
675e9f22ea [vimple] Extract spruto player based extractor class 2015-07-09 23:52:03 +06:00
Sergey M․
77c6fb5b24 [youtube] Make further DASH manifests not fatal after succeeded one 2015-07-09 20:48:38 +06:00
Sergey M․
082a0140ef [yinyuetai] Do not capture unused group 2015-07-08 23:40:19 +06:00
Yen Chi Hsuan
9e535ce055 Merge branch 'ping-yinyuetai' 2015-07-09 01:11:26 +08:00
Yen Chi Hsuan
d76dea001b [yinyuetai] Miscellaneous improvements
1. Include all fields in _TEST
2. Use .get() for optional fields
3. Clarify the intention of 'ext' in formats
2015-07-09 01:07:45 +08:00
Yen Chi Hsuan
af0f9b0e95 [yinyuetai] Style 2015-07-09 00:54:37 +08:00
Yen Chi Hsuan
e2082ea942 [yinyuetai] Add test for h5/ part in _VALID_URL 2015-07-09 00:50:32 +08:00
Yen Chi Hsuan
68923e52a3 Merge branch 'yinyuetai' of https://github.com/ping/youtube-dl into ping-yinyuetai 2015-07-09 00:48:23 +08:00
Sergey M․
9281f6d253 [vk] Add test for youtube embed 2015-07-08 20:41:08 +06:00
Sergey M․
4647845679 [vk] Fix youtube extraction 2015-07-08 20:34:50 +06:00
Sergey M․
cf9cf7dd04 [vk] Extend _VALID_URL to handle biqle.ru (Closes #6179) 2015-07-08 20:27:06 +06:00
Yen Chi Hsuan
1316b54956 [clipsyndicate] Use _match_id 2015-07-08 13:43:23 +08:00
Yen Chi Hsuan
cbc1fadd6f [clipsyndicate] Support chic subdomain (fixes #6176) 2015-07-08 13:40:21 +08:00
Philipp Hagemeister
4dd09c9add release 2015.07.07 2015-07-07 10:36:07 +02:00
Yen Chi Hsuan
267dc07e6b [gfycat] Catch errors 2015-07-07 14:22:13 +08:00
Yen Chi Hsuan
d7b4d5dd50 [gfycat] Extract id correctly (fixes #6165) 2015-07-07 14:16:56 +08:00
Sergey M․
7f220b2fac [vk] Catch ownership confirmation request 2015-07-07 00:04:19 +06:00
Sergey M․
275c0423aa [vk] Fix extraction (Closes #6153) 2015-07-07 00:02:34 +06:00
Yen Chi Hsuan
d3ee4bbc5a Merge branch 'ping-qqmusic-format-fix' 2015-07-06 17:55:45 +08:00
Yen Chi Hsuan
85a064861f [qqmusic] Use regex for thumbnails in test cases 2015-07-06 17:54:41 +08:00
Yen Chi Hsuan
d0b436bff2 Merge branch 'qqmusic-format-fix' of https://github.com/ping/youtube-dl into ping-qqmusic-format-fix 2015-07-06 17:24:44 +08:00
Yen Chi Hsuan
92b2f18072 Merge branch 'ping-qqmusic-album-fix' 2015-07-06 17:09:56 +08:00
Yen Chi Hsuan
dfc4eca21f [qqmusic:album] Playlist names are optional 2015-07-06 17:09:17 +08:00
Yen Chi Hsuan
fc7ae675e2 [qqmusic:album] Strip description 2015-07-06 17:08:32 +08:00
Yen Chi Hsuan
804ad79985 Merge branch 'qqmusic-album-fix' of https://github.com/ping/youtube-dl into ping-qqmusic-album-fix 2015-07-06 17:01:59 +08:00
Yen Chi Hsuan
da839880e9 Merge branch 'ping-qqmusic-playlist' 2015-07-06 16:20:46 +08:00
Yen Chi Hsuan
e9d33454b5 [qqmusic:playlist] Playlist names are optional 2015-07-06 16:19:49 +08:00
Yen Chi Hsuan
d80891efc4 Merge branch 'qqmusic-playlist' of https://github.com/ping/youtube-dl into ping-qqmusic-playlist 2015-07-06 16:08:30 +08:00
ping
37c1e4025c [yinyuetai] New extractor for yinyuetai.com 2015-07-06 15:26:49 +08:00
Yen Chi Hsuan
59a83d3e5b [spiegeltv] Skip invalid m3u8 manifests (closes #6157) 2015-07-06 08:41:14 +08:00
Yen Chi Hsuan
13af92fdc4 [common] Add 'fatal' to _extract_m3u8_formats 2015-07-06 08:39:38 +08:00
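The idea behind a `fatal` switch can be sketched as follows. `parse_m3u8_formats` and `ManifestError` are hypothetical names, and the parsing is deliberately naive. The sketch only shows how a non-fatal call lets an extractor skip an invalid manifest instead of aborting.

```python
class ManifestError(Exception):
    """Hypothetical error raised for unusable manifests."""


def parse_m3u8_formats(manifest, fatal=True):
    """Naively list variant URLs from m3u8 playlist text."""
    if not manifest.lstrip().startswith('#EXTM3U'):
        if fatal:
            raise ManifestError('not a valid m3u8 manifest')
        return []  # non-fatal: the caller keeps whatever other formats it has
    return [
        {'url': line.strip(), 'protocol': 'm3u8'}
        for line in manifest.splitlines()
        if line.strip() and not line.strip().startswith('#')
    ]


good = '#EXTM3U\n#EXT-X-STREAM-INF:BANDWIDTH=800000\nlow.m3u8\n'
print(parse_m3u8_formats(good))
print(parse_m3u8_formats('<html>oops</html>', fatal=False))  # []
```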
Sergey M․
0c20ee7d4b [rtlnl] Clarify current adaptive -> flash workaround rationale 2015-07-06 04:16:56 +06:00
Sergey M․
89d42c2c75 [rtlnl] Clarify test 2015-07-06 02:58:02 +06:00
Sergey M․
04611765a4 Merge branch 'corone17-patch-1' 2015-07-05 19:07:51 +06:00
Sergey M․
9dfc4fa1a1 [rtlnl] Add test with encrypted m3u8 streams for reference 2015-07-05 19:07:07 +06:00
Sergey M․
43232d5c14 [rtlnl] Improve 2015-07-05 19:01:07 +06:00
Sergey M․
f7c272d4fa Merge branch 'patch-1' of https://github.com/corone17/youtube-dl into corone17-patch-1 2015-07-05 18:07:39 +06:00
Sergey M․
ede21449c8 [crunchyroll] Fix extraction (Closes #5855, closes #5881) 2015-07-05 06:29:36 +06:00
Sergey M․
d7c9a3e976 Credit @remitamine for snagfilms (#6096) 2015-07-04 17:22:11 +06:00
Philipp Hagemeister
35eb649e9d release 2015.07.04 2015-07-04 09:24:00 +02:00
Sergey M․
e56a4c9e9b [thisamericanlife] Improve and simplify 2015-07-04 05:42:53 +06:00
Eric Wong
95506e37af [thisamericanlife] Remove unnecessary comment 2015-07-04 05:12:28 +06:00
Eric Wong
e41840c522 [thisamericanlife] get info from <meta> tags 2015-07-04 05:12:20 +06:00
Eric Wong
2a46a27e6c [thisamericanlife] Add a new extractor 2015-07-04 05:12:10 +06:00
Sergey M․
0bcdc27653 [dailymotion:cloud] Extend _VALID_URL (Closes #6145) 2015-07-03 22:47:52 +06:00
Sergey M․
ddf0f74de7 [howcast] Fix extraction and modernize 2015-07-03 22:32:56 +06:00
Yen Chi Hsuan
91b21b2334 [infoq] Fix extraction (closes #6141) 2015-07-03 11:54:36 +08:00
Sergey M․
66e568de3b [extractor/generic] Improve kaltura embeds support (Closes #6137) 2015-07-02 21:39:46 +06:00
Sergey M․
f5ca97e393 [npo] Clarify token decryption algorithm source 2015-07-02 20:20:09 +06:00
Yen Chi Hsuan
8d06a62485 [npo] Decrypting token (closes #6136) 2015-07-02 16:47:55 +08:00
Yen Chi Hsuan
93f9420993 [pbs] Add coding declaration
Python 2.x does not work without it.
2015-07-02 13:13:27 +08:00
Yen Chi Hsuan
5b61070c70 [pbs] skip_download for m3u8 test cases 2015-07-02 13:08:48 +08:00
Yen Chi Hsuan
dbe1a93526 [pbs] Fix player URL (closes #6139) 2015-07-02 13:05:43 +08:00
Aurélio A. Heckert
aa5d9a79d6 Simplify postprocessor_args transmission to PP base class
* Remove `extra_cmd_args` transmission from sub to super class.
* Simplify params transmission through `downloader.params`.
2015-07-01 20:12:26 -03:00
Sergey M․
86511ea417 [drtuber] Fix extraction 2015-07-01 21:47:56 +06:00
Aurélio A. Heckert
1866432db7 Rename --pp-params to --postprocessor-args and access value as super class attribute 2015-06-30 16:22:09 -03:00
Sergey M.
33f1f81b8b Merge pull request #6132 from alarig/master
Add support of HTTPS for ina.fr
2015-06-30 20:53:49 +06:00
Sergey M․
9d0b581fea [youtube] Prefer meta for upload date and modernize 2015-06-30 20:52:26 +06:00
alarig
c05724cb18 Add support of HTTPS for ina.fr 2015-06-30 16:47:14 +02:00
Sergey M․
f0714c9f86 [youtube] Speed up upload date regex (#6125) 2015-06-30 01:02:48 +06:00
Sergey M․
cf386750c9 [hentaistigma] Modernize 2015-06-29 22:21:09 +06:00
Sergey M.
54f428f645 Merge pull request #6120 from nawl/master
[hentaistigma] Fix video extractor
2015-06-29 21:14:49 +05:00
Sergey M.
dc2bd20e55 Merge pull request #6098 from dstftw/use-codecs-from-dash-manifest
[youtube] Pick up codecs info from DASH manifest when not set explicitly
2015-06-29 20:58:52 +05:00
Sergey M.
c608ee491f Merge pull request #6097 from dstftw/union-itags-from-multiple-dashmpd
[youtube] Extract formats from multiple DASH manifests (Closes #6093)
2015-06-29 20:58:34 +05:00
nawl
738b926322 [hentaistigma] Fix video extractor 2015-06-28 17:24:00 -06:00
corone17
bea41c7f3f Update rtlnl.py
Better to extract 'http://manifest.us.rtl.nl' from the json, I'd say. And I think it's better to use the default json-url to make it more futureproof.
Successfully tested with tarball.
2015-06-29 00:59:18 +02:00
Sergey M.
1bbe660dfa Merge pull request #6117 from Kagee/patch-1
NRK now supports / requires HTTPS
2015-06-29 03:15:53 +05:00
Anders Einar Hilden
c4bd188da4 NRK now supports / requires HTTPS
Add s? to regexp to support new urls. Update testcases to use HTTPS.
2015-06-29 00:11:31 +02:00
Sergey M․
5414623791 [extractor/common] Remove superfluous line 2015-06-29 00:49:19 +06:00
Sergey M․
c93d53f5e3 [youtube] Fix likes/dislike extraction 2015-06-29 00:48:06 +06:00
Sergey M․
507683780e Credit @gebn for moviefap 2015-06-28 23:08:05 +06:00
Sergey M․
e8b9ee5e08 Merge branch 'gebn-moviefap' 2015-06-28 23:05:49 +06:00
Sergey M․
d16154d163 [tnaflix] Generalize tnaflix extractors 2015-06-28 23:05:09 +06:00
Sergey M․
c342041fba [extractor/common] Use NO_DEFAULT from utils 2015-06-28 22:56:45 +06:00
Sergey M․
bf42a9906d [utils] Add default value for xpath_text 2015-06-28 22:56:07 +06:00
Sergey M․
9603e8a7d9 [YoutubeDL] Handle None width and height similarly to formats 2015-06-28 22:55:28 +06:00
Sergey M․
c7c040b825 Merge branch 'moviefap' of https://github.com/gebn/youtube-dl into gebn-moviefap 2015-06-28 18:00:49 +06:00
Yen Chi Hsuan
ac0474f89d [twitch:vod] Update _TEST
The original test case is gone
2015-06-28 13:33:09 +08:00
Yen Chi Hsuan
bb512e57dc [twitch:vod] Fix 'Source' format in m3u8 (closes #6115) 2015-06-28 13:33:09 +08:00
George Brighton
db652ea186 [moviefap] Fix flake8 warnings introduced in 1a5fd4e 2015-06-27 23:04:55 +01:00
George Brighton
5a9cc19972 [moviefap] Move flv videos to formats in the metadata 2015-06-27 23:03:06 +01:00
George Brighton
1a5fd4eebc [moviefap] Wrap long lines 2015-06-27 22:32:56 +01:00
George Brighton
8a1b49ff19 [moviefap] Explicitly sort formats to handle possible site changes 2015-06-27 22:28:17 +01:00
George Brighton
b971abe897 [moviefap] Replace call to str() with compat.compat_str() 2015-06-27 21:04:53 +01:00
George Brighton
43b925ce74 [moviefap] Replace calls to find() with util.xpath_text(). 2015-06-27 20:52:12 +01:00
George Brighton
62b742ece3 [moviefap] Remove redundant comments 2015-06-27 20:51:11 +01:00
George Brighton
d16ef949ca [moviefap] Allow non-critical fields to change without breaking extraction 2015-06-27 20:36:46 +01:00
Sergey M․
23e7cba87f [twitter:card] Add extractor (#5239) 2015-06-28 01:22:25 +06:00
George Brighton
a8e6f30d8e [moviefap] Swap and justify tests 2015-06-27 20:16:53 +01:00
George Brighton
9c49410898 [moviefap] Add categories to tests 2015-06-27 20:16:53 +01:00
George Brighton
802d74aa6b [moviefap] Swap test for an alternative non-copyrighted video 2015-06-27 20:16:53 +01:00
George Brighton
71f9e49e67 [moviefap] Fix dictionary comprehension syntax incompatible with Python 2.6 2015-06-27 20:16:53 +01:00
George Brighton
82ea1051b5 [moviefap] Add new extractor 2015-06-27 20:16:53 +01:00
Sergey M․
6c4d20cd6f [downloader/external] Fix externals downloaders specified with extension on Windows 2015-06-28 00:08:52 +06:00
Sergey M․
04c27802c0 [smotri] Add tests for password protected videos 2015-06-27 23:31:27 +06:00
Sergey M․
c3b7202f4f [smotri] Remove non relevant test 2015-06-27 23:03:26 +06:00
Sergey M․
81103ef35d [smotri] Fix password protected video extraction 2015-06-27 23:00:27 +06:00
Sergey M.
0eb5c1c62a Merge pull request #6081 from yan12125/skip_problematic_sites
[planetaplay/quickvid/vube] Skip inaccessible sites
2015-06-27 18:49:29 +05:00
Sergey M․
a9de951744 [snagfilms] More tests 2015-06-27 18:57:01 +06:00
Sergey M․
a42a1bb09d [snagfilms] Capture not available error 2015-06-27 18:54:08 +06:00
Sergey M․
9fbfc9bd4d [snagfilms:embed] Capture geolocation restriction error 2015-06-27 18:50:26 +06:00
Sergey M․
242a998bdc [snagfilms] Add support for shows 2015-06-27 18:40:01 +06:00
Sergey M․
9d1bf70234 Merge branch 'remitamine-snagfilms' 2015-06-27 18:29:16 +06:00
Sergey M․
b8c1cc1a51 [extractor/generic] Add test for snagfilms embeds 2015-06-27 18:28:10 +06:00
Sergey M․
eedd20ef96 [extractor/generic] Add support for snagfilms embeds 2015-06-27 18:26:14 +06:00
Sergey M․
7c197ad96d [snagfilms] Add routine for generic embeds extractions 2015-06-27 18:25:50 +06:00
Sergey M․
654fd03c73 [snagfilms] Improve and simplify 2015-06-27 18:20:42 +06:00
Jaime Marquínez Ferrándiz
cee16e0fa3 [newstube] style: fix alignment 2015-06-27 14:20:33 +02:00
Jaime Marquínez Ferrándiz
73c471e9ef [newstube] Fix GUID extraction (fixes #6109) 2015-06-27 14:18:01 +02:00
Sergey M․
533b99fbf9 Merge branch 'snagfilms' of https://github.com/remitamine/youtube-dl into remitamine-snagfilms 2015-06-27 16:52:51 +06:00
remitamine
f39eb98bab download all pages before start extracting info 2015-06-27 10:55:25 +01:00
Sergey M․
da77d856a1 [youtube] Add test for #6093 2015-06-27 14:55:46 +06:00
Sergey M․
b2575b38e7 [options] Clarify --youtube-skip-dash-manifest 2015-06-27 14:38:41 +06:00
Sergey M․
0a3cf9ad3d [youtube] Skip get_video_info requests when --youtube-skip-dash-manifest is specified 2015-06-27 14:31:18 +06:00
Sergey M․
00334d0de0 [options] Add missing whitespace and split lines 2015-06-27 14:26:51 +06:00
Sergey M․
226b886ca8 [vk] Fix authentication (Closes #6105) 2015-06-27 14:04:55 +06:00
Sergey M․
bc93bdb5bb [youtube] Fix reference before assignment for video_info 2015-06-27 13:19:46 +06:00
Yen Chi Hsuan
af214c3a79 [youtube] More useful messages for georestricted videos (#5716) 2015-06-27 13:15:57 +08:00
Yen Chi Hsuan
4eb10f6621 [utils] Add ISO3166Utils 2015-06-27 13:13:57 +08:00
remitamine
7d7d469025 add support for embed links 2015-06-27 00:13:14 +01:00
remitamine
fd40bdc0be remove unnecessary symbolic name for group 2015-06-26 21:56:15 +01:00
remitamine
7e0480ae0e convert tabs to 4 spaces indentation 2015-06-26 21:50:27 +01:00
Sergey M․
d80265ccd6 [youtube] Simplify non-DASH formats exclusion 2015-06-27 02:48:50 +06:00
Sergey M․
1b5a1ae257 [youtube] Pick up codecs info from DASH manifest when not set explicitly 2015-06-27 00:41:26 +06:00
Sergey M․
d8d24a922a [youtube] Extract formats from multiple DASH manifests (Closes #6093)
DASH manifest pointed by dashmpd from the video webpage and one pointed by get_video_info may
be different (namely different itag set) - some itags are missing from DASH manifest pointed by
webpage's dashmpd, some - from DASH manifest pointed by get_video_info's dashmpd).
The general idea is to take a union of itags of both DASH manifests (for example video with such
'manifest behavior' see https://github.com/rg3/youtube-dl/issues/6093).
2015-06-27 00:36:23 +06:00
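A minimal sketch of taking the union of itags from two DASH manifests, assuming each manifest has already been parsed into a list of format dicts whose `format_id` is the itag. `merge_dash_formats` is a hypothetical helper, and keeping the first entry seen for a duplicate itag is an assumption made for illustration.

```python
def merge_dash_formats(*format_lists):
    """Union of formats from several DASH manifests, keyed by itag.

    Assumption for illustration: when two manifests list the same itag,
    the entry seen first is kept.
    """
    merged = {}
    for formats in format_lists:
        for fmt in formats:
            merged.setdefault(fmt['format_id'], fmt)
    return list(merged.values())  # dicts keep insertion order (Python 3.7+)


# Hypothetical itag sets: each manifest lacks something the other has.
from_webpage = [{'format_id': '137'}, {'format_id': '140'}]
from_video_info = [{'format_id': '140'}, {'format_id': '248'}]
print([f['format_id'] for f in merge_dash_formats(from_webpage, from_video_info)])
# ['137', '140', '248']
```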
remitamine
03339b7b5b [snagfilms] Add new extractor 2015-06-26 18:25:43 +01:00
Shadab Zafar
2028c6e03d Added a Playlist Info Extractor for WebOfStories 2015-06-26 21:27:43 +05:30
Sergey M․
2988835af5 [lynda] Fix non-ASCII logins/passwords on python 2 2015-06-26 19:48:23 +06:00
Sergey M․
62cca96b72 [lynda] Fix confirm login request (#6088) 2015-06-26 19:46:42 +06:00
Sergey M․
b4dea075a3 [lynda] Fix login request (Closes #6088) 2015-06-26 19:36:04 +06:00
Sergey M․
533f67d3fa [infoq] Relax _VALID_URL (Closes #6071) 2015-06-25 19:54:44 +06:00
Jaime Marquínez Ferrándiz
906e2f0eac [downloader/external] Add downloader for httpie (closes #6079) 2015-06-25 15:48:04 +02:00
Yen Chi Hsuan
b8091db6b9 [planetaplay/quickvid/vube] Skip inaccessible sites 2015-06-25 16:40:29 +08:00
Yen Chi Hsuan
381c067755 [thesixtyone] Modernize 2015-06-25 16:19:04 +08:00
Yen Chi Hsuan
2182ab5187 [thesixtyone] Fix audio_server
Some of the songs are moved to Amazon AWS
2015-06-25 16:15:13 +08:00
Philipp Hagemeister
a1593a4a0e release 2015.06.25 2015-06-25 07:43:49 +02:00
fnord
aa5740fb61 Merge remote-tracking branch 'origin/master' into pr-bbcnews 2015-06-25 00:34:46 -05:00
fnord
da92eeae42 Fix tests, description formatting 2015-06-25 00:34:26 -05:00
Sergey M.
12e9e8445d [README.md] Fix typo 2015-06-25 01:10:42 +06:00
Sergey M․
8084be78c5 [extractor/generic] Add test for OnionStudios embeds 2015-06-24 23:23:16 +06:00
Sergey M․
1ac1c4c26e [extractor/generic] Add support for OnionStudios embeds (Closes #5841) 2015-06-24 23:19:50 +06:00
Sergey M․
d4f58034f7 [onionstudios] Add generic embed extraction routine 2015-06-24 23:16:33 +06:00
Sergey M․
f843300fe5 [onionstudios] Add extractor 2015-06-24 23:12:13 +06:00
Sergey M․
03b9c94437 [soundcloud:user] Add test 2015-06-24 20:49:22 +06:00
Sergey M․
a219d175c6 [soundcloud] Improve _VALID_URL
Do not takeover soundcloud:user URLs
2015-06-24 20:48:57 +06:00
Sergey M.
3c3b4176bd Merge pull request #5961 from dstftw/force-generic-extractor
Add --force-generic-extractor
2015-06-24 19:10:45 +05:00
Sergey M.
022383139b Merge pull request #6067 from mitsukarenai/master
[noco.tv] Fix issue #6066: title interpreted as integer
2015-06-24 19:10:01 +05:00
Mitsukarenai
1a1251e877 [noco.tv] Fix issue #6066: title interpreted as integer 2015-06-24 10:40:01 +02:00
Yen Chi Hsuan
18b5e1e534 [drbonanza] Fix extraction of videos 2015-06-24 16:02:10 +08:00
Sergey M․
2fece970b8 [extractor/generic] Clarify generic extraction warning 2015-06-24 03:08:24 +06:00
Sergey M․
e20d0c1e69 [brightcove] Use compat_xml_parse_error (Closes #6060) 2015-06-23 21:34:29 +06:00
fnord
a9dcf4a860 Prefer externalId over non-mediaserver-specific hashkey for video id. 2015-06-23 01:08:07 -05:00
Sergey M․
255f5694aa [faz] Extend _VALID_URL (Closes #6050) 2015-06-22 20:11:15 +06:00
Jaime Marquínez Ferrándiz
25701d5a2c [xhamster] pep8: remove trailing ';' 2015-06-22 11:18:52 +02:00
Yen Chi Hsuan
a5158f38a3 [generic/adobetv] Support AdobeTVVideo embeds (#6039) 2015-06-22 15:07:32 +08:00
Sergey M․
c76799c555 [extractor/generic] Add test for xhamster embed 2015-06-21 23:18:28 +06:00
Sergey M․
2bb5b6d0a1 [generic] Add support for xhamster embeds 2015-06-21 23:11:25 +06:00
Sergey M․
0bbba43ed0 [xhamster:embed] Add extractor (Closes #6032) 2015-06-21 23:10:38 +06:00
Yen Chi Hsuan
98ca102441 [sohu] Fix extraction again 2015-06-22 00:59:55 +08:00
Yen Chi Hsuan
3f3308cd75 Revert "[sohu] Update extractor"
This reverts commit 32060c6d6b.
2015-06-21 23:29:40 +08:00
Yen Chi Hsuan
6f96e308d0 [francetvinfo.fr] Support dmcloud embeds (fixes #6034) 2015-06-21 21:31:33 +08:00
Yen Chi Hsuan
756f574e4e [dailymotion/generic] Add DailymotionCloudIE 2015-06-21 21:30:34 +08:00
Yen Chi Hsuan
78294e6a9c [bbccouk] Remove TTML to srt conversion codes
It's broken. See #6038
2015-06-21 19:24:45 +08:00
Yen Chi Hsuan
4e33577173 [utils] Support ttaf1 namespace in TTML
It's found in bbc.co.uk. See #6038
2015-06-21 19:24:39 +08:00
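A sketch of reading caption paragraphs from TTML documents that use either the current TTML namespace or the older `ttaf1` one, using only the standard library. `ttml_paragraphs` is a hypothetical helper, not youtube-dl's subtitle converter.

```python
import xml.etree.ElementTree as ET

# Current TTML namespace and the older 'ttaf1' draft namespace.
TTML_NAMESPACES = (
    'http://www.w3.org/ns/ttml',
    'http://www.w3.org/2006/10/ttaf1',
)


def ttml_paragraphs(xml_text):
    """Return the text of every <p> element in either namespace."""
    root = ET.fromstring(xml_text)
    texts = []
    for ns in TTML_NAMESPACES:
        for p in root.iter('{%s}p' % ns):
            texts.append(''.join(p.itertext()).strip())
    return texts


SAMPLE = (
    '<tt xmlns="http://www.w3.org/2006/10/ttaf1"><body><div>'
    '<p begin="0:00:01.000" end="0:00:02.000">Hello</p>'
    '</div></body></tt>'
)
print(ttml_paragraphs(SAMPLE))  # ['Hello']
```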
Yen Chi Hsuan
607841af64 [adobetv] Support embeddable videos (closes #6039) 2015-06-21 18:55:26 +08:00
Yen Chi Hsuan
396726244a [utils/ffmpeg] Move ISO 639 related codes to utils 2015-06-21 18:53:17 +08:00
Yen Chi Hsuan
c5895d5dbd [tumblr] Support Vimeo embeds (fixes #5969) 2015-06-21 18:30:38 +08:00
Yen Chi Hsuan
b407e173e4 [vimeo/generic] Move detection logic from GenericIE to VimeoIE 2015-06-21 18:23:58 +08:00
Yen Chi Hsuan
6a745c2c0f [pinkbike] PEP8 2015-06-21 18:23:08 +08:00
ping
2da0cad6ae [neteasemusic] Do proper rounding conversion of millisecond timestamps/durations 2015-06-21 11:44:50 +08:00
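The difference between truncating and rounding when converting millisecond values to seconds fits in a few lines. `ms_to_seconds` is a hypothetical helper, not the extractor's code.

```python
def ms_to_seconds(ms):
    """Convert milliseconds to whole seconds, rounding to the nearest second."""
    return int(round(ms / 1000.0))


print(1999 // 1000)         # 1 (truncation drops almost a full second)
print(ms_to_seconds(1999))  # 2
print(ms_to_seconds(1001))  # 1
```

Note that Python 3's `round()` rounds exact halves to the nearest even integer, so a value of exactly 2500 ms becomes 2.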
ping
af1fa6234e [neteasemusic] Add new extractor for music.163.com 2015-06-21 11:12:31 +08:00
Sergey M․
c9ac7fa909 [imdb] Fix extraction 2015-06-21 04:17:54 +06:00
Yen Chi Hsuan
964afd0689 [xvideos] Support lower-quality formats found on Android
Closes #5968
2015-06-21 03:12:17 +08:00
fnord
2a282a3b5f Unbreak breakage that was broken to test breakage 2015-06-20 11:11:41 -05:00
fnord
7bb23aeca4 rename bbccouk.py -> bbc.py 2015-06-20 11:08:13 -05:00
fnord
de939d89eb Support BBC news in other languages, non-mediaselector videos 2015-06-20 11:04:46 -05:00
fnord
77c975f536 typofix 2015-06-20 08:28:14 -05:00
fnord
75ab0ebcf5 no .get('..',None) 2015-06-20 08:24:02 -05:00
fnord
10273d6e08 toss new stuff into old file 2015-06-20 08:22:13 -05:00
Sergey M․
16d6973f8a [viki] Pass session token around (#6005) 2015-06-20 00:49:28 +06:00
Sergey M․
edcd2d665b Merge branch 'misterhat-pinkbike' 2015-06-20 00:10:27 +06:00
Sergey M․
385c3e5e91 [pinkbike] Improve and simplify 2015-06-20 00:10:08 +06:00
Sergey M․
c8e337450b Merge branch 'pinkbike' of https://github.com/misterhat/youtube-dl into misterhat-pinkbike 2015-06-19 23:00:00 +06:00
Sergey M․
10464af5d1 [dramafever:series] Fix extraction while authenticated 2015-06-19 22:02:07 +06:00
Sergey M․
cbcd1a5474 [dramafever] Add support for authentication (Closes #6017) 2015-06-19 21:57:31 +06:00
Sergey M․
c9bebed294 [youtube] Add itag 59 and 78 (Closes #5979) 2015-06-19 20:52:44 +06:00
fnord
d5552a3477 bbcnews: Switch to parse_duration, revert change to docs/supportedsites.md 2015-06-19 06:25:50 -05:00
fnord
a8b081a052 BBCNewsIE: eliminate redundant function.
BBCCoUkIE._download_media_selector: use class variable instead of
hardcoded string for mediaselector_url template.
2015-06-19 01:52:25 -05:00
fnord
9e96dc8b35 Support BBC News (bbc.com/news) 2015-06-19 01:39:30 -05:00
Sergey M․
360075e28a [pornhub] Extend _VALID_URL (Closes #6019) 2015-06-18 22:26:17 +06:00
Sergey M․
accf79b107 [viki] Add support for authentication (Closes #6005) 2015-06-18 21:17:26 +06:00
ping
4d58b24c15 [qqmusic] Use _check_formats instead 2015-06-18 23:09:04 +08:00
Yen Chi Hsuan
c33a8639a7 [postprocessor/embedthumbnail] Skipping if thumbnail file missing
Fixes #5996
2015-06-18 22:05:02 +08:00
Yen Chi Hsuan
25fa8d66e6 [bbccouk] Fix description (closes #6006) 2015-06-18 22:00:13 +08:00
Yen Chi Hsuan
974a6146fe [bbccouk] Fix error handling 2015-06-18 21:40:45 +08:00
ping
0392ac98d2 [qqmusic] Fix code formatting 2015-06-18 21:13:03 +08:00
ping
5e3915cbe3 [qqmusic] Fix song extraction when certain formats are unavailable 2015-06-18 21:06:25 +08:00
ping
29b809de68 [qqmusic] Fix album extraction 2015-06-18 15:52:04 +08:00
ping
8f73e89ca0 [kuwo] New extractor for kuwo.cn 2015-06-18 14:15:54 +08:00
ping
0d0d5d3717 [qqmusic] Add support for playlists 2015-06-18 13:59:37 +08:00
Sergey M․
a69e8bfdd9 [lifenews] Fix tests 2015-06-17 20:29:15 +06:00
Sergey M․
062a3fdf36 [lifenews] Modernize 2015-06-17 20:28:47 +06:00
Sergey M․
028a33d7f2 [lifenews] Fix extraction 2015-06-17 20:27:38 +06:00
Aurélio A. Heckert
14835de9fb Use shlex.split for --pp-params and update related docs. 2015-06-16 18:10:31 -03:00
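For reference, `shlex.split` tokenizes an argument string with POSIX shell quoting rules, which is what lets a single option value such as `--pp-params` (later `--postprocessor-args`) carry several arguments. The ffmpeg command line built here is a hypothetical illustration of where such arguments could go, not the postprocessor's actual invocation.

```python
import shlex

# Value as it could be given on the command line, e.g. --pp-params='-s 720x480'
print(shlex.split('-s 720x480'))                  # ['-s', '720x480']
# Quoting keeps a value containing spaces together as one argument.
print(shlex.split('-metadata title="My Video"'))  # ['-metadata', 'title=My Video']

# Hypothetical ffmpeg command; file names are placeholders.
cmd = ['ffmpeg', '-i', 'in.mp4'] + shlex.split('-s 720x480') + ['out.avi']
print(cmd)
```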
Sergey M․
447053668f [spankwire] Fix extraction 2015-06-16 21:19:18 +06:00
Jaime Marquínez Ferrándiz
f3aecb27a4 [youku] Simplify a bit the 'entries' construction
Mainly avoid having to use an index.
2015-06-16 14:41:52 +02:00
Sergey M․
7f0172b3e5 Credit @jackyzy823 for iqiyi 2015-06-15 22:29:41 +06:00
Yen Chi Hsuan
79fe954d79 Merge branch 'PeterDing-youku' 2015-06-16 00:20:52 +08:00
Yen Chi Hsuan
0501bfa159 [YoutubeDL] Youku extractor now uses the standard format selection 2015-06-16 00:15:30 +08:00
Yen Chi Hsuan
a155b7e76c [youku] Coding style 2015-06-16 00:15:09 +08:00
Yen Chi Hsuan
5228b756af [youku] Add cn_verification_proxy support and add a georestricted test case 2015-06-16 00:06:23 +08:00
Yen Chi Hsuan
04e7596680 [youku] Better error handling 2015-06-15 23:54:55 +08:00
Yen Chi Hsuan
f1e66cb2eb [youku] Change video_id and add a multipart test case 2015-06-15 23:46:07 +08:00
Sergey M․
4fd35ee072 [safari] Add test for #5985 2015-06-15 21:36:30 +06:00
Yen Chi Hsuan
ee69799262 [youku] Add a v.swf test case 2015-06-15 23:36:28 +08:00
Sergey M.
636a9637f4 Merge pull request #5985 from zx8/master
[safari] make url regex more lenient
2015-06-15 20:31:47 +05:00
Yen Chi Hsuan
9383e66f94 [youku] Use _match_id 2015-06-15 23:31:30 +08:00
Sergey M․
99e6833c85 [francetv] Update f4m manifest token URL (Closes #5981, Closes #5989) 2015-06-15 21:30:27 +06:00
Yen Chi Hsuan
c203be3fb4 [youku] Better handling for Python 2/3 compatibility 2015-06-15 23:28:59 +08:00
Sergey M․
02175a7986 [youtube:search] Fix search query (Closes #5988) 2015-06-15 21:01:26 +06:00
Sergey M․
8117df4cd9 [vk] Extract view count 2015-06-15 20:55:25 +06:00
Sergey M․
7c7dd9dc7f [vk] Fix upload date extraction 2015-06-15 20:47:01 +06:00
Sergey M․
054932f403 [vk] Fix extraction (Closes #5987) 2015-06-15 20:46:10 +06:00
Yen Chi Hsuan
aed473ccf9 [youku] PEP8 2015-06-15 22:41:24 +08:00
Yen Chi Hsuan
8268e94cd4 Merge branch 'youku' of https://github.com/PeterDing/youtube-dl into PeterDing-youku 2015-06-15 22:36:44 +08:00
Yen Chi Hsuan
4af98ecdfb [vbox7] Fix extraction (fixes #5967) 2015-06-15 18:49:27 +08:00
zx8
2a0fcf6113 [safari] make url regex more lenient 2015-06-15 00:27:43 +01:00
Sergey M․
0072afca8e [YoutubeDL] Remove force_generic_extractor arg from process_ie_result 2015-06-13 02:21:29 +06:00
Sergey M․
61aa5ba36e [YoutubeDL] Remove global state for force_generic_extractor flag in favor of passing argument 2015-06-13 02:05:21 +06:00
Sergey M․
9f4323252a [YoutubeDL] Fix for multiple URLs 2015-06-12 21:56:50 +06:00
Sergey M․
d22dec74ff Add --force-generic-extractor
For some extractors for which it is hard to work out a good _VALID_URL we use very vague and unrestrictive ones,
e.g. just allowing anything after the hostname and capturing part of the URL as the id.
If one of these extractors happens to have a video embed from some different hoster or platform
and this scenario is not handled in the extractor itself, we end up unable to download this embed
until the extractor is fixed to support embeds of this kind.
Forcing downloader to use the generic extractor can be a neat temporary solution for this problem.
Example: FiveTV extractor with Tvigle embed - http://www.5-tv.ru/rabota/broadcasts/48/
2015-06-12 19:20:12 +06:00
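The problem described above can be sketched with a deliberately vague pattern. The regex and the `pick_extractor` function are hypothetical simplifications of youtube-dl's extractor selection, shown only to illustrate why a force flag helps.

```python
import re

# Hypothetical, deliberately vague pattern: anything after the hostname is
# accepted and captured as the id.
VAGUE_VALID_URL = r'https?://(?:www\.)?5-tv\.ru/(?P<id>.+)'


def pick_extractor(url, force_generic=False):
    """Return which (hypothetical) extractor would handle ``url``."""
    if not force_generic and re.match(VAGUE_VALID_URL, url):
        # The site extractor claims the URL even if the page only embeds
        # a player from another platform that it cannot handle.
        return 'site'
    # The generic extractor scans the page for known embeds instead.
    return 'generic'


url = 'http://www.5-tv.ru/rabota/broadcasts/48/'
print(pick_extractor(url))                      # site
print(pick_extractor(url, force_generic=True))  # generic
```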
Yen Chi Hsuan
8a1a26ce4c [youtube] Add a test for the DASH segment downloader 2015-06-10 14:47:02 +08:00
Yen Chi Hsuan
5bf3276e8d [downloader/dash] Add testing facility 2015-06-10 14:45:54 +08:00
Yen Chi Hsuan
93dfcb9357 [downloader/dash] Do not pollute ``self`` 2015-06-10 13:44:54 +08:00
Yen Chi Hsuan
0c8662d2b6 [youtube] Fix a TypeError caused by 4da31bd566 2015-06-10 13:40:41 +08:00
Aurélio A. Heckert
d84f1d14b5 Adds support for XviD output with extra parametrization
As the "LG Time Machine" (a (not so) smart TV) has a limitation for video dimensions (as for codecs), I take to implement an extra parameter `--pp-params` where we can send extra parameterization for the video converter (post-processor).

Example:
```
$ youtube-dl --recode-video=xvid --pp-params='-s 720x480' -c https://www.youtube.com/watch?v=BE7Qoe2ZiXE
```
That works fine on a 4yo LG Time Machine.

Closes #5733
2015-06-09 22:08:16 -03:00
Yen Chi Hsuan
4da31bd566 [youtube] Fix a FutureWarning from xml.etree.ElementTree 2015-06-06 22:22:26 +08:00
Yen Chi Hsuan
423d2be5f8 [downloader/dash] Rename the protocol
'http_dash_segments' looks more like a protocol name than
'dash_segments'
2015-06-04 22:27:29 +08:00
Yen Chi Hsuan
453a1617aa [downloader/dash] Reorder imports 2015-06-04 22:12:05 +08:00
Yen Chi Hsuan
b9258c6178 [YoutubeDL] Change how DashSegmentsFD is selected 2015-06-04 22:05:33 +08:00
Yen Chi Hsuan
6800d3372f [YoutubeDL] Support DASH manifest downloading 2015-06-03 23:10:18 +08:00
PeterDing
08f7db20c1 [youku] change format_id 2015-05-30 10:03:32 +08:00
PeterDing
1498940b10 [youku] compare bytes and str for compatible; use compat_urllib_parse for making video_url 2015-05-29 10:13:09 +08:00
PeterDing
ca45246627 [youku] compatible for python > 3.3 or > 2.7 2015-05-28 21:04:58 +08:00
PeterDing
f9355dc989 [youku] update youku 2015-05-28 17:00:09 +08:00
Mister Hat
680f9744c4 [pinkbike] used proper conversion methods 2015-05-24 16:45:10 -05:00
Mister Hat
2c935c0c72 [pinkbike] converted duration to int 2015-05-24 16:30:03 -05:00
Mister Hat
7198063d96 [pinkbike] new extractor 2015-05-24 15:26:59 -05:00
Behrooz
a650110ba7 remove print 2015-05-08 04:32:08 +02:00
Behrooz
54b31d149e Ir90Tv Add new extractor 2015-05-08 02:55:01 +02:00
Behrooz
a745475808 Ir90Tv Add new extractor 2015-05-08 02:50:46 +02:00
Tjark Saul
f11554092b [Lecture2Go] Add new extractor 2015-04-17 09:21:54 +02:00
142 changed files with 6064 additions and 1802 deletions


@@ -127,3 +127,11 @@ Julian Richen
Ping O.
Mister Hat
Peter Ding
jackyzy823
George Brighton
Remita Amine
Aurélio A. Heckert
Bernhard Minks
sceext
Zach Bruggeman
Tjark Saul


@@ -54,6 +54,7 @@ which means you can modify it, redistribute it or use it however you like.
--dump-user-agent Display the current browser identification
--list-extractors List all supported extractors
--extractor-descriptions Output descriptions of all supported extractors
--force-generic-extractor Force extraction to use the generic extractor
--default-search PREFIX Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple".
Use the value "auto" to let youtube-dl guess ("auto_warning" to emit a warning when guessing). "error" just throws an error. The
default value "fixup_error" repairs broken URLs, but emits an error if this is not possible instead of searching.
@@ -74,7 +75,7 @@ which means you can modify it, redistribute it or use it however you like.
## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1)
--playlist-end NUMBER Playlist video to end at (default is last)
--playlist-items ITEM_SPEC Playlist video items to download. Specify indices of the videos in the playlist seperated by commas like: "--playlist-items 1,2,5,8"
--playlist-items ITEM_SPEC Playlist video items to download. Specify indices of the videos in the playlist separated by commas like: "--playlist-items 1,2,5,8"
if you want to download videos indexed 1, 2, 5, 8 in the playlist. You can specify range: "--playlist-items 1-3,7,10-13", it will
download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13.
--match-title REGEX Download only matching titles (regex or caseless sub-string)
@@ -107,7 +108,7 @@ which means you can modify it, redistribute it or use it however you like.
--playlist-reverse Download playlist videos in reverse order
--xattr-set-filesize Set file xattribute ytdl.filesize with expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of ffmpeg (experimental)
--external-downloader COMMAND Use the specified external downloader. Currently supports aria2c,curl,wget
--external-downloader COMMAND Use the specified external downloader. Currently supports aria2c,curl,httpie,wget
--external-downloader-args ARGS Give these arguments to the external downloader
## Filesystem Options:
@@ -189,8 +190,8 @@ which means you can modify it, redistribute it or use it however you like.
--all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific one is requested
-F, --list-formats List all available formats
--youtube-skip-dash-manifest Do not download the DASH manifest on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv.Ignored if no
--youtube-skip-dash-manifest Do not download the DASH manifests and related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no
merge is required
## Subtitle Options:
@@ -213,7 +214,8 @@ which means you can modify it, redistribute it or use it however you like.
--audio-format FORMAT Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "best" by default
--audio-quality QUALITY Specify ffmpeg/avconv audio quality, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default
5)
--recode-video FORMAT Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv)
--recode-video FORMAT Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv|avi)
--postprocessor-args ARGS Give these arguments to the postprocessor
-k, --keep-video Keep the video file on disk after the post-processing; the video is erased by default
--no-post-overwrites Do not overwrite post-processed files; the post-processed files are overwritten by default
--embed-subs Embed subtitles in the video (only for mkv and mp4 videos)
@@ -236,6 +238,26 @@ which means you can modify it, redistribute it or use it however you like.
You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<user name>\youtube-dl.conf`.
### Authentication with `.netrc` file ###
You may also want to configure automatic credential storage for extractors that support authentication (by providing a login and password with `--username` and `--password`), so that you do not have to pass credentials as command-line arguments on every youtube-dl execution and plain-text passwords are not recorded in your shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per-extractor basis. To do so, create a `.netrc` file in your `$HOME` and restrict its permissions to read/write by you only:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc
```
After that, you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
```
machine <extractor> login <login> password <password>
```
For example:
```
machine youtube login myaccount@gmail.com password my_youtube_password
machine twitch login my_twitch_account_name password my_twitch_password
```
To activate authentication with the `.netrc` file, pass `--netrc` to youtube-dl or place it in your [configuration file](#configuration).
On Windows you may also need to set up the `%HOME%` environment variable manually.
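The `.netrc` format above is the one parsed by Python's standard `netrc` module. In that format, the `machine` name is the lookup key, which here is the lowercase extractor name. The sketch below shows such a lookup. The `load_credentials` helper is hypothetical and POSIX-only. Its permission check only mirrors the `chmod a-rwx,u+rw` advice above and is not youtube-dl's own code.

```python
import netrc
import os
import stat
import tempfile


def load_credentials(netrc_path, extractor):
    """Return (login, password) for `extractor`, or None if there is no entry.

    Hypothetical helper: the permission check mirrors the README's advice
    (read/write for the owner only); it is not youtube-dl's actual logic.
    """
    mode = os.stat(netrc_path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError('%s must only be accessible by its owner' % netrc_path)
    # authenticators() returns a (login, account, password) tuple, or None.
    auth = netrc.netrc(netrc_path).authenticators(extractor)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


if __name__ == '__main__':
    # Demo with a throwaway file instead of the real $HOME/.netrc.
    with tempfile.NamedTemporaryFile('w', suffix='.netrc', delete=False) as f:
        f.write('machine twitch login my_twitch_account_name password my_twitch_password\n')
    os.chmod(f.name, 0o600)  # same effect as `chmod a-rwx,u+rw`
    print(load_credentials(f.name, 'twitch'))
    os.remove(f.name)
```

Passing an explicit path keeps the demo away from your real `~/.netrc`. An extractor with no matching `machine` line simply yields `None`.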
# OUTPUT TEMPLATE
The `-o` option allows users to indicate a template for the output file names. The basic usage is not to set any template arguments when downloading a single file, as in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a lowercase S. Allowed names are:
@@ -379,7 +401,7 @@ In February 2015, the new YouTube player contained a character sequence in a str
### HTTP Error 429: Too Many Requests or 402: Payment Required
These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
### SyntaxError: Non-ASCII character ###

@@ -17,6 +17,7 @@
- **AcademicEarth:Course**
- **AddAnime**
- **AdobeTV**
- **AdobeTVVideo**
- **AdultSwim**
- **Aftenposten**
- **Aftonbladet**
@@ -27,7 +28,8 @@
- **anitube.se**
- **AnySex**
- **Aparat**
- **AppleConnect**
- **AppleDaily**: 臺灣蘋果日報
- **AppleTrailers**
- **archive.org**: archive.org videos
- **ARD**
@@ -44,11 +46,12 @@
- **audiomack**
- **audiomack:album**
- **Azubu**
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
- **Bandcamp**
- **Bandcamp:album**
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **BeatportPro**
- **Beeg**
@@ -105,11 +108,12 @@
- **Crunchyroll**
- **crunchyroll:playlist**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr**
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
- **DailymotionCloud**
- **daum.net**
- **DBTV**
- **DctpTv**
@@ -119,7 +123,7 @@
- **Discovery**
- **divxstage**: DivxStage
- **Dotsub**
- **DouyuTV**: 斗鱼
- **dramafever**
- **dramafever:series**
- **DRBonanza**
@@ -220,7 +224,8 @@
- **instagram:user**: Instagram user profile
- **InternetVideoArchive**
- **IPrima**
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
- **Izlesene**
@@ -241,9 +246,16 @@
- **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью
- **Ku6**
- **kuwo:album**: 酷我音乐 - 专辑
- **kuwo:category**: 酷我音乐 - 分类
- **kuwo:chart**: 酷我音乐 - 排行榜
- **kuwo:mv**: 酷我音乐 - MV
- **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐
- **la7.tv**
- **Laola1Tv**
- **Lecture2Go**
- **Letv**: 乐视网
- **LetvPlaylist**
- **LetvTv**
- **Libsyn**
@@ -281,6 +293,7 @@
- **Motherless**
- **Motorsport**: motorsport.com
- **MovieClips**
- **MovieFap**
- **Moviezine**
- **movshare**: MovShare
- **MPORA**
@@ -294,6 +307,7 @@
- **MySpace**
- **MySpace:album**
- **MySpass**
- **Myvi**
- **myvideo**
- **MyVidster**
- **N-JOY**
@@ -309,11 +323,18 @@
- **NDTV**
- **NerdCubedFeed**
- **Nerdist**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
- **netease:playlist**: 网易云音乐 - 歌单
- **netease:program**: 网易云音乐 - 电台节目
- **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐
- **Netzkino**
- **Newgrounds**
- **Newstube**
- **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nfb**: National Film Board of Canada
- **nfl.com**
- **nhl.com**
@@ -329,13 +350,14 @@
- **Nowness**
- **NowTV**
- **nowvideo**: NowVideo
- **npo**: npo.nl and ntr.nl
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
- **NRK**
- **NRKPlaylist**
- **NRKTV**: NRK TV and NRK Radio
- **ntv.ru**
- **Nuvid**
- **NYTimes**
@@ -344,6 +366,7 @@
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **OpenFilm**
@@ -357,6 +380,7 @@
- **PhilharmonieDeParis**: Philharmonie de Paris
- **Phoenix**
- **Photobucket**
- **Pinkbike**
- **Pladform**
- **PlanetaPlay**
- **play.fm**
@@ -377,10 +401,11 @@
- **prosiebensat1**: ProSiebenSat.1 Digital
- **Puls4**
- **Pyvideo**
- **qqmusic**: QQ音乐
- **qqmusic:album**: QQ音乐 - 专辑
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuickVid**
- **R7**
- **radio.de**
@@ -389,6 +414,7 @@
- **RadioJavan**
- **Rai**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedTube**
- **Restudy**
- **ReverbNation**
@@ -436,6 +462,8 @@
- **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **SnagFilms**
- **SnagFilmsEmbed**
- **Snotr**
- **Sohu**
- **soompi**
@@ -462,6 +490,7 @@
- **SportBox**
- **SportBoxEmbed**
- **SportDeutschland**
- **Sportschau**
- **Srf**
- **SRMediathek**: Saarländischer Rundfunk
- **SSA**
@@ -487,7 +516,6 @@
- **TechTalks**
- **techtv.mit.edu**
- **ted**
- **tegenlicht.vpro.nl**
- **TeleBruxelles**
- **telecinco.es**
- **TeleMB**
@@ -498,6 +526,7 @@
- **TheOnion**
- **ThePlatform**
- **TheSixtyOne**
- **ThisAmericanLife**
- **ThisAV**
- **THVideo**
- **THVideoPlaylist**
@@ -538,10 +567,11 @@
- **twitch:stream**
- **twitch:video**
- **twitch:vod**
- **TwitterCard**
- **Ubu**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **Ultimedia**
- **Unistra**
- **Urort**: NRK P3 Urørt
@@ -586,8 +616,8 @@
- **Vimple**: Vimple - one-click video hosting
- **Vine**
- **vine:user**
- **vk**: VK
- **vk:uservideos**: VK - User's Videos
- **Vodlocker**
- **VoiceRepublic**
- **Vporn**
@@ -603,32 +633,36 @@
- **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus
- **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo**
- **Wimp**
- **Wistia**
- **WNL**
- **WorldStarHipHop**
- **wrzuta.pl**
- **WSJ**: Wall Street Journal
- **XBef**
- **XboxClips**
- **XHamster**
- **XHamsterEmbed**
- **XMinus**
- **XNXX**
- **Xstream**
- **XTube**
- **XTubeUser**: XTube user profile
- **Xuite**: 隨意窩Xuite影音
- **XVideos**
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies
- **Yam**: 蕃薯藤yam天空部落
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек
- **YesJapan**
- **yinyuetai:video**: 音悦Tai
- **Ynet**
- **YouJizz**
- **youku**: 优酷
- **YouPorn**
- **YourUpload**
- **youtube**: YouTube.com

@@ -14,6 +14,8 @@ from youtube_dl.utils import get_filesystem_encoding
from youtube_dl.compat import (
compat_getenv,
compat_expanduser,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
)
@@ -42,5 +44,28 @@ class TestCompat(unittest.TestCase):
dir(youtube_dl.compat))) - set(['unicode_literals'])
self.assertEqual(all_names, sorted(present_names))
def test_compat_urllib_parse_unquote(self):
self.assertEqual(compat_urllib_parse_unquote('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote('%7e/abc+def'), '~/abc+def')
self.assertEqual(compat_urllib_parse_unquote(''), '')
self.assertEqual(compat_urllib_parse_unquote('%'), '%')
self.assertEqual(compat_urllib_parse_unquote('%%'), '%%')
self.assertEqual(compat_urllib_parse_unquote('%%%'), '%%%')
self.assertEqual(compat_urllib_parse_unquote('%2F'), '/')
self.assertEqual(compat_urllib_parse_unquote('%2f'), '/')
self.assertEqual(compat_urllib_parse_unquote('%E6%B4%A5%E6%B3%A2'), '津波')
self.assertEqual(
compat_urllib_parse_unquote('''<meta property="og:description" content="%E2%96%81%E2%96%82%E2%96%83%E2%96%84%25%E2%96%85%E2%96%86%E2%96%87%E2%96%88" />
%<a href="https://ar.wikipedia.org/wiki/%D8%AA%D8%B3%D9%88%D9%86%D8%A7%D9%85%D9%8A">%a'''),
'''<meta property="og:description" content="▁▂▃▄%▅▆▇█" />
%<a href="https://ar.wikipedia.org/wiki/تسونامي">%a''')
self.assertEqual(
compat_urllib_parse_unquote('''%28%5E%E2%97%A3_%E2%97%A2%5E%29%E3%81%A3%EF%B8%BB%E3%83%87%E2%95%90%E4%B8%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%86%B6%I%Break%25Things%'''),
'''(^◣_◢^)っ︻デ═一 ⇀ ⇀ ⇀ ⇀ ⇀ ↶%I%Break%Things%''')
def test_compat_urllib_parse_unquote_plus(self):
self.assertEqual(compat_urllib_parse_unquote_plus('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote_plus('%7e/abc+def'), '~/abc def')
if __name__ == '__main__':
unittest.main()
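The tests above pin down the intended behaviour. Malformed escapes such as a lone `%` pass through unchanged, and a multi-byte UTF-8 character spread over several `%XX` escapes decodes to a single character.

Below is a minimal Python 3 sketch of that strategy, in the spirit of the CPython-derived backport in `compat.py`. It splits the string into ASCII runs, turns each run's escapes into raw bytes, and decodes those bytes as a whole. The `unquote_to_bytes`, `unquote` and `unquote_plus` functions here are illustrative re-implementations, not youtube-dl functions.

```python
import re

# Runs of ASCII characters; only these can contain %XX escapes.
_ASCII_RUN = re.compile('([\x00-\x7f]+)')
_HEXDIGITS = b'0123456789abcdefABCDEF'


def unquote_to_bytes(string):
    """Decode %XX escapes into raw bytes; malformed escapes are kept verbatim."""
    if isinstance(string, str):
        string = string.encode('utf-8')
    bits = string.split(b'%')
    if len(bits) == 1:
        return string
    res = [bits[0]]
    for item in bits[1:]:
        code = item[:2]
        if len(code) == 2 and all(c in _HEXDIGITS for c in code):
            res.append(bytes([int(code, 16)]))
            res.append(item[2:])
        else:
            # Not a valid escape: restore the '%' that split() consumed.
            res.append(b'%')
            res.append(item)
    return b''.join(res)


def unquote(string, encoding='utf-8', errors='replace'):
    """Replace %XX escapes, decoding each ASCII run's bytes as one unit."""
    if '%' not in string:
        return string
    # With a capturing group, odd indices hold the ASCII runs.
    bits = _ASCII_RUN.split(string)
    res = [bits[0]]
    for i in range(1, len(bits), 2):
        res.append(unquote_to_bytes(bits[i]).decode(encoding, errors))
        res.append(bits[i + 1])
    return ''.join(res)


def unquote_plus(string, encoding='utf-8', errors='replace'):
    """Like unquote(), but '+' also becomes a space (HTML form encoding)."""
    return unquote(string.replace('+', ' '), encoding, errors)
```

Decoding a whole run at once is what lets `%E6%B4%A5` become one character. Decoding each escape separately would produce three replacement characters instead.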

@@ -324,6 +324,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('02:03:04'), 7384)
self.assertEqual(parse_duration('01:02:03:04'), 93784)
self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
self.assertEqual(parse_duration('87 Min.'), 5220)
def test_fix_xml_ampersands(self):
self.assertEqual(

@@ -139,6 +139,7 @@ class YoutubeDL(object):
outtmpl: Template for output names.
restrictfilenames: Do not allow "&" and spaces in file names
ignoreerrors: Do not stop on download errors.
force_generic_extractor: Force downloader to use the generic extractor
nooverwrites: Prevent overwriting files.
playliststart: Playlist item to start at.
playlistend: Playlist item to end at.
@@ -261,6 +262,8 @@ class YoutubeDL(object):
The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
otherwise prefer avconv.
postprocessor_args: A list of additional command-line arguments for the
postprocessor.
"""
params = None
@@ -626,13 +629,16 @@ class YoutubeDL(object):
info_dict.setdefault(key, value)
def extract_info(self, url, download=True, ie_key=None, extra_info={},
process=True, force_generic_extractor=False):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
if not ie_key and force_generic_extractor:
ie_key = 'Generic'
if ie_key:
ies = [self.get_info_extractor(ie_key)]
else:
@@ -1004,7 +1010,7 @@ class YoutubeDL(object):
t.get('preference'), t.get('width'), t.get('height'),
t.get('id'), t.get('url')))
for i, t in enumerate(thumbnails):
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
t['id'] = '%d' % i
@@ -1033,12 +1039,6 @@ class YoutubeDL(object):
info_dict['id'], info_dict.get('subtitles'),
info_dict.get('automatic_captions'))
# This extractors handle format selection themselves
if info_dict['extractor'] in ['Youku']:
if download:
self.process_info(info_dict)
return info_dict
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
# There's only one format available
@@ -1104,7 +1104,8 @@ class YoutubeDL(object):
if req_format is None:
req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
info_dict['extractor'] in ['youtube', 'ted'] and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
req_format_list.append('bestvideo+bestaudio')
@@ -1499,7 +1500,8 @@ class YoutubeDL(object):
for url in url_list:
try:
# It also downloads the videos
res = self.extract_info(
url, force_generic_extractor=self.params.get('force_generic_extractor', False))
except UnavailableVideoError:
self.report_error('unable to download video')
except MaxDownloadsReached:
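When no `-f` is given, the `req_format_list` hunk now skips the `bestvideo+bestaudio` merge for live streams.

The sketch below simplifies that decision. The `default_format_spec` function and its flattened arguments are hypothetical. Two parts are assumptions about code this diff does not show: the trailing `'best'` fallback and joining the list with `/`.

```python
def default_format_spec(outtmpl, extractor, is_live, can_merge):
    """Approximate the default format string chosen when -f is not given.

    Hypothetical helper: in YoutubeDL the inputs come from self.params,
    info_dict and FFmpegMergerPP; they are flattened here for clarity.
    """
    req_format_list = []
    if (outtmpl != '-' and extractor in ('youtube', 'ted') and
            not is_live and can_merge):
        # Separate video and audio streams that ffmpeg/avconv can merge.
        req_format_list.append('bestvideo+bestaudio')
    # Assumed fallback: a single pre-merged "best" format.
    req_format_list.append('best')
    return '/'.join(req_format_list)


print(default_format_spec('%(title)s.%(ext)s', 'youtube', False, True))  # bestvideo+bestaudio/best
print(default_format_spec('%(title)s.%(ext)s', 'youtube', True, True))   # best
```

Merging needs both complete streams on disk. That is presumably why writing to stdout (`-`) and live streams both fall back to a single format.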

@@ -169,7 +169,7 @@ def _real_main(argv=None):
if not opts.audioquality.isdigit():
parser.error('invalid audio quality specified')
if opts.recodevideo is not None:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv', 'avi']:
parser.error('invalid video recode format specified')
if opts.convertsubtitles is not None:
if opts.convertsubtitles not in ['srt', 'vtt', 'ass']:
@@ -263,6 +263,9 @@ def _real_main(argv=None):
external_downloader_args = None
if opts.external_downloader_args:
external_downloader_args = shlex.split(opts.external_downloader_args)
postprocessor_args = None
if opts.postprocessor_args:
postprocessor_args = shlex.split(opts.postprocessor_args)
match_filter = (
None if opts.match_filter is None
else match_filter_func(opts.match_filter))
@@ -293,6 +296,7 @@ def _real_main(argv=None):
'autonumber_size': opts.autonumber_size,
'restrictfilenames': opts.restrictfilenames,
'ignoreerrors': opts.ignoreerrors,
'force_generic_extractor': opts.force_generic_extractor,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
'retries': opts_retries,
@@ -366,6 +370,7 @@ def _real_main(argv=None):
'ffmpeg_location': opts.ffmpeg_location,
'hls_prefer_native': opts.hls_prefer_native,
'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy,
}
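`_real_main` splits the `--postprocessor-args` string with `shlex.split` before storing it in the `postprocessor_args` parameter, so shell-style quoting keeps multi-word values together.

The sketch below shows the effect on an ffmpeg command line. `build_ffmpeg_cmd` is hypothetical. Where the extra arguments are placed is an assumption, not the post-processor's actual command layout.

```python
import shlex


def build_ffmpeg_cmd(infile, outfile, postprocessor_args=None):
    """Build an illustrative ffmpeg argv with user-supplied extra arguments."""
    cmd = ['ffmpeg', '-y', '-i', infile]
    if postprocessor_args:
        # Same splitting as in _real_main: quotes group words, as in a POSIX shell.
        cmd += shlex.split(postprocessor_args)
    cmd.append(outfile)
    return cmd


print(build_ffmpeg_cmd('in.mp4', 'out.mkv', '-c copy -metadata title="My Video"'))
# ['ffmpeg', '-y', '-i', 'in.mp4', '-c', 'copy', '-metadata', 'title=My Video', 'out.mkv']
```

The result stays a list, so it can be handed to `subprocess` without `shell=True`.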

@@ -9,6 +9,7 @@ import shutil
import socket
import subprocess
import sys
import itertools
try:
@@ -74,42 +75,74 @@ except ImportError:
import BaseHTTPServer as compat_http_server
try:
    from urllib.parse import unquote_to_bytes as compat_urllib_parse_unquote_to_bytes
    from urllib.parse import unquote as compat_urllib_parse_unquote
    from urllib.parse import unquote_plus as compat_urllib_parse_unquote_plus
except ImportError:  # Python 2
    _asciire = (compat_urllib_parse._asciire if hasattr(compat_urllib_parse, '_asciire')
                else re.compile('([\x00-\x7f]+)'))

    # HACK: The following are the correct unquote_to_bytes, unquote and unquote_plus
    # implementations from cpython 3.4.3's stdlib. Python 2's version
    # is apparently broken (see https://github.com/rg3/youtube-dl/pull/6244)

    def compat_urllib_parse_unquote_to_bytes(string):
        """unquote_to_bytes('abc%20def') -> b'abc def'."""
        # Note: strings are encoded as UTF-8. This is only an issue if it contains
        # unescaped non-ASCII characters, which URIs should not.
        if not string:
            # Is it a string-like object?
            string.split
            return b''
        if isinstance(string, unicode):
            string = string.encode('utf-8')
        bits = string.split(b'%')
        if len(bits) == 1:
            return string
        res = [bits[0]]
        append = res.append
        for item in bits[1:]:
            try:
                append(compat_urllib_parse._hextochr[item[:2]])
                append(item[2:])
            except KeyError:
                append(b'%')
                append(item)
        return b''.join(res)

    def compat_urllib_parse_unquote(string, encoding='utf-8', errors='replace'):
        """Replace %xx escapes by their single-character equivalent. The optional
        encoding and errors parameters specify how to decode percent-encoded
        sequences into Unicode characters, as accepted by the bytes.decode()
        method.
        By default, percent-encoded sequences are decoded with UTF-8, and invalid
        sequences are replaced by a placeholder character.

        unquote('abc%20def') -> 'abc def'.
        """
        if '%' not in string:
            string.split
            return string
        if encoding is None:
            encoding = 'utf-8'
        if errors is None:
            errors = 'replace'
        bits = _asciire.split(string)
        res = [bits[0]]
        append = res.append
        for i in range(1, len(bits), 2):
            append(compat_urllib_parse_unquote_to_bytes(bits[i]).decode(encoding, errors))
            append(bits[i + 1])
        return ''.join(res)

    def compat_urllib_parse_unquote_plus(string, encoding='utf-8', errors='replace'):
        """Like unquote(), but also replace plus signs by spaces, as required for
        unquoting HTML form values.

        unquote_plus('%7e/abc+def') -> '~/abc def'
        """
        string = string.replace('+', ' ')
        return compat_urllib_parse_unquote(string, encoding, errors)
try:
compat_str = unicode # Python 2
@@ -388,6 +421,15 @@ else:
pass
return _terminal_size(columns, lines)
try:
itertools.count(start=0, step=1)
compat_itertools_count = itertools.count
except TypeError: # Python 2.6
def compat_itertools_count(start=0, step=1):
n = start
while True:
yield n
n += step
__all__ = [
'compat_HTTPError',
@@ -401,6 +443,7 @@ __all__ = [
'compat_html_entities',
'compat_http_client',
'compat_http_server',
'compat_itertools_count',
'compat_kwargs',
'compat_ord',
'compat_parse_qs',
@@ -411,6 +454,8 @@ __all__ = [
'compat_urllib_error',
'compat_urllib_parse',
'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlparse',
'compat_urllib_request',
'compat_urlparse',

@@ -8,6 +8,7 @@ from .hls import NativeHlsFD
from .http import HttpFD
from .rtsp import RtspFD
from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from ..utils import (
determine_protocol,
@@ -20,6 +21,7 @@ PROTOCOL_MAP = {
'mms': RtspFD,
'rtsp': RtspFD,
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
}

@@ -0,0 +1,66 @@
from __future__ import unicode_literals

import re

from .common import FileDownloader
from ..compat import compat_urllib_request


class DashSegmentsFD(FileDownloader):
    """
    Download segments in a DASH manifest
    """
    def real_download(self, filename, info_dict):
        self.report_destination(filename)
        tmpfilename = self.temp_name(filename)
        base_url = info_dict['url']
        segment_urls = info_dict['segment_urls']

        is_test = self.params.get('test', False)
        remaining_bytes = self._TEST_FILE_SIZE if is_test else None
        byte_counter = 0

        def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
            self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
            req = compat_urllib_request.Request(target_url)
            if remaining_bytes is not None:
                req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))

            data = self.ydl.urlopen(req).read()

            if remaining_bytes is not None:
                data = data[:remaining_bytes]

            outf.write(data)
            return len(data)

        def combine_url(base_url, target_url):
            if re.match(r'^https?://', target_url):
                return target_url
            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)

        with open(tmpfilename, 'wb') as outf:
            append_url_to_file(
                outf, combine_url(base_url, info_dict['initialization_url']),
                'initialization segment')
            for i, segment_url in enumerate(segment_urls):
                segment_len = append_url_to_file(
                    outf, combine_url(base_url, segment_url),
                    'segment %d / %d' % (i + 1, len(segment_urls)),
                    remaining_bytes)
                byte_counter += segment_len
                if remaining_bytes is not None:
                    remaining_bytes -= segment_len
                    if remaining_bytes <= 0:
                        break

        self.try_rename(tmpfilename, filename)

        self._hook_progress({
            'downloaded_bytes': byte_counter,
            'total_bytes': byte_counter,
            'filename': filename,
            'status': 'finished',
        })

        return True
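`DashSegmentsFD` writes the initialization segment, then each media segment, into one file. It resolves relative segment URLs against the manifest's base URL.

The sketch below isolates that logic so it runs without network access. The `fetch` callable is a hypothetical stand-in for `self.ydl.urlopen`, and `limit` plays the role of `remaining_bytes` in test mode. `combine_url` is copied from the hunk.

Note that `combine_url` appends to the base URL. `urllib.parse.urljoin` would instead replace the last path component of a base URL that lacks a trailing slash.

```python
import re
from urllib.parse import urljoin


def combine_url(base_url, target_url):
    # Absolute segment URLs are used as-is; relative ones are appended.
    if re.match(r'^https?://', target_url):
        return target_url
    return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)


def download_segments(fetch, base_url, init_url, segment_urls, limit=None):
    """Concatenate init + media segments; `fetch(url, max_bytes)` returns bytes.

    As in the hunk, the init segment is written in full and only media
    segments count toward `limit`.
    """
    out = bytearray(fetch(combine_url(base_url, init_url), None))
    remaining = limit
    for segment_url in segment_urls:
        data = fetch(combine_url(base_url, segment_url), remaining)
        out += data
        if remaining is not None:
            remaining -= len(data)
            if remaining <= 0:
                break
    return bytes(out)


def fake_fetch(url, max_bytes):
    # Hypothetical server stub: each segment's body is just its own URL.
    body = url.encode('ascii')
    return body if max_bytes is None else body[:max_bytes]


print(combine_url('http://example.com/dash', 'seg-1.m4s'))  # http://example.com/dash/seg-1.m4s
print(urljoin('http://example.com/dash', 'seg-1.m4s'))      # http://example.com/seg-1.m4s
```

The stub makes the truncation easy to see. With `limit=12`, the first 12-byte segment exhausts the budget and the loop stops.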

@@ -109,6 +109,14 @@ class Aria2cFD(ExternalFD):
cmd += ['--', info_dict['url']]
return cmd
class HttpieFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict):
cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
for key, val in info_dict['http_headers'].items():
cmd += ['%s:%s' % (key, val)]
return cmd
_BY_NAME = dict(
(klass.get_basename(), klass)
for name, klass in globals().items()
@@ -123,5 +131,6 @@ def list_external_downloaders():
def get_external_downloader(external_downloader):
""" Given the name of the executable, see whether we support the given
downloader . """
# Drop .exe extension on Windows
bn = os.path.splitext(os.path.basename(external_downloader))[0]
return _BY_NAME[bn]
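This hunk makes two changes:

- The new `HttpieFD` passes each HTTP header as an HTTPie `Name:Value` request item.
- `get_external_downloader` now drops the file extension, so `aria2c.exe` on Windows resolves to the same `aria2c` entry.

Below is a standalone sketch of both pieces. `httpie_cmd` and `downloader_key` are hypothetical names, and `SUPPORTED` is a two-entry stand-in for `_BY_NAME`. Path splitting follows the host OS's rules.

```python
import os


def httpie_cmd(tmpfilename, url, http_headers):
    """Mirror HttpieFD._make_cmd: headers become HTTPie `Name:Value` items."""
    cmd = ['http', '--download', '--output', tmpfilename, url]
    for key, val in http_headers.items():
        cmd.append('%s:%s' % (key, val))
    return cmd


def downloader_key(executable):
    """Drop the directory and any extension, e.g. '.exe' on Windows."""
    return os.path.splitext(os.path.basename(executable))[0]


SUPPORTED = {'aria2c': 'Aria2cFD', 'httpie': 'HttpieFD'}  # illustrative subset
print(SUPPORTED[downloader_key('/usr/local/bin/aria2c')])  # Aria2cFD
print(SUPPORTED[downloader_key('aria2c.exe')])             # Aria2cFD
```

One side effect of `splitext` is that any dotted suffix is dropped. A versioned name such as `aria2c-1.19` would become `aria2c-1` and no longer match.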

@@ -4,7 +4,10 @@ from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .adobetv import (
AdobeTVIE,
AdobeTVVideoIE,
)
from .adultswim import AdultSwimIE
from .aftenposten import AftenpostenIE
from .aftonbladet import AftonbladetIE
@@ -16,9 +19,14 @@ from .anysex import AnySexIE
from .aol import AolIE
from .allocine import AllocineIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import AppleTrailersIE
from .archiveorg import ArchiveOrgIE
from .ard import (
ARDIE,
ARDMediathekIE,
SportschauIE,
)
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
@@ -35,7 +43,10 @@ from .azubu import AzubuIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbc import (
BBCCoUkIE,
BBCIE,
)
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE
@@ -103,6 +114,7 @@ from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
DailymotionCloudIE,
)
from .daum import DaumIE
from .dbtv import DBTVIE
@@ -140,7 +152,6 @@ from .ellentv import (
)
from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .empflix import EMPFlixIE
from .engadget import EngadgetIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
@@ -235,6 +246,7 @@ from .instagram import InstagramIE, InstagramUserIE
from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .ivi import (
IviIE,
IviCompilationIE
@@ -257,8 +269,17 @@ from .keek import KeekIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
from .kuwo import (
KuwoIE,
KuwoAlbumIE,
KuwoChartIE,
KuwoSingerIE,
KuwoCategoryIE,
KuwoMvIE,
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lecture2go import Lecture2GoIE
from .letv import (
LetvIE,
LetvTvIE,
@@ -320,6 +341,7 @@ from .musicvault import MusicVaultIE
from .muzu import MuzuTVIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE
from .nationalgeographic import NationalGeographicIE
@@ -339,6 +361,15 @@ from .ndtv import NDTVIE
from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE
from .neteasemusic import (
NetEaseMusicIE,
NetEaseMusicAlbumIE,
NetEaseMusicSingerIE,
NetEaseMusicListIE,
NetEaseMusicMvIE,
NetEaseMusicProgramIE,
NetEaseMusicDjRadioIE,
)
from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE
from .nextmedia import (
@@ -368,7 +399,8 @@ from .npo import (
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
VPROIE,
WNLIE
)
from .nrk import (
NRKIE,
@@ -384,6 +416,7 @@ from .nytimes import (
from .nuvid import NuvidIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
from .onionstudios import OnionStudiosIE
from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
@@ -401,6 +434,7 @@ from .pbs import PBSIE
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE
from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE
from .played import PlayedIE
@@ -427,6 +461,7 @@ from .qqmusic import (
QQMusicSingerIE,
QQMusicAlbumIE,
QQMusicToplistIE,
QQMusicPlaylistIE,
)
from .quickvid import QuickVidIE
from .r7 import R7IE
@@ -436,6 +471,7 @@ from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redtube import RedTubeIE
from .restudy import RestudyIE
from .reverbnation import ReverbNationIE
@@ -487,6 +523,10 @@ from .smotri import (
SmotriUserIE,
SmotriBroadcastIE,
)
from .snagfilms import (
SnagFilmsIE,
SnagFilmsEmbedIE,
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .soompi import (
@@ -560,6 +600,7 @@ from .tf1 import TF1IE
from .theonion import TheOnionIE
from .theplatform import ThePlatformIE
from .thesixtyone import TheSixtyOneIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcIE, TlcDeIE
@@ -567,7 +608,11 @@ from .tmz import (
TMZIE,
TMZArticleIE,
)
from .tnaflix import (
TNAFlixIE,
EMPFlixIE,
MovieFapIE,
)
from .thvideo import (
THVideoIE,
THVideoPlaylistIE
@@ -611,6 +656,7 @@ from .twitch import (
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import TwitterCardIE
from .ubu import UbuIE
from .udemy import (
UdemyIE,
@@ -687,7 +733,10 @@ from .wdr import (
WDRMobileIE,
WDRMausIE,
)
from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE
@@ -696,7 +745,10 @@ from .wrzuta import WrzutaIE
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
@@ -715,6 +767,7 @@ from .yandexmusic import (
YandexMusicPlaylistIE,
)
from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE
from .youjizz import YouJizzIE
from .youku import YoukuIE

@@ -5,6 +5,8 @@ from ..utils import (
parse_duration,
unified_strdate,
str_to_int,
float_or_none,
ISO639Utils,
)
@@ -69,3 +71,61 @@ class AdobeTVIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
}


class AdobeTVVideoIE(InfoExtractor):
    _VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'

    _TEST = {
        # From https://helpx.adobe.com/acrobat/how-to/new-experience-acrobat-dc.html?set=acrobat--get-started--essential-beginners
        'url': 'https://video.tv.adobe.com/v/2456/',
        'md5': '43662b577c018ad707a63766462b1e87',
        'info_dict': {
            'id': '2456',
            'ext': 'mp4',
            'title': 'New experience with Acrobat DC',
            'description': 'New experience with Acrobat DC',
            'duration': 248.667,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        player_params = self._parse_json(self._search_regex(
            r'var\s+bridge\s*=\s*([^;]+);', webpage, 'player parameters'),
            video_id)

        formats = [{
            'url': source['src'],
            'width': source.get('width'),
            'height': source.get('height'),
            'tbr': source.get('bitrate'),
        } for source in player_params['sources']]

        # For both metadata and downloaded files the duration varies among
        # formats. I just pick the max one
        duration = max(filter(None, [
            float_or_none(source.get('duration'), scale=1000)
            for source in player_params['sources']]))

        subtitles = {}
        for translation in player_params.get('translations', []):
            lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
            if lang_id not in subtitles:
                subtitles[lang_id] = []
            subtitles[lang_id].append({
                'url': translation['vttPath'],
                'ext': 'vtt',
            })

        return {
            'id': video_id,
            'formats': formats,
            'title': player_params['title'],
            'description': self._og_search_description(webpage),
            'duration': duration,
            'subtitles': subtitles,
        }
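Two details of `AdobeTVVideoIE` are easy to get wrong. Durations arrive in milliseconds and may be missing for some sources, and subtitles must be grouped into one list per language. The sketch below reproduces both steps on plain dicts.

Assumptions in the sketch:

- `float_or_none` is a simplified stand-in for the youtube-dl helper of the same name.
- `ISO639_2_TO_1` is a hypothetical three-entry substitute for `ISO639Utils.long2short`, and it assumes `language_medium` holds a three-letter code.
- `max(..., default=None)` is Python 3 only. It avoids the `ValueError` that plain `max()` raises when no source reports a duration.

```python
# Hypothetical stand-in for ISO639Utils.long2short (ISO 639-2 -> ISO 639-1).
ISO639_2_TO_1 = {'eng': 'en', 'deu': 'de', 'fra': 'fr'}


def float_or_none(v, scale=1):
    """Simplified stand-in for youtube_dl.utils.float_or_none."""
    return None if v is None else float(v) / scale


def max_duration(sources):
    # Durations are in milliseconds and differ between formats; take the longest.
    durations = [float_or_none(s.get('duration'), scale=1000) for s in sources]
    return max(filter(None, durations), default=None)


def group_subtitles(translations):
    subtitles = {}
    for translation in translations:
        lang_id = (translation.get('language_w3c') or
                   ISO639_2_TO_1.get(translation.get('language_medium')))
        subtitles.setdefault(lang_id, []).append({
            'url': translation['vttPath'],
            'ext': 'vtt',
        })
    return subtitles


print(max_duration([{'duration': 248667}, {'duration': 248500}, {}]))  # 248.667
```

`dict.setdefault` replaces the explicit `if lang_id not in subtitles` check from the extractor without changing the result.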

@@ -0,0 +1,50 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    str_to_int,
    ExtractorError
)


class AppleConnectIE(InfoExtractor):
    _VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
    _TEST = {
        'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
        'md5': '10d0f2799111df4cb1c924520ca78f98',
        'info_dict': {
            'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
            'ext': 'm4v',
            'title': 'Energy',
            'uploader': 'Drake',
            'thumbnail': 'http://is5.mzstatic.com/image/thumb/Video5/v4/78/61/c5/7861c5fa-ad6d-294b-1464-cf7605b911d6/source/1920x1080sr.jpg',
            'upload_date': '20150710',
            'timestamp': 1436545535,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        try:
            video_json = self._html_search_regex(
                r'class="auc-video-data">(\{.*?\})', webpage, 'json')
        except ExtractorError:
            raise ExtractorError('This post doesn\'t contain a video', expected=True)

        video_data = self._parse_json(video_json, video_id)
        timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
        like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))

        return {
            'id': video_id,
            'url': video_data['sslSrc'],
            'title': video_data['title'],
            'description': video_data['description'],
            'uploader': video_data['artistName'],
            'thumbnail': video_data['artworkUrl'],
            'timestamp': timestamp,
            'like_count': like_count,
        }
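`AppleConnectIE` does two things on the page: it pulls a JSON blob out with a non-greedy regex, and it reads the `data-timestamp` attribute. The test's `upload_date` of `20150710` is consistent with reading that timestamp as UTC. The sketch below replays both steps with the standard library. `extract_video_data`, `upload_date`, `NoVideoError` and the sample markup are hypothetical.

One caveat worth checking against real pages: `(\{.*?\})` stops at the first `}`. It therefore only captures the whole object if the JSON has no nested objects, and no newlines, since `.` does not match them by default.

```python
import json
import re
from datetime import datetime, timezone


class NoVideoError(Exception):
    """Hypothetical stand-in for ExtractorError(..., expected=True)."""


def extract_video_data(webpage):
    m = re.search(r'class="auc-video-data">(\{.*?\})', webpage)
    if m is None:
        raise NoVideoError("This post doesn't contain a video")
    return json.loads(m.group(1))


def upload_date(webpage):
    # data-timestamp holds Unix seconds; format the UTC date as YYYYMMDD.
    ts = int(re.search(r'data-timestamp="(\d+)"', webpage).group(1))
    return ts, datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y%m%d')


page = ('<div class="auc-video-data">{"title": "Energy", "artistName": "Drake"}</div>'
        '<span data-timestamp="1436545535"></span>')
print(extract_video_data(page)['title'])  # Energy
print(upload_date(page))                  # (1436545535, '20150710')
```

If nested objects ever appear, a balanced-brace scan or a more specific end anchor would be needed instead of the non-greedy match.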

@@ -8,6 +8,7 @@ from .generic import GenericIE
from ..utils import (
determine_ext,
ExtractorError,
get_element_by_attribute,
qualities,
int_or_none,
parse_duration,
@@ -22,19 +23,125 @@ class ARDMediathekIE(InfoExtractor):
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
'info_dict': {
'id': '29582122',
'ext': 'mp4',
'title': 'Ich liebe das Leben trotzdem',
'description': 'md5:45e4c225c72b27993314b31a84a5261c',
'duration': 4557,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
'info_dict': {
'id': '29522730',
'ext': 'mp4',
'title': 'Tatort: Scheinwelten - Hörfassung (Video tgl. ab 20 Uhr)',
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
'md5': '219d94d8980b4f538c7fcb0865eb7f2c',
'info_dict': {
'id': '28488308',
'ext': 'mp3',
'title': 'Tod eines Fußballers',
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True,
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Das-Wunder-von-Wolbeck-Video-tgl-ab-20/Das-Erste/Video?documentId=22490580&bcastId=602916',
'info_dict': {
'id': '22490580',
'ext': 'mp4',
'title': 'Das Wunder von Wolbeck (Video tgl. ab 20 Uhr)',
'description': 'Auf einem restaurierten Hof bei Wolbeck wird der Heilpraktiker Raffael Lembeck eines morgens von seiner Frau Stella tot aufgefunden. Das Opfer war offensichtlich in seiner Praxis zu Fall gekommen und ist dann verblutet, erklärt Prof. Boerne am Tatort.',
},
'skip': 'Blocked outside of Germany',
}]
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
formats = self._extract_formats(media_info, video_id)
if not formats:
if '"fsk"' in webpage:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
raise ExtractorError('This video is not available due to geo restriction', expected=True)
self._sort_formats(formats)
duration = int_or_none(media_info.get('_duration'))
thumbnail = media_info.get('_previewImage')
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'srt',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': duration,
'thumbnail': thumbnail,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
ext = determine_ext(stream_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
video_id, preference=-1, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls'))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
elif stream_url.startswith('http'):
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
else:
continue
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
def _real_extract(self, url):
# determine video id from url
m = re.match(self._VALID_URL, url)
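`_extract_formats` accepts `_stream` as either a single URL or a list, builds `a<index>-<ext>-<quality>` ids for plain HTTP URLs, and reads the frame size from a `_<width>x<height>.mp4` suffix when one is present. Here is a minimal sketch of only that HTTP branch, assuming the `_mediaArray` shape used in the diff. `http_formats` is a hypothetical name, `guess_ext` is a simplified stand-in for `determine_ext`, and HDS, HLS and RTMP streams are skipped rather than expanded.

```python
import re


def guess_ext(url):
    # Simplified stand-in for youtube_dl.utils.determine_ext: text after the
    # last dot of the URL with the query string removed, if alphanumeric
    guess = url.partition('?')[0].rpartition('.')[2]
    return guess if re.match(r'^[A-Za-z0-9]+$', guess) else None


def http_formats(media_array, media_type=None):
    """Sketch of the plain-HTTP branch of ARDMediathekIE._extract_formats."""
    formats = []
    for num, media in enumerate(media_array):
        for stream in media.get('_mediaStreamArray', []):
            stream_urls = stream.get('_stream')
            if not stream_urls:
                continue
            # '_stream' may be a single URL or a list of URLs
            if not isinstance(stream_urls, list):
                stream_urls = [stream_urls]
            quality = stream.get('_quality')
            for stream_url in stream_urls:
                ext = guess_ext(stream_url)
                # Manifests and non-HTTP paths are out of scope for this sketch
                if ext in ('f4m', 'm3u8') or not stream_url.startswith('http'):
                    continue
                f = {
                    'url': stream_url,
                    'format_id': 'a%s-%s-%s' % (num, ext, quality),
                }
                # Frame size encoded in the file name, e.g. ..._1280x720.mp4
                m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
                if m:
                    f['width'] = int(m.group('width'))
                    f['height'] = int(m.group('height'))
                if media_type == 'audio':
                    f['vcodec'] = 'none'
                formats.append(f)
    return formats
```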
@@ -92,46 +199,22 @@ class ARDMediathekIE(InfoExtractor):
'format_id': fid,
'url': furl,
})
self._sort_formats(formats)
info = {
'formats': formats,
}
else: # request JSON file
media_info = self._download_json(
'http://www.ardmediathek.de/play/media/%s' % video_id, video_id)
# The second element of the _mediaArray contains the standard http urls
streams = media_info['_mediaArray'][1]['_mediaStreamArray']
if not streams:
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
info = self._extract_media_info(
'http://www.ardmediathek.de/play/media/%s' % video_id, webpage, video_id)
formats = []
for s in streams:
if type(s['_stream']) == list:
for index, url in enumerate(s['_stream'][::-1]):
quality = s['_quality'] + index
formats.append({
'quality': quality,
'url': url,
'format_id': '%s-%s' % (determine_ext(url), quality)
})
continue
format = {
'quality': s['_quality'],
'url': s['_stream'],
}
format['format_id'] = '%s-%s' % (
determine_ext(format['url']), format['quality'])
formats.append(format)
self._sort_formats(formats)
return {
info.update({
'id': video_id,
'title': title,
'description': description,
'formats': formats,
'thumbnail': thumbnail,
}
})
return info
class ARDIE(InfoExtractor):
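Apart from formats, `_extract_media_info` decides which error to raise when no formats were found and maps a few media JSON keys to info fields. A minimal standalone sketch of that part follows, with hypothetical names; `ValueError` stands in for `ExtractorError` and `int_or_none_sketch` is a simplified stand-in for `int_or_none`. When neither error condition holds, the sketch simply returns an empty format list, whereas in the extractor `self._sort_formats` is then left to report it.

```python
def int_or_none_sketch(value):
    # Simplified stand-in for youtube_dl.utils.int_or_none
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def media_info_fields(media_info, webpage, formats):
    """Hypothetical version of the non-format part of _extract_media_info."""
    if not formats:
        # Same precedence as the diff: the '"fsk"' marker in the page
        # (presumably a youth-protection rating) is checked before the
        # '_geoblocked' flag from the media JSON
        if '"fsk"' in webpage:
            raise ValueError('This video is only available after 20:00')
        elif media_info.get('_geoblocked'):
            raise ValueError('This video is not available due to geo restriction')
    subtitles = {}
    subtitle_url = media_info.get('_subtitleUrl')
    if subtitle_url:
        # The diff maps '_subtitleUrl' to a single German SRT entry
        subtitles['de'] = [{'ext': 'srt', 'url': subtitle_url}]
    return {
        'duration': int_or_none_sketch(media_info.get('_duration')),
        'thumbnail': media_info.get('_previewImage'),
        'formats': formats,
        'subtitles': subtitles,
    }
```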
@@ -189,3 +272,41 @@ class ARDIE(InfoExtractor):
'upload_date': upload_date,
'thumbnail': thumbnail,
}
class SportschauIE(ARDMediathekIE):
IE_NAME = 'Sportschau'
_VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
_TESTS = [{
'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
'info_dict': {
'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
'ext': 'mp4',
'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
base_url = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id)
title = get_element_by_attribute('class', 'headline', webpage)
description = self._html_search_meta('description', webpage, 'description')
info = self._extract_media_info(
base_url + '-mc_defaultQuality-h.json', webpage, video_id)
info.update({
'title': title,
'description': description,
})
return info
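`SportschauIE` does not look up a media id: it takes everything before `.html` as `baseurl` and appends `-mc_defaultQuality-h.json` to get the media JSON that `_extract_media_info` then reads. A small sketch of that derivation, reusing the `_VALID_URL` pattern from the diff; `sportschau_media_json_url` is a hypothetical name.

```python
import re

# Pattern copied from SportschauIE._VALID_URL in the diff
SPORTSCHAU_URL_RE = (r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/'
                     r'(?:[^/]+/)+video(?P<id>[^/#?]+))\.html')


def sportschau_media_json_url(url):
    """Return (video_id, media JSON URL) as SportschauIE._real_extract derives them."""
    mobj = re.match(SPORTSCHAU_URL_RE, url)
    if mobj is None:
        raise ValueError('not a Sportschau video URL: %s' % url)
    # Same path as the HTML page, with the JSON suffix instead of '.html'
    return mobj.group('id'), mobj.group('baseurl') + '-mc_defaultQuality-h.json'
```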

@@ -8,6 +8,7 @@ from ..compat import compat_urlparse
class BaiduVideoIE(InfoExtractor):
IE_DESC = '百度视频'
_VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_TESTS = [{
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',

youtube_dl/extractor/bbc.py Normal file
@@ -0,0 +1,722 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_duration,
parse_iso8601,
)
from ..compat import compat_HTTPError
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:(?:programmes|iplayer(?:/[^/]+)?/(?:episode|playlist))/)|music/clips[/#])(?P<id>[\da-z]{8})'
_MEDIASELECTOR_URL = 'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s'
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
'duration': 1740,
},
'params': {
# rtmp download
'skip_download': True,
}
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b00yng5w/The_Man_in_Black_Series_3_The_Printed_Name/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Man in Black: Series 3: The Printed Name',
'description': "Mark Gatiss introduces Nicholas Pierpan's chilling tale of a writer's devilish pact with a mysterious man. Stars Ewan Bailey.",
'duration': 1800,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'duration': 5100,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/p026c7jt/tomorrows-worlds-the-unearthly-history-of-science-fiction-2-invasion',
'info_dict': {
'id': 'b03k3pb7',
'ext': 'flv',
'title': "Tomorrow's Worlds: The Unearthly History of Science Fiction",
'description': '2. Invasion',
'duration': 3600,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
}, {
'url': 'http://www.bbc.co.uk/programmes/b04v20dw',
'info_dict': {
'id': 'b04v209v',
'ext': 'flv',
'title': 'Pete Tong, The Essential New Tune Special',
'description': "Pete has a very special mix - all of 2014's Essential New Tunes!",
'duration': 10800,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
'note': 'Audio',
'info_dict': {
'id': 'p02frcch',
'ext': 'flv',
'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
'duration': 3507,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p025c0zz',
'note': 'Video',
'info_dict': {
'id': 'p025c103',
'ext': 'flv',
'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
'duration': 226,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b054fn09/ad/natural-world-20152016-2-super-powered-owls',
'info_dict': {
'id': 'p02n76xf',
'ext': 'flv',
'title': 'Natural World, 2015-2016: 2. Super Powered Owls',
'description': 'md5:e4db5c937d0e95a7c6b5e654d429183d',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b05zmgwn/royal-academy-summer-exhibition',
'info_dict': {
'id': 'b05zmgw1',
'ext': 'flv',
'description': 'Kirsty Wark and Morgan Quaintance visit the Royal Academy as it prepares for its annual artistic extravaganza, meeting people who have come together to make the show unique.',
'title': 'Royal Academy Summer Exhibition',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/music/clips#p02frcc3',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/iplayer/cbeebies/episode/b0480276/bing-14-atchoo',
'only_matching': True,
}
]
def _extract_asx_playlist(self, connection, programme_id):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{http://bbc.co.uk/2008/emp/playlist}item')
def _extract_medias(self, media_selection):
error = media_selection.find('./{http://bbc.co.uk/2008/mp/mediaselection}error')
if error is not None:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error.get('id')), expected=True)
return media_selection.findall('./{http://bbc.co.uk/2008/mp/mediaselection}media')
def _extract_connections(self, media):
return media.findall('./{http://bbc.co.uk/2008/mp/mediaselection}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
captions = self._download_xml(connection.get('href'), programme_id, 'Downloading captions')
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
subtitles[lang] = [
{
'url': connection.get('href'),
'ext': 'ttml',
},
]
return subtitles
def _download_media_selector(self, programme_id):
return self._download_media_selector_url(
self._MEDIASELECTOR_URL % programme_id, programme_id)
def _download_media_selector_url(self, url, programme_id=None):
try:
media_selection = self._download_xml(
url, programme_id, 'Downloading media selection XML')
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
media_selection = xml.etree.ElementTree.fromstring(ee.cause.read().decode('utf-8'))
else:
raise
return self._process_media_selector(media_selection, programme_id)
def _process_media_selector(self, media_selection, programme_id):
formats = []
subtitles = None
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles
def _download_playlist(self, playlist_id):
try:
playlist = self._download_json(
'http://www.bbc.co.uk/programmes/%s/playlist.json' % playlist_id,
playlist_id, 'Downloading playlist JSON')
version = playlist.get('defaultAvailableVersion')
if version:
smp_config = version['smpConfig']
title = smp_config['title']
description = smp_config['summary']
for item in smp_config['items']:
kind = item['kind']
if kind != 'programme' and kind != 'radioProgramme':
continue
programme_id = item.get('vpid')
duration = int_or_none(item.get('duration'))
formats, subtitles = self._download_media_selector(programme_id)
return programme_id, title, description, duration, formats, subtitles
except ExtractorError as ee:
if not (isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404):
raise
# fallback to legacy playlist
return self._process_legacy_playlist(playlist_id)
def _process_legacy_playlist_url(self, url, display_id):
playlist = self._download_legacy_playlist_url(url, display_id)
return self._extract_from_legacy_playlist(playlist, display_id)
def _process_legacy_playlist(self, playlist_id):
return self._process_legacy_playlist_url(
'http://www.bbc.co.uk/iplayer/playlist/%s' % playlist_id, playlist_id)
def _download_legacy_playlist_url(self, url, playlist_id=None):
return self._download_xml(
url, playlist_id, 'Downloading legacy playlist XML')
def _extract_from_legacy_playlist(self, playlist, playlist_id):
no_items = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}noItems')
if no_items is not None:
reason = no_items.get('reason')
if reason == 'preAvailability':
msg = 'Episode %s is not yet available' % playlist_id
elif reason == 'postAvailability':
msg = 'Episode %s is no longer available' % playlist_id
elif reason == 'noMedia':
msg = 'Episode %s is not currently available' % playlist_id
else:
msg = 'Episode %s is not available: %s' % (playlist_id, reason)
raise ExtractorError(msg, expected=True)
for item in self._extract_items(playlist):
kind = item.get('kind')
if kind != 'programme' and kind != 'radioProgramme':
continue
title = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}title').text
description = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}summary').text
def get_programme_id(item):
def get_from_attributes(item):
for p in ('identifier', 'group'):
value = item.get(p)
if value and re.match(r'^[pb][\da-z]{7}$', value):
return value
item_id = get_from_attributes(item)
if item_id:
return item_id
mediator = item.find('./{http://bbc.co.uk/2008/emp/playlist}mediator')
if mediator is not None:
return get_from_attributes(mediator)
programme_id = get_programme_id(item)
duration = int_or_none(item.get('duration'))
# TODO: programme_id can be None and media items can be incorporated right inside
# playlist's item (e.g. http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
# as f4m and m3u8
formats, subtitles = self._download_media_selector(programme_id)
return programme_id, title, description, duration, formats, subtitles
def _real_extract(self, url):
group_id = self._match_id(url)
webpage = self._download_webpage(url, group_id, 'Downloading video page')
programme_id = None
tviplayer = self._search_regex(
r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
webpage, 'player', default=None)
if tviplayer:
player = self._parse_json(tviplayer, group_id).get('player', {})
duration = int_or_none(player.get('duration'))
programme_id = player.get('vpid')
if not programme_id:
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
title = self._og_search_title(webpage)
description = self._search_regex(
r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
webpage, 'description', fatal=False)
else:
programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)
self._sort_formats(formats)
return {
'id': programme_id,
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
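`get_programme_id` in `_extract_from_legacy_playlist` looks for a BBC programme id (`p` or `b` followed by seven lowercase alphanumerics) in the `identifier` or `group` attribute, first on the `<item>` itself and then on its `<mediator>` child. A standalone sketch of that lookup order using `xml.etree.ElementTree` and the playlist namespace from the diff; the function names mirror the diff, but the module-level constants are illustrative.

```python
import re
import xml.etree.ElementTree as ET

NS = '{http://bbc.co.uk/2008/emp/playlist}'
PROGRAMME_ID_RE = r'^[pb][\da-z]{7}$'


def get_from_attributes(element):
    # First of 'identifier' / 'group' that looks like a programme id
    for attr in ('identifier', 'group'):
        value = element.get(attr)
        if value and re.match(PROGRAMME_ID_RE, value):
            return value
    return None


def get_programme_id(item):
    """Prefer the item's own attributes, then fall back to its <mediator>."""
    programme_id = get_from_attributes(item)
    if programme_id:
        return programme_id
    mediator = item.find(NS + 'mediator')
    if mediator is not None:
        return get_from_attributes(mediator)
    return None
```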
class BBCIE(BBCCoUkIE):
IE_NAME = 'bbc'
IE_DESC = 'BBC'
_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
# fails with notukerror for some videos
# _MEDIASELECTOR_URL = 'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s'
_MEDIASELECTOR_URL = 'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s'
_TESTS = [{
# article with multiple videos embedded with data-media-meta containing
# playlist.sxml, externalId and no direct video links
'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': {
'id': 'world-europe-32668511',
'title': 'Russia stages massive WW2 parade despite Western boycott',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
},
'playlist_count': 2,
}, {
# article with multiple videos embedded with data-media-meta (more videos)
'url': 'http://www.bbc.com/news/business-28299555',
'info_dict': {
'id': 'business-28299555',
'title': 'Farnborough Airshow: Video highlights',
'description': 'BBC reports and video highlights at the Farnborough Airshow.',
},
'playlist_count': 9,
'skip': 'Save time',
}, {
# article with multiple videos embedded with `new SMP()`
'url': 'http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460',
'info_dict': {
'id': '3662a707-0af9-3149-963f-47bea720b460',
'title': 'BBC Blogs - Adam Curtis - BUGGER',
},
'playlist_count': 18,
}, {
# single video embedded with mediaAssetPage.init()
'url': 'http://www.bbc.com/news/world-europe-32041533',
'info_dict': {
'id': 'p02mprgb',
'ext': 'flv',
'title': 'Aerial footage showed the site of the crash in the Alps - courtesy BFM TV',
'duration': 47,
'timestamp': 1427219242,
'upload_date': '20150324',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# article with single video embedded with data-media-meta containing
# direct video links (for now these are extracted) and playlist.xml (with
# media items as f4m and m3u8 - currently unsupported)
'url': 'http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu',
'info_dict': {
'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4',
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'duration': 47,
'timestamp': 1434397334,
'upload_date': '20150615',
},
'params': {
'skip_download': True,
}
}, {
# single video embedded with mediaAssetPage.init() (regional section)
'url': 'http://www.bbc.com/mundo/video_fotos/2015/06/150619_video_honduras_militares_hospitales_corrupcion_aw',
'info_dict': {
'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
'ext': 'mp4',
'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
'duration': 87,
'timestamp': 1434713142,
'upload_date': '20150619',
},
'params': {
'skip_download': True,
}
}, {
# single video story with digitalData
'url': 'http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret',
'info_dict': {
'id': 'p02q6gc4',
'ext': 'flv',
'title': 'Sri Lankas spicy secret',
'description': 'As a new train line to Jaffna opens up the countrys north, travellers can experience a truly distinct slice of Tamil culture.',
'timestamp': 1437674293,
'upload_date': '20150723',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# single video story without digitalData
'url': 'http://www.bbc.com/autos/story/20130513-hyundais-rock-star',
'info_dict': {
'id': 'p018zqqg',
'ext': 'flv',
'title': 'Hyundai Santa Fe Sport: Rock star',
'description': 'md5:b042a26142c4154a6e472933cf20793d',
'timestamp': 1368473503,
'upload_date': '20130513',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# single video with playlist.sxml URL
'url': 'http://www.bbc.com/sport/0/football/33653409',
'info_dict': {
'id': 'p02xycnp',
'ext': 'flv',
'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
'description': 'md5:398fca0e2e701c609d726e034fa1fc89',
'duration': 140,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# single video with playlist URL from weather section
'url': 'http://www.bbc.com/weather/features/33601775',
'only_matching': True,
}, {
# custom redirection to www.bbc.com
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if BBCCoUkIE.suitable(url) else super(BBCIE, cls).suitable(url)
def _extract_from_media_meta(self, media_meta, video_id):
# Direct links to media in media metadata (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
# TODO: there are also f4m and m3u8 streams incorporated in playlist.sxml
source_files = media_meta.get('sourceFiles')
if source_files:
return [{
'url': f['url'],
'format_id': format_id,
'ext': f.get('encoding'),
'tbr': float_or_none(f.get('bitrate'), 1000),
'filesize': int_or_none(f.get('filesize')),
} for format_id, f in source_files.items() if f.get('url')], []
programme_id = media_meta.get('externalId')
if programme_id:
return self._download_media_selector(programme_id)
# Process playlist.sxml as legacy playlist
href = media_meta.get('href')
if href:
playlist = self._download_legacy_playlist_url(href)
_, _, _, _, formats, subtitles = self._extract_from_legacy_playlist(playlist, video_id)
return formats, subtitles
return [], []
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
timestamp = parse_iso8601(self._search_regex(
[r'"datePublished":\s*"([^"]+)',
r'<meta[^>]+property="article:published_time"[^>]+content="([^"]+)"',
r'itemprop="datePublished"[^>]+datetime="([^"]+)"'],
webpage, 'date', default=None))
# single video with playlist.sxml URL (e.g. http://www.bbc.com/sport/0/football/33653409)
playlist = self._search_regex(
r'<param[^>]+name="playlist"[^>]+value="([^"]+)"',
webpage, 'playlist', default=None)
if playlist:
programme_id, title, description, duration, formats, subtitles = \
self._process_legacy_playlist_url(playlist, playlist_id)
self._sort_formats(formats)
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
}
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-video-player-vpid="([\da-z]{8})"',
r'<param[^>]+name="externalIdentifier"[^>]+value="([\da-z]{8})"'],
webpage, 'vpid', default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
# digitalData may be missing (e.g. http://www.bbc.com/autos/story/20130513-hyundais-rock-star)
digital_data = self._parse_json(
self._search_regex(
r'var\s+digitalData\s*=\s*({.+?});?\n', webpage, 'digital data', default='{}'),
programme_id, fatal=False)
page_info = digital_data.get('page', {}).get('pageInfo', {})
title = page_info.get('pageName') or self._og_search_title(webpage)
description = page_info.get('description') or self._og_search_description(webpage)
timestamp = parse_iso8601(page_info.get('publicationDate')) or timestamp
return {
'id': programme_id,
'title': title,
'description': description,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
}
playlist_title = self._html_search_regex(
r'<title>(.*?)(?:\s*-\s*BBC [^ ]+)?</title>', webpage, 'playlist title')
playlist_description = self._og_search_description(webpage, default=None)
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
re.findall(pattern, webpage))))
# Multiple video article (e.g.
# http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460)
EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+[\da-z]{8}(?:\b[^"]+)?'
entries = []
for match in extract_all(r'new\s+SMP\(({.+?})\)'):
embed_url = match.get('playerSettings', {}).get('externalEmbedUrl')
if embed_url and re.match(EMBED_URL, embed_url):
entries.append(embed_url)
entries.extend(re.findall(
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries:
return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries],
playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
medias = extract_all(r"data-media-meta='({[^']+})'")
if not medias:
# Single video article (e.g. http://www.bbc.com/news/video_and_audio/international)
media_asset_page = self._parse_json(
self._search_regex(
r'mediaAssetPage\.init\(\s*({.+?}), "/', webpage, 'media asset'),
playlist_id)
medias = []
for video in media_asset_page.get('videos', {}).values():
medias.extend(video.values())
entries = []
for num, media_meta in enumerate(medias, start=1):
formats, subtitles = self._extract_from_media_meta(media_meta, playlist_id)
if not formats:
continue
self._sort_formats(formats)
video_id = media_meta.get('externalId')
if not video_id:
video_id = playlist_id if len(medias) == 1 else '%s-%s' % (playlist_id, num)
title = media_meta.get('caption')
if not title:
title = playlist_title if len(medias) == 1 else '%s - Video %s' % (playlist_title, num)
duration = int_or_none(media_meta.get('durationInSeconds')) or parse_duration(media_meta.get('duration'))
images = []
for image in media_meta.get('images', {}).values():
images.extend(image.values())
if 'image' in media_meta:
images.append(media_meta['image'])
thumbnails = [{
'url': image.get('href'),
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
} for image in images]
entries.append({
'id': video_id,
'title': title,
'thumbnails': thumbnails,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
})
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
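For multi-video articles, `BBCIE._real_extract` parses every `data-media-meta` match as JSON (dropping failures and empty objects via `filter(None, ...)`), then falls back to `<playlist_id>-<n>` ids and `<playlist title> - Video <n>` titles when a media item has no `externalId` or `caption`. A minimal sketch of both steps with hypothetical function names; unlike the extractor it downloads no formats, so it never skips an item.

```python
import json
import re


def extract_all_json(pattern, webpage):
    """Parse every regex match as JSON; keep only non-empty, valid objects."""
    results = []
    for s in re.findall(pattern, webpage):
        try:
            parsed = json.loads(s)
        except ValueError:
            continue  # like _parse_json(..., fatal=False) returning None
        if parsed:  # filter(None, ...) also drops empty objects
            results.append(parsed)
    return results


def entry_ids(medias, playlist_id, playlist_title):
    """Return (video_id, title) pairs using the diff's fallback rules."""
    single = len(medias) == 1
    entries = []
    for num, media_meta in enumerate(medias, start=1):
        video_id = media_meta.get('externalId') or (
            playlist_id if single else '%s-%s' % (playlist_id, num))
        title = media_meta.get('caption') or (
            playlist_title if single else '%s - Video %s' % (playlist_title, num))
        entries.append((video_id, title))
    return entries
```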

@@ -1,380 +0,0 @@
from __future__ import unicode_literals
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
from ..compat import compat_HTTPError
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:(?:programmes|iplayer(?:/[^/]+)?/(?:episode|playlist))/)|music/clips[/#])(?P<id>[\da-z]{8})'
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
'duration': 1740,
},
'params': {
# rtmp download
'skip_download': True,
}
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b00yng5w/The_Man_in_Black_Series_3_The_Printed_Name/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Man in Black: Series 3: The Printed Name',
'description': "Mark Gatiss introduces Nicholas Pierpan's chilling tale of a writer's devilish pact with a mysterious man. Stars Ewan Bailey.",
'duration': 1800,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'duration': 5100,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/p026c7jt/tomorrows-worlds-the-unearthly-history-of-science-fiction-2-invasion',
'info_dict': {
'id': 'b03k3pb7',
'ext': 'flv',
'title': "Tomorrow's Worlds: The Unearthly History of Science Fiction",
'description': '2. Invasion',
'duration': 3600,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
}, {
'url': 'http://www.bbc.co.uk/programmes/b04v20dw',
'info_dict': {
'id': 'b04v209v',
'ext': 'flv',
'title': 'Pete Tong, The Essential New Tune Special',
'description': "Pete has a very special mix - all of 2014's Essential New Tunes!",
'duration': 10800,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
'note': 'Audio',
'info_dict': {
'id': 'p02frcch',
'ext': 'flv',
'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
'duration': 3507,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p025c0zz',
'note': 'Video',
'info_dict': {
'id': 'p025c103',
'ext': 'flv',
'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
'duration': 226,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b054fn09/ad/natural-world-20152016-2-super-powered-owls',
'info_dict': {
'id': 'p02n76xf',
'ext': 'flv',
'title': 'Natural World, 2015-2016: 2. Super Powered Owls',
'description': 'md5:e4db5c937d0e95a7c6b5e654d429183d',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/music/clips#p02frcc3',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/iplayer/cbeebies/episode/b0480276/bing-14-atchoo',
'only_matching': True,
}
]
def _extract_asx_playlist(self, connection, programme_id):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{http://bbc.co.uk/2008/emp/playlist}item')
def _extract_medias(self, media_selection):
error = media_selection.find('./{http://bbc.co.uk/2008/mp/mediaselection}error')
if error is not None:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error.get('id')), expected=True)
return media_selection.findall('./{http://bbc.co.uk/2008/mp/mediaselection}media')
def _extract_connections(self, media):
return media.findall('./{http://bbc.co.uk/2008/mp/mediaselection}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int(media.get('width'))
height = int(media.get('height'))
file_size = int(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
captions = self._download_xml(connection.get('href'), programme_id, 'Downloading captions')
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
ps = captions.findall('./{0}body/{0}div/{0}p'.format('{http://www.w3.org/2006/10/ttaf1}'))
srt = ''
def _extract_text(p):
if p.text is not None:
stripped_text = p.text.strip()
if stripped_text:
return stripped_text
return ' '.join(span.text.strip() for span in p.findall('{http://www.w3.org/2006/10/ttaf1}span'))
for pos, p in enumerate(ps):
srt += '%s\r\n%s --> %s\r\n%s\r\n\r\n' % (str(pos), p.get('begin'), p.get('end'), _extract_text(p))
subtitles[lang] = [
{
'url': connection.get('href'),
'ext': 'ttml',
},
{
'data': srt,
'ext': 'srt',
},
]
return subtitles
def _download_media_selector(self, programme_id):
try:
media_selection = self._download_xml(
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s' % programme_id,
programme_id, 'Downloading media selection XML')
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
media_selection = xml.etree.ElementTree.fromstring(ee.cause.read().encode('utf-8'))
else:
raise
formats = []
subtitles = None
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles
def _download_playlist(self, playlist_id):
try:
playlist = self._download_json(
'http://www.bbc.co.uk/programmes/%s/playlist.json' % playlist_id,
playlist_id, 'Downloading playlist JSON')
version = playlist.get('defaultAvailableVersion')
if version:
smp_config = version['smpConfig']
title = smp_config['title']
description = smp_config['summary']
for item in smp_config['items']:
kind = item['kind']
if kind != 'programme' and kind != 'radioProgramme':
continue
programme_id = item.get('vpid')
duration = int(item.get('duration'))
formats, subtitles = self._download_media_selector(programme_id)
return programme_id, title, description, duration, formats, subtitles
except ExtractorError as ee:
if not (isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404):
raise
# fallback to legacy playlist
playlist = self._download_xml(
'http://www.bbc.co.uk/iplayer/playlist/%s' % playlist_id,
playlist_id, 'Downloading legacy playlist XML')
no_items = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}noItems')
if no_items is not None:
reason = no_items.get('reason')
if reason == 'preAvailability':
msg = 'Episode %s is not yet available' % playlist_id
elif reason == 'postAvailability':
msg = 'Episode %s is no longer available' % playlist_id
elif reason == 'noMedia':
msg = 'Episode %s is not currently available' % playlist_id
else:
msg = 'Episode %s is not available: %s' % (playlist_id, reason)
raise ExtractorError(msg, expected=True)
for item in self._extract_items(playlist):
kind = item.get('kind')
if kind != 'programme' and kind != 'radioProgramme':
continue
title = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}title').text
description = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}summary').text
programme_id = item.get('identifier')
duration = int(item.get('duration'))
formats, subtitles = self._download_media_selector(programme_id)
return programme_id, title, description, duration, formats, subtitles
def _real_extract(self, url):
group_id = self._match_id(url)
webpage = self._download_webpage(url, group_id, 'Downloading video page')
programme_id = None
tviplayer = self._search_regex(
r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
webpage, 'player', default=None)
if tviplayer:
player = self._parse_json(tviplayer, group_id).get('player', {})
duration = int_or_none(player.get('duration'))
programme_id = player.get('vpid')
if not programme_id:
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
title = self._og_search_title(webpage)
description = self._search_regex(
r'<p class="medium-description">([^<]+)</p>',
webpage, 'description', fatal=False)
else:
programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)
self._sort_formats(formats)
return {
'id': programme_id,
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
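The `_get_subtitles` method above converts BBC TTML captions to SRT by hand. Below is a minimal standalone sketch of the same conversion. It assumes clock-style `begin`/`end` values such as `00:00:01.500`. Unlike the hunk, it numbers cues from 1 and replaces the millisecond dot with a comma, which is what SRT conventionally expects. The name `ttml_to_srt` is hypothetical.

```python
import xml.etree.ElementTree as ET

# TTML namespace used by the BBC caption files in the hunk above.
TTAF = '{http://www.w3.org/2006/10/ttaf1}'


def ttml_to_srt(ttml_text):
    """Convert a TTML (ttaf1) caption document to SRT text.

    Assumes begin/end attributes are clock times like '00:00:01.500'.
    """
    root = ET.fromstring(ttml_text)
    cues = []
    paragraphs = root.findall('./{0}body/{0}div/{0}p'.format(TTAF))
    for index, p in enumerate(paragraphs, start=1):
        # Join the paragraph's own text with the text of any nested <span>s.
        text = ' '.join(t.strip() for t in p.itertext() if t.strip())
        # SRT separates seconds from milliseconds with a comma.
        begin = p.get('begin').replace('.', ',')
        end = p.get('end').replace('.', ',')
        cues.append('%d\r\n%s --> %s\r\n%s\r\n' % (index, begin, end, text))
    # A blank line separates consecutive cues.
    return '\r\n'.join(cues)
```

Using `itertext()` covers both branches of the hunk's `_extract_text` helper: direct paragraph text and text held only in `<span>` children.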

@@ -1,7 +1,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_unquote
from ..utils import (
xpath_text,
xpath_with_ns,
@@ -57,7 +57,7 @@ class BetIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
media_url = compat_urllib_parse.unquote(self._search_regex(
media_url = compat_urllib_parse_unquote(self._search_regex(
[r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
webpage, 'media URL'))
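The Bet change tries two regexes in order and percent-decodes the first match. Here is a minimal sketch of that flow. It assumes a plain standard-library `unquote` is an acceptable stand-in for `compat_urllib_parse_unquote`; youtube-dl's own compat helper may handle more Python 2 edge cases than this alias. `find_media_url` is a hypothetical name.

```python
import re

try:  # Python 3
    from urllib.parse import unquote
except ImportError:  # Python 2
    from urllib import unquote

# The two patterns from the BetIE hunk, tried in order.
MEDIA_URL_PATTERNS = [
    r'mediaURL\s*:\s*"([^"]+)"',
    r"var\s+mrssMediaUrl\s*=\s*'([^']+)'",
]


def find_media_url(webpage):
    """Return the first percent-decoded media URL found, or None."""
    for pattern in MEDIA_URL_PATTERNS:
        mobj = re.search(pattern, webpage)
        if mobj:
            return unquote(mobj.group(1))
    return None
```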

@@ -41,8 +41,15 @@ class BiliBiliIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if self._search_regex(r'(此视频不存在或被删除)', webpage, 'error message', default=None):
raise ExtractorError('The video does not exist or was deleted', expected=True)
if '(此视频不存在或被删除)' in webpage:
raise ExtractorError(
'The video does not exist or was deleted', expected=True)
if '>你没有权限浏览! 由于版权相关问题 我们不对您所在的地区提供服务<' in webpage:
raise ExtractorError(
'The video is not available in your region due to copyright reasons',
expected=True)
video_code = self._search_regex(
r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_request,
compat_urlparse,
)
@@ -14,6 +13,8 @@ from ..utils import (
int_or_none,
parse_iso8601,
unescapeHTML,
xpath_text,
xpath_with_ns,
)
@@ -23,10 +24,10 @@ class BlipTVIE(InfoExtractor):
_TESTS = [
{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'md5': '80baf1ec5c3d2019037c1c707d676b9f',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'ext': 'm4v',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'timestamp': 1323138843,
@@ -100,6 +101,20 @@ class BlipTVIE(InfoExtractor):
'vcodec': 'none',
}
},
{
# missing duration
'url': 'http://blip.tv/rss/flash/6700880',
'info_dict': {
'id': '6684191',
'ext': 'm4v',
'title': 'Cowboy Bebop: Gateway Shuffle Review',
'description': 'md5:3acc480c0f9ae157f5fe88547ecaf3f8',
'timestamp': 1386639757,
'upload_date': '20131210',
'uploader': 'sfdebris',
'uploader_id': '706520',
}
}
]
@staticmethod
@@ -128,35 +143,34 @@ class BlipTVIE(InfoExtractor):
rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
def blip(s):
return '{http://blip.tv/dtd/blip/1.0}%s' % s
def media(s):
return '{http://search.yahoo.com/mrss/}%s' % s
def itunes(s):
return '{http://www.itunes.com/dtds/podcast-1.0.dtd}%s' % s
def _x(p):
return xpath_with_ns(p, {
'blip': 'http://blip.tv/dtd/blip/1.0',
'media': 'http://search.yahoo.com/mrss/',
'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
})
item = rss.find('channel/item')
video_id = item.find(blip('item_id')).text
title = item.find('./title').text
description = clean_html(compat_str(item.find(blip('puredescription')).text))
timestamp = parse_iso8601(item.find(blip('datestamp')).text)
uploader = item.find(blip('user')).text
uploader_id = item.find(blip('userid')).text
duration = int(item.find(blip('runtime')).text)
media_thumbnail = item.find(media('thumbnail'))
thumbnail = media_thumbnail.get('url') if media_thumbnail is not None else item.find(itunes('image')).text
categories = [category.text for category in item.findall('category')]
video_id = xpath_text(item, _x('blip:item_id'), 'video id') or lookup_id
title = xpath_text(item, 'title', 'title', fatal=True)
description = clean_html(xpath_text(item, _x('blip:puredescription'), 'description'))
timestamp = parse_iso8601(xpath_text(item, _x('blip:datestamp'), 'timestamp'))
uploader = xpath_text(item, _x('blip:user'), 'uploader')
uploader_id = xpath_text(item, _x('blip:userid'), 'uploader id')
duration = int_or_none(xpath_text(item, _x('blip:runtime'), 'duration'))
media_thumbnail = item.find(_x('media:thumbnail'))
thumbnail = (media_thumbnail.get('url') if media_thumbnail is not None
else xpath_text(item, 'image', 'thumbnail'))
categories = [category.text for category in item.findall('category') if category is not None]
formats = []
subtitles_urls = {}
media_group = item.find(media('group'))
for media_content in media_group.findall(media('content')):
media_group = item.find(_x('media:group'))
for media_content in media_group.findall(_x('media:content')):
url = media_content.get('url')
role = media_content.get(blip('role'))
role = media_content.get(_x('blip:role'))
msg = self._download_webpage(
url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
video_id, 'Resolving URL for %s' % role)
@@ -175,8 +189,8 @@ class BlipTVIE(InfoExtractor):
'url': real_url,
'format_id': role,
'format_note': media_type,
'vcodec': media_content.get(blip('vcodec')) or 'none',
'acodec': media_content.get(blip('acodec')),
'vcodec': media_content.get(_x('blip:vcodec')) or 'none',
'acodec': media_content.get(_x('blip:acodec')),
'filesize': media_content.get('filesize'),
'width': int_or_none(media_content.get('width')),
'height': int_or_none(media_content.get('height')),
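The BlipTV rewrite replaces the per-namespace `blip()`/`media()`/`itunes()` helpers with `prefix:tag` paths expanded by `xpath_with_ns`. It also reads optional fields through `xpath_text`, so a missing `blip:runtime` no longer crashes extraction. The real `..utils` helpers are not shown in this diff. `with_ns` and `find_text` below are hypothetical stand-ins that sketch the same idea with plain ElementTree.

```python
import xml.etree.ElementTree as ET

# Prefix map mirroring the one passed to xpath_with_ns in the hunk.
NS_MAP = {
    'blip': 'http://blip.tv/dtd/blip/1.0',
    'media': 'http://search.yahoo.com/mrss/',
}


def with_ns(path, ns_map):
    """Expand 'prefix:tag' path components into ElementTree '{uri}tag' form."""
    components = []
    for c in path.split('/'):
        if ':' in c:
            prefix, tag = c.split(':', 1)
            components.append('{%s}%s' % (ns_map[prefix], tag))
        else:
            components.append(c)
    return '/'.join(components)


def find_text(node, path, default=None):
    """Return the text of the first element matching path, or default if absent."""
    el = node.find(path)
    if el is None or el.text is None:
        return default
    return el.text
```

With these helpers, an optional integer such as the runtime can be read as `runtime = find_text(item, with_ns('blip:runtime', NS_MAP))`. It is then converted only when present, which is what `int_or_none` does in the hunk.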

@@ -13,6 +13,7 @@ from ..compat import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
from ..utils import (
determine_ext,
@@ -119,7 +120,7 @@ class BrightcoveIE(InfoExtractor):
try:
object_doc = xml.etree.ElementTree.fromstring(object_str.encode('utf-8'))
except xml.etree.ElementTree.ParseError:
except compat_xml_parse_error:
return
fv_el = find_xpath_attr(object_doc, './param', 'name', 'flashVars')
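`compat_xml_parse_error` is imported from `..compat`, but its definition is not part of this diff. A plausible shape is sketched below. It assumes the alias only has to cover Python 2.6, where `xml.etree.ElementTree` has no `ParseError` and expat's `ExpatError` surfaces instead. `parse_object_tag` is a hypothetical helper mirroring the Brightcove `try`/`except`.

```python
import xml.etree.ElementTree as ET

try:
    # Python 2.7+ and 3.x expose ParseError on ElementTree.
    compat_xml_parse_error = ET.ParseError
except AttributeError:
    # Python 2.6: fall back to the underlying expat error.
    from xml.parsers.expat import ExpatError as compat_xml_parse_error


def parse_object_tag(object_str):
    """Parse an <object> snippet; return None if the markup is malformed."""
    try:
        return ET.fromstring(object_str.encode('utf-8'))
    except compat_xml_parse_error:
        return None
```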

@@ -106,15 +106,11 @@ class CanalplusIE(InfoExtractor):
continue
format_id = fmt.tag
if format_id == 'HLS':
hls_formats = self._extract_m3u8_formats(format_url, video_id, 'flv')
for fmt in hls_formats:
fmt['preference'] = preference(format_id)
formats.extend(hls_formats)
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', preference=preference(format_id)))
elif format_id == 'HDS':
hds_formats = self._extract_f4m_formats(format_url + '?hdcore=2.11.3', video_id)
for fmt in hds_formats:
fmt['preference'] = preference(format_id)
formats.extend(hds_formats)
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=2.11.3', video_id, preference=preference(format_id)))
else:
formats.append({
'url': format_url,
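The Canalplus hunk now passes `preference=preference(format_id)` directly into `_extract_m3u8_formats` and `_extract_f4m_formats`, instead of patching each returned format afterwards. The `preference` callable is defined outside the lines shown. A minimal sketch of a position-based ranking function of that kind follows; the factory name `make_preference` and the format ids are hypothetical.

```python
def make_preference(ordered_ids):
    """Return a function ranking a format id by its position in ordered_ids.

    Later entries rank higher; unknown ids get -1 so they sort last.
    """
    def preference(format_id):
        try:
            return ordered_ids.index(format_id)
        except ValueError:
            return -1
    return preference


# Hypothetical format ids, lowest to highest preference.
preference = make_preference(['LOW', 'HIGH', 'HLS', 'HDS'])
```

Passing the value through the helper's keyword keeps the ranking in one place. Every format the helper returns is tagged consistently, so no extra loop is needed after the call.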

@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
from ..utils import (
@@ -88,7 +89,7 @@ class CeskaTelevizeIE(InfoExtractor):
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = compat_urllib_request.Request(compat_urllib_parse.unquote(playlist_url))
req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist = self._download_json(req, video_id)

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
find_xpath_attr,
@@ -10,9 +8,9 @@ from ..utils import (
class ClipsyndicateIE(InfoExtractor):
_VALID_URL = r'http://www\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',
'md5': '4d7d549451bad625e0ff3d7bd56d776c',
'info_dict': {
@@ -22,11 +20,13 @@ class ClipsyndicateIE(InfoExtractor):
'duration': 612,
'thumbnail': 're:^https?://.+\.jpg',
},
}
}, {
'url': 'http://chic.clipsyndicate.com/video/play/5844117/shark_attack',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
js_player = self._download_webpage(
'http://eplayer.clipsyndicate.com/embed/player.js?va_id=%s' % video_id,
video_id, 'Downloading player')
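The Clipsyndicate change swaps a manual `re.match(...).group('id')` for `self._match_id(url)` and widens `_VALID_URL` to accept the `chic.` subdomain. The body of `InfoExtractor._match_id` is not shown here. The function below is a hypothetical standalone equivalent using the new pattern. Raising `ValueError` on a mismatch is an assumption of this sketch.

```python
import re

# Pattern from the new ClipsyndicateIE._VALID_URL above.
VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'


def match_id(url, pattern=VALID_URL):
    """Return the 'id' group of pattern matched at the start of url."""
    mobj = re.match(pattern, url)
    if mobj is None:
        raise ValueError('Unsupported URL: %s' % url)
    return mobj.group('id')
```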

@@ -36,7 +36,7 @@ class ComCarCoffIE(InfoExtractor):
webpage, 'full data json'))
video_id = full_data['activeVideo']['video']
video_data = full_data['videos'][video_id]
video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
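The ComCarCoff fix looks the active video up under `videos` first and falls back to `singleshots`. It is a minimal sketch of that lookup, pulled into a hypothetical helper so the behavior is easy to see.

```python
def find_video_data(full_data, video_id):
    """Look up video_id under 'videos', falling back to 'singleshots'.

    Raises KeyError if the id is in neither collection.
    """
    return full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
```

Because the fallback uses `or`, an entry under `videos` that is present but falsy (for example an empty dict) also falls through to `singleshots`. That is harmless here, but it differs from an explicit `is None` check.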

@@ -22,18 +22,20 @@ from ..compat import (
compat_str,
)
from ..utils import (
NO_DEFAULT,
age_restricted,
bug_reports_message,
clean_html,
compiled_regex_type,
determine_ext,
ExtractorError,
fix_xml_ampersands,
float_or_none,
int_or_none,
RegexNotFoundError,
sanitize_filename,
unescapeHTML,
)
_NO_DEFAULT = object()
class InfoExtractor(object):
@@ -63,7 +65,7 @@ class InfoExtractor(object):
Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* ext Will be calculated from URL if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from the format_id, width, height.
@@ -153,7 +155,7 @@ class InfoExtractor(object):
lower to higher preference, each element is a dictionary
with the "ext" entry and one of:
* "data": The subtitles file contents
* "url": A url pointing to the subtitles file
* "url": A URL pointing to the subtitles file
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
duration: Length of the video in seconds, as an integer.
@@ -174,13 +176,17 @@ class InfoExtractor(object):
Set to "root" to indicate that this is a
comment to the original video.
age_limit: Age restriction for the video, as an integer (years)
webpage_url: The url to the video webpage, if given to youtube-dl it
webpage_url: The URL to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
categories: A list of categories that the video falls in, for example
["Sports", "Berlin"]
is_live: True, False, or None (=unknown). Whether this video is a
live stream that goes on instead of a fixed-length video.
start_time: Time in seconds where the reproduction should start, as
specified in the URL.
end_time: Time in seconds where the reproduction should end, as
specified in the URL.
Unless mentioned otherwise, the fields should be Unicode strings.
@@ -499,7 +505,7 @@ class InfoExtractor(object):
# Methods for following #608
@staticmethod
def url_result(url, ie=None, video_id=None, video_title=None):
"""Returns a url that points to a page that should be processed"""
"""Returns a URL that points to a page that should be processed"""
# TODO: ie should be the class used for getting the info
video_info = {'_type': 'url',
'url': url,
@@ -523,7 +529,7 @@ class InfoExtractor(object):
video_info['description'] = playlist_description
return video_info
def _search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0, group=None):
def _search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
@@ -549,7 +555,7 @@ class InfoExtractor(object):
return next(g for g in mobj.groups() if g is not None)
else:
return mobj.group(group)
elif default is not _NO_DEFAULT:
elif default is not NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError('Unable to extract %s' % _name)
@@ -557,7 +563,7 @@ class InfoExtractor(object):
self._downloader.report_warning('unable to extract %s' % _name + bug_reports_message())
return None
def _html_search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0, group=None):
def _html_search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
Like _search_regex, but strips HTML tags and unescapes entities.
"""
@@ -633,7 +639,7 @@ class InfoExtractor(object):
return unescapeHTML(escaped)
def _og_search_thumbnail(self, html, **kargs):
return self._og_search_property('image', html, 'thumbnail url', fatal=False, **kargs)
return self._og_search_property('image', html, 'thumbnail URL', fatal=False, **kargs)
def _og_search_description(self, html, **kargs):
return self._og_search_property('description', html, fatal=False, **kargs)
@@ -705,6 +711,25 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html,
'twitter card player')
@staticmethod
def _hidden_inputs(html):
return dict([
(input.group('name'), input.group('value')) for input in re.finditer(
r'''(?x)
<input\s+
type=(?P<q_hidden>["\'])hidden(?P=q_hidden)\s+
name=(?P<q_name>["\'])(?P<name>.+?)(?P=q_name)\s+
(?:id=(?P<q_id>["\']).+?(?P=q_id)\s+)?
value=(?P<q_value>["\'])(?P<value>.*?)(?P=q_value)
''', html)
])
def _form_hidden_inputs(self, form_id, html):
form = self._search_regex(
r'(?s)<form[^>]+?id=(["\'])%s\1[^>]*>(?P<form>.+?)</form>' % form_id,
html, '%s form' % form_id, group='form')
return self._hidden_inputs(form)
def _sort_formats(self, formats, field_preference=None):
if not formats:
raise ExtractorError('No video formats found')
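The new `_hidden_inputs` regex only matches `<input>` tags whose attributes appear in the order `type`, `name`, optional `id`, `value`. This sketch swaps in a different technique, the standard-library `HTMLParser`, which accepts attributes in any order and any quoting style. It is not youtube-dl's implementation. `hidden_inputs` and `_HiddenInputCollector` are hypothetical names.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class _HiddenInputCollector(HTMLParser):
    """Collect name -> value pairs from <input type="hidden"> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input .../> tags are routed here as well.
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.inputs[attrs['name']] = attrs.get('value') or ''


def hidden_inputs(html):
    """Return a dict of hidden input names to values found in html."""
    parser = _HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.inputs
```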
@@ -815,10 +840,14 @@ class InfoExtractor(object):
self.to_screen(msg)
time.sleep(timeout)
def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None):
def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip()):
manifest = self._download_xml(
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest')
'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source)
formats = []
manifest_version = '1.0'
@@ -828,8 +857,19 @@ class InfoExtractor(object):
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0':
manifest_url = ('/'.join(manifest_url.split('/')[:-1]) + '/' +
(media_el.attrib.get('href') or media_el.attrib.get('url')))
media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
if not media_url:
continue
manifest_url = (
media_url if media_url.startswith('http://') or media_url.startswith('https://')
else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url))
# If media_url is itself a f4m manifest do the recursive extraction
# since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested
# bitrate in f4m downloader
if determine_ext(manifest_url) == 'f4m':
formats.extend(self._extract_f4m_formats(manifest_url, video_id, preference, f4m_id))
continue
tbr = int_or_none(media_el.attrib.get('bitrate'))
formats.append({
'format_id': '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)])),
@@ -846,7 +886,8 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None):
m3u8_id=None, note=None, errnote=None,
fatal=True):
formats = [{
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
@@ -866,7 +907,10 @@ class InfoExtractor(object):
m3u8_doc = self._download_webpage(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information')
errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
if m3u8_doc is False:
return m3u8_doc
last_info = None
last_media = None
kv_rex = re.compile(
@@ -956,7 +1000,7 @@ class InfoExtractor(object):
def _parse_smil_video(self, video, video_id, base, rtmp_count):
src = video.get('src')
if not src:
return ([], rtmp_count)
return [], rtmp_count
bitrate = int_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
width = int_or_none(video.get('width'))
height = int_or_none(video.get('height'))
@@ -969,7 +1013,7 @@ class InfoExtractor(object):
proto = 'http'
ext = video.get('ext')
if proto == 'm3u8':
return (self._extract_m3u8_formats(src, video_id, ext), rtmp_count)
return self._extract_m3u8_formats(src, video_id, ext), rtmp_count
elif proto == 'rtmp':
rtmp_count += 1
streamer = video.get('streamer') or base
@@ -1076,7 +1120,7 @@ class InfoExtractor(object):
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search queries extractors.
They accept urls in the format _SEARCH_KEY(|all|[0-9]):{query}
They accept URLs in the format _SEARCH_KEY(|all|[0-9]):{query}
Instances should define _SEARCH_KEY and _MAX_RESULTS.
"""

@@ -12,6 +12,7 @@ from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request,
)
from ..utils import (
@@ -27,7 +28,7 @@ from ..aes import (
class CrunchyrollIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_NETRC_MACHINE = 'crunchyroll'
_TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
@@ -45,6 +46,22 @@ class CrunchyrollIE(InfoExtractor):
# rtmp
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
'id': '589804',
'ext': 'flv',
'title': 'Culture Japan Episode 1 Rebuilding Japan after the 3.11',
'description': 'md5:fe2743efedb49d279552926d0bd0cd9e',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Danny Choo Network',
'upload_date': '20120213',
},
'params': {
# rtmp
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
'only_matching': True,
@@ -238,7 +255,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, 'video_uploader', fatal=False, flags=re.DOTALL)
playerdata_url = compat_urllib_parse.unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = compat_urllib_request.Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
@@ -251,16 +268,17 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
streamdata_req = compat_urllib_request.Request('http://www.crunchyroll.com/xml/')
# urlencode doesn't work!
streamdata_req.data = 'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality=' + stream_quality + '&media%5Fid=' + stream_id + '&video%5Fformat=' + stream_format
streamdata_req = compat_urllib_request.Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata_req.add_header('Content-Length', str(len(streamdata_req.data)))
streamdata = self._download_xml(
streamdata_req, video_id,
note='Downloading media info for %s' % video_format)
video_url = streamdata.find('./host').text
video_play_path = streamdata.find('./file').text
stream_info = streamdata.find('./{default}preload/stream_info')
video_url = stream_info.find('./host').text
video_play_path = stream_info.find('./file').text
formats.append({
'url': video_url,
'play_path': video_play_path,
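The Crunchyroll `_VALID_URL` change puts a `media(?:-|/\?id=)` alternative first, so that `media-<id>/...` URLs yield the numeric id. The old pattern also matches such a URL, but its generic `[^/]*/[^/?&]*?` branch captures the trailing `1` of `culture-japan-1` instead. This short check runs both patterns from the hunk against its test URLs. `video_id` is a hypothetical helper.

```python
import re

# Old and new CrunchyrollIE._VALID_URL patterns from the hunk above.
OLD_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)'
NEW_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'


def video_id(pattern, url):
    """Return the video_id group, or None if the pattern does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('video_id') if mobj else None
```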

@@ -6,6 +6,7 @@ from ..utils import parse_iso8601, ExtractorError
class CtsNewsIE(InfoExtractor):
IE_DESC = '華視新聞'
# https connection failed (Connection reset)
_VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_TESTS = [{

@@ -13,8 +13,10 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
orderedSet,
parse_iso8601,
str_to_int,
unescapeHTML,
)
@@ -28,10 +30,16 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage_handle(request, *args, **kwargs)
def _download_webpage_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage(request, *args, **kwargs)
class DailymotionIE(DailymotionBaseInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
IE_NAME = 'dailymotion'
@@ -50,9 +58,17 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'info_dict': {
'id': 'x2iuewm',
'ext': 'mp4',
'uploader': 'IGN',
'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
'description': 'Several come bundled with the Steam Controller.',
'thumbnail': 're:^https?:.*\.(?:jpg|png)$',
'duration': 74,
'timestamp': 1425657362,
'upload_date': '20150306',
'uploader': 'IGN',
'uploader_id': 'xijv66',
'age_limit': 0,
'view_count': int,
'comment_count': int,
}
},
# Vevo video
@@ -86,38 +102,106 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'https://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = self._build_request(url)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?video=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen('Vevo video detected: %s' % vevo_id)
return self.url_result('vevo:%s' % vevo_id, ie='Vevo')
webpage = self._download_webpage_no_ff(
'https://www.dailymotion.com/video/%s' % video_id, video_id)
age_limit = self._rta_search(webpage)
video_upload_date = None
mobj = re.search(r'<meta property="video:release_date" content="([0-9]{4})-([0-9]{2})-([0-9]{2}).+?"/>', webpage)
if mobj is not None:
video_upload_date = mobj.group(1) + mobj.group(2) + mobj.group(3)
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
view_count = str_to_int(self._search_regex(
[r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:(\d+)"',
r'video_views_count[^>]+>\s+([\d\.,]+)'],
webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', fatal=False))
player_v5 = self._search_regex(
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
metadata = player['metadata']
formats = []
for quality, media_list in metadata['qualities'].items():
for media in media_list:
media_url = media.get('url')
if not media_url:
continue
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
if type_ == 'application/x-mpegURL' or determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', m3u8_id='hls'))
else:
f = {
'url': media_url,
'format_id': quality,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(f)
self._sort_formats(formats)
title = metadata['title']
duration = int_or_none(metadata.get('duration'))
timestamp = int_or_none(metadata.get('created_time'))
thumbnail = metadata.get('poster_url')
uploader = metadata.get('owner', {}).get('screenname')
uploader_id = metadata.get('owner', {}).get('id')
subtitles = {}
for subtitle_lang, subtitle in metadata.get('subtitles', {}).get('data', {}).items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'age_limit': age_limit,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats,
'subtitles': subtitles,
}
# vevo embed
vevo_id = self._search_regex(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?video=(?P<id>[\w]*)',
webpage, 'vevo embed', default=None)
if vevo_id:
return self.url_result('vevo:%s' % vevo_id, 'Vevo')
# fallback old player
embed_page = self._download_webpage_no_ff(
'https://www.dailymotion.com/embed/video/%s' % video_id,
video_id, 'Downloading embed page')
timestamp = parse_iso8601(self._html_search_meta(
'video:release_date', webpage, 'upload date'))
info = self._parse_json(
self._search_regex(
r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE),
video_id)
embed_url = 'https://www.dailymotion.com/embed/video/%s' % video_id
embed_request = self._build_request(embed_url)
embed_page = self._download_webpage(
embed_request, video_id, 'Downloading embed page')
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True)
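The player V5 branch above turns `metadata['qualities']` into format dicts. It drops `lumberjack` manifest entries, sends HLS URLs off for separate expansion, and reads width and height for progressive URLs from an `H264-<w>x<h>` token in the path. The sketch below reproduces that loop as a standalone function. It assumes `qualities` is a plain dict of lists shaped like the one the extractor reads. The `.m3u8` suffix check is a simplified stand-in for `determine_ext`, and HLS URLs are only collected here, not expanded:

```python
import re


def parse_qualities(qualities):
    """Split a player-V5-style qualities mapping into progressive formats and HLS URLs.

    `qualities` is assumed to map a quality label to a list of {'type', 'url'} dicts.
    """
    formats, hls_urls = [], []
    for quality, media_list in qualities.items():
        for media in media_list:
            media_url = media.get('url')
            if not media_url:
                continue  # entries without a URL are ignored
            type_ = media.get('type')
            if type_ == 'application/vnd.lumberjack.manifest':
                continue  # not a playable stream
            # Simplified stand-in for determine_ext(media_url) == 'm3u8'
            if type_ == 'application/x-mpegURL' or media_url.split('?')[0].endswith('.m3u8'):
                hls_urls.append(media_url)
                continue
            f = {'url': media_url, 'format_id': quality}
            # Dimensions are encoded in the path, e.g. .../H264-512x384/...
            m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
            if m:
                f.update({
                    'width': int(m.group('width')),
                    'height': int(m.group('height')),
                })
            formats.append(f)
    return formats, hls_urls
```

The example URLs passed to the function in testing (`http://example.com/...`) are made up.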
@@ -138,16 +222,11 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'width': width,
'height': height,
})
if not formats:
raise ExtractorError('Unable to extract video URL')
self._sort_formats(formats)
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
view_count = str_to_int(self._search_regex(
r'video_views_count[^>]+>\s+([\d\.,]+)',
webpage, 'view count', fatal=False))
title = self._og_search_title(webpage, default=None)
if title is None:
title = self._html_search_regex(
@@ -158,12 +237,14 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'id': video_id,
'formats': formats,
'uploader': info['owner.screenname'],
'upload_date': video_upload_date,
'timestamp': timestamp,
'title': title,
'description': description,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
'view_count': view_count,
'duration': info['duration']
}
def _get_subtitles(self, video_id, webpage):
@@ -198,10 +279,17 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
def _extract_entries(self, id):
video_ids = []
processed_urls = set()
for pagenum in itertools.count(1):
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, 'Downloading page %s' % pagenum)
page_url = self._PAGE_TEMPLATE % (id, pagenum)
webpage, urlh = self._download_webpage_handle_no_ff(
page_url, id, 'Downloading page %s' % pagenum)
if urlh.geturl() in processed_urls:
self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
page_url, urlh.geturl()), id)
break
processed_urls.add(urlh.geturl())
video_ids.extend(re.findall(r'data-xid="(.+?)"', webpage))
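The new `_extract_entries` loop protects against a site that keeps answering with the same page. Each response's final URL (after redirects) goes into `processed_urls`, and a repeated URL ends the loop with a warning. The sketch below shows that guard in isolation. It assumes a hypothetical `fetch_page(pagenum)` that returns `(final_url, ids)`. The empty-page stop is also an assumption, standing in for the termination check that this hunk does not show:

```python
import itertools


def collect_ids(fetch_page):
    """Collect IDs page by page, stopping when a page URL repeats or a page is empty."""
    video_ids = []
    processed_urls = set()
    for pagenum in itertools.count(1):
        final_url, ids = fetch_page(pagenum)
        if final_url in processed_urls:
            # Same guard as the diff: the server redirected to a page already seen
            break
        processed_urls.add(final_url)
        if not ids:
            break  # hypothetical stop condition for this sketch
        video_ids.extend(ids)
    return video_ids


# Hypothetical site: every page past 3 redirects back to page 3.
PAGES = {1: ['a', 'b'], 2: ['c'], 3: ['d']}


def fake_fetch(pagenum):
    page = min(pagenum, 3)
    return 'http://example.com/user/demo/%d' % page, PAGES[page]


print(collect_ids(fake_fetch))  # ['a', 'b', 'c', 'd']
```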
@@ -225,7 +313,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?:(?:old/)?user/)?(?P<user>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
@@ -234,6 +322,17 @@ class DailymotionUserIE(DailymotionPlaylistIE):
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long',
}]
def _real_extract(self, url):
@@ -251,3 +350,52 @@ class DailymotionUserIE(DailymotionPlaylistIE):
'title': full_user,
'entries': self._extract_entries(user),
}
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX
_TESTS = [{
# From http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html
# Tested at FranceTvInfo_2
'url': 'http://api.dmcloud.net/embed/4e7343f894a6f677b10006b4/556e03339473995ee145930c?auth=1464865870-0-jyhsm84b-ead4c701fb750cf9367bf4447167a3db&autoplay=1',
'only_matching': True,
}, {
# http://www.francetvinfo.fr/societe/larguez-les-amarres-le-cobaturage-se-developpe_980101.html
'url': 'http://api.dmcloud.net/player/embed/4e7343f894a6f677b10006b4/559545469473996d31429f06?auth=1467430263-0-90tglw2l-a3a4b64ed41efe48d7fccad85b8b8fda&autoplay=1',
'only_matching': True,
}]
@classmethod
def _extract_dmcloud_url(self, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % self._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % self._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage_no_ff(url, video_id)
title = self._html_search_regex(r'<title>([^>]+)</title>', webpage, 'title')
video_info = self._parse_json(self._search_regex(
r'var\s+info\s*=\s*([^;]+);', webpage, 'video info'), video_id)
# TODO: parse ios_url, which is in fact a manifest
video_url = video_info['mp4_url']
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': video_info.get('thumbnail_url'),
}
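`FranceTvInfoIE` and `GenericIE` both call `DailymotionCloudIE._extract_dmcloud_url` to detect an embed. The method tries an `<iframe src=...>` first and then the `value` of the `dmcloudUrlEmissionSelect` input. Because it is a classmethod, `cls` would be the conventional name for its first parameter. The sketch below performs the same two lookups as module-level functions, then matches the result against `_VALID_URL` to recover the video id. The sample HTML is made up; the embed URL is the second test URL above:

```python
import re

VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/'
VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % VALID_URL_PREFIX
VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % VALID_URL_PREFIX


def extract_dmcloud_url(webpage):
    """Return the first Dailymotion Cloud embed URL found in webpage, or None."""
    # 1) <iframe ... src="http://api.dmcloud.net/...">
    mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % VALID_EMBED_URL, webpage)
    if mobj:
        return mobj.group(1)
    # 2) <input id="dmcloudUrlEmissionSelect" ... value="http://api.dmcloud.net/...">
    mobj = re.search(
        r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % VALID_EMBED_URL,
        webpage)
    if mobj:
        return mobj.group(1)
    return None


def dmcloud_video_id(embed_url):
    """Apply the _VALID_URL pattern to pull the video id out of an embed URL."""
    return re.match(VALID_URL, embed_url).group('id')


url = ('http://api.dmcloud.net/player/embed/4e7343f894a6f677b10006b4/559545469473996d31429f06'
       '?auth=1467430263-0-90tglw2l-a3a4b64ed41efe48d7fccad85b8b8fda&autoplay=1')
page = '<iframe width="640" src="%s"></iframe>' % url
print(dmcloud_video_id(extract_dmcloud_url(page)))  # 559545469473996d31429f06
```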

@@ -3,42 +3,47 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import unified_strdate
class DFBIE(InfoExtractor):
IE_NAME = 'tv.dfb.de'
_VALID_URL = r'https?://tv\.dfb\.de/video/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://tv\.dfb\.de/video/(?P<display_id>[^/]+)/(?P<id>\d+)'
_TEST = {
'url': 'http://tv.dfb.de/video/highlights-des-empfangs-in-berlin/9070/',
'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/',
# The md5 is different each time
'info_dict': {
'id': '9070',
'id': '11633',
'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland',
'ext': 'flv',
'title': 'Highlights des Empfangs in Berlin',
'upload_date': '20140716',
'title': 'U 19-EM: Stimmen zum Spiel gegen Russland',
'upload_date': '20150714',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
video_id)
display_id)
video_info = player_info.find('video')
f4m_info = self._download_xml(self._proto_relative_url(video_info.find('url').text.strip()), video_id)
f4m_info = self._download_xml(
self._proto_relative_url(video_info.find('url').text.strip()), display_id)
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
return {
'id': video_id,
'display_id': display_id,
'title': video_info.find('title').text,
'url': manifest_url,
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': ''.join(video_info.find('time_date').text.split('.')[::-1]),
'upload_date': unified_strdate(video_info.find('time_date').text),
'formats': formats,
}
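The DFB change replaces a hand-rolled date reversal with `unified_strdate`. Reversing the dot-separated parts only yields `YYYYMMDD` when the day and month are zero-padded. The sketch below shows a stricter conversion alongside the manifest URL concatenation. It assumes `time_date` holds a German-order `DD.MM.YYYY` string, which the diff does not show a sample of. `dfb_upload_date` and `dfb_manifest_url` are hypothetical helpers, not youtube-dl functions:

```python
from datetime import datetime


def dfb_upload_date(time_date):
    """Convert an assumed 'DD.MM.YYYY' string to 'YYYYMMDD'."""
    return datetime.strptime(time_date.strip(), '%d.%m.%Y').strftime('%Y%m%d')


def dfb_manifest_url(token_url, auth, hdcore='3.2.0'):
    """Same concatenation as the extractor: token URL, hdnea auth token, hdcore version."""
    return token_url + '?' + 'hdnea=' + auth + '&hdcore=' + hdcore


# The old reversal loses the structure when padding is missing from the input:
print(''.join('1.7.2015'.split('.')[::-1]))  # 201571
print(dfb_upload_date('1.7.2015'))           # 20150701
```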

@@ -9,6 +9,7 @@ from ..compat import (compat_str, compat_basestring)
class DouyuTVIE(InfoExtractor):
IE_DESC = '斗鱼'
_VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)'
_TESTS = [{
'url': 'http://www.douyutv.com/iseven',

@@ -6,6 +6,8 @@ import itertools
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
@@ -17,7 +19,54 @@ from ..utils import (
)
class DramaFeverIE(InfoExtractor):
class DramaFeverBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
_NETRC_MACHINE = 'dramafever'
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
_consumer_secret = None
def _get_consumer_secret(self):
mainjs = self._download_webpage(
'http://www.dramafever.com/static/51afe95/df2014/scripts/main.js',
None, 'Downloading main.js', fatal=False)
if not mainjs:
return self._CONSUMER_SECRET
return self._search_regex(
r"var\s+cs\s*=\s*'([^']+)'", mainjs,
'consumer secret', default=self._CONSUMER_SECRET)
def _real_initialize(self):
self._login()
self._consumer_secret = self._get_consumer_secret()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_form = {
'username': username,
'password': password,
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
if all(logout_pattern not in response
for logout_pattern in ['href="/accounts/logout/"', '>Log out<']):
error = self._html_search_regex(
r'(?s)class="hidden-xs prompt"[^>]*>(.+?)<',
response, 'error message', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
class DramaFeverIE(DramaFeverBaseIE):
IE_NAME = 'dramafever'
_VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
_TEST = {
@@ -85,6 +134,23 @@ class DramaFeverIE(InfoExtractor):
'url': href,
}]
series_id, episode_number = video_id.split('.')
episode_info = self._download_json(
# We only need a single episode info, so restricting page size to one episode
# and dealing with page number as with episode number
r'http://www.dramafever.com/api/4/episode/series/?cs=%s&series_id=%s&page_number=%s&page_size=1'
% (self._consumer_secret, series_id, episode_number),
video_id, 'Downloading episode info JSON', fatal=False)
if episode_info:
value = episode_info.get('value')
if value:
subfile = value[0].get('subfile') or value[0].get('new_subfile')
if subfile and subfile != 'http://www.dramafever.com/st/':
subtitles.setdefault('English', []).append({
'ext': 'srt',
'url': subfile,
})
return {
'id': video_id,
'title': title,
@@ -97,7 +163,7 @@ class DramaFeverIE(InfoExtractor):
}
class DramaFeverSeriesIE(InfoExtractor):
class DramaFeverSeriesIE(DramaFeverBaseIE):
IE_NAME = 'dramafever:series'
_VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
_TESTS = [{
@@ -118,27 +184,14 @@ class DramaFeverSeriesIE(InfoExtractor):
'playlist_count': 20,
}]
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
_PAGE_SIZE = 60 # max is 60 (see http://api.drama9.com/#get--api-4-episode-series-)
def _get_consumer_secret(self, video_id):
mainjs = self._download_webpage(
'http://www.dramafever.com/static/51afe95/df2014/scripts/main.js',
video_id, 'Downloading main.js', fatal=False)
if not mainjs:
return self._CONSUMER_SECRET
return self._search_regex(
r"var\s+cs\s*=\s*'([^']+)'", mainjs,
'consumer secret', default=self._CONSUMER_SECRET)
def _real_extract(self, url):
series_id = self._match_id(url)
consumer_secret = self._get_consumer_secret(series_id)
series = self._download_json(
'http://www.dramafever.com/api/4/series/query/?cs=%s&series_id=%s'
% (consumer_secret, series_id),
% (self._consumer_secret, series_id),
series_id, 'Downloading series JSON')['series'][series_id]
title = clean_html(series['name'])
@@ -148,11 +201,14 @@ class DramaFeverSeriesIE(InfoExtractor):
for page_num in itertools.count(1):
episodes = self._download_json(
'http://www.dramafever.com/api/4/episode/series/?cs=%s&series_id=%s&page_size=%d&page_number=%d'
% (consumer_secret, series_id, self._PAGE_SIZE, page_num),
% (self._consumer_secret, series_id, self._PAGE_SIZE, page_num),
series_id, 'Downloading episodes JSON page #%d' % page_num)
for episode in episodes.get('value', []):
episode_url = episode.get('episode_url')
if not episode_url:
continue
entries.append(self.url_result(
compat_urlparse.urljoin(url, episode['episode_url']),
compat_urlparse.urljoin(url, episode_url),
'DramaFever', episode.get('guid')))
if page_num == episodes['num_pages']:
break
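After the refactor, `DramaFeverBaseIE` resolves the consumer secret once, in `_real_initialize`. It falls back to the hard-coded `_CONSUMER_SECRET` when `main.js` cannot be fetched or does not contain `var cs = '...'`. The series extractor then pages through the episode API until `num_pages`. The sketch below covers both pieces and assumes a hypothetical `fetch_page(page_num)` that returns the decoded JSON for one page. It deliberately departs from the diff by comparing with `>=` instead of `==`, so a response that reports `num_pages` as 0 cannot loop forever:

```python
import itertools
import re

DEFAULT_CONSUMER_SECRET = 'DA59dtVXYLxajktV'  # the hard-coded fallback in the diff


def extract_consumer_secret(mainjs):
    """Pick `var cs = '...'` out of main.js, or fall back to the default secret."""
    if not mainjs:
        return DEFAULT_CONSUMER_SECRET  # download failed (fatal=False) or empty body
    m = re.search(r"var\s+cs\s*=\s*'([^']+)'", mainjs)
    return m.group(1) if m else DEFAULT_CONSUMER_SECRET


def collect_episode_urls(fetch_page):
    """Walk the paged episode API until the reported last page."""
    urls = []
    for page_num in itertools.count(1):
        episodes = fetch_page(page_num)
        for episode in episodes.get('value', []):
            episode_url = episode.get('episode_url')
            if not episode_url:
                continue  # the diff now skips entries without an episode_url
            urls.append(episode_url)
        if page_num >= episodes['num_pages']:  # '==' in the diff
            break
    return urls
```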

@@ -15,7 +15,6 @@ class DRBonanzaIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.dr.dk/bonanza/serie/portraetter/Talkshowet.htm?assetId=65517',
'md5': 'fe330252ddea607635cf2eb2c99a0af3',
'info_dict': {
'id': '65517',
'ext': 'mp4',
@@ -26,6 +25,9 @@ class DRBonanzaIE(InfoExtractor):
'upload_date': '20110120',
'duration': 3664,
},
'params': {
'skip_download': True, # requires rtmp
},
}, {
'url': 'http://www.dr.dk/bonanza/radio/serie/sport/fodbold.htm?assetId=59410',
'md5': '6dfe039417e76795fb783c52da3de11d',
@@ -93,6 +95,11 @@ class DRBonanzaIE(InfoExtractor):
'format_id': file['Type'].replace('Video', ''),
'preference': preferencemap.get(file['Type'], -10),
})
if format['url'].startswith('rtmp'):
rtmp_url = format['url']
format['rtmp_live'] = True # --resume does not work
if '/bonanza/' in rtmp_url:
format['play_path'] = rtmp_url.split('/bonanza/')[1]
formats.append(format)
elif file['Type'] == "Thumb":
thumbnail = file['Location']
@@ -111,9 +118,6 @@ class DRBonanzaIE(InfoExtractor):
description = '%s\n%s\n%s\n' % (
info['Description'], info['Actors'], info['Colophon'])
for f in formats:
f['url'] = f['url'].replace('rtmp://vod-bonanza.gss.dr.dk/bonanza/', 'http://vodfiles.dr.dk/')
f['url'] = f['url'].replace('mp4:bonanza', 'bonanza')
self._sort_formats(formats)
display_id = re.sub(r'[^\w\d-]', '', re.sub(r' ', '-', title.lower())) + '-' + asset_id
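DRBonanza now keeps RTMP URLs instead of rewriting them to HTTP. Such formats are flagged `rtmp_live`, because the diff notes that `--resume` does not work for them, and the part after `/bonanza/` becomes the `play_path`. The sketch below applies that post-processing to a plain format dict. It splits with `maxsplit=1`, a small deviation from the diff, so a second `/bonanza/` later in the path stays part of the play path. The sample URL is made up:

```python
def add_rtmp_fields(fmt):
    """Annotate an RTMP format dict the way the DRBonanza change does."""
    url = fmt['url']
    if url.startswith('rtmp'):
        fmt['rtmp_live'] = True  # --resume does not work for these streams
        if '/bonanza/' in url:
            fmt['play_path'] = url.split('/bonanza/', 1)[1]
    return fmt


f = add_rtmp_fields({'url': 'rtmp://vod-bonanza.gss.dr.dk/bonanza/mp4:bonanza/2011/talkshowet.mp4'})
print(f['play_path'])  # mp4:bonanza/2011/talkshowet.mp4
```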

@@ -36,25 +36,24 @@ class DrTuberIE(InfoExtractor):
r'<source src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
[r'class="hd_title" style="[^"]+">([^<]+)</h1>', r'<title>([^<]+) - \d+'],
[r'<p[^>]+class="title_substrate">([^<]+)</p>', r'<title>([^<]+) - \d+'],
webpage, 'title')
thumbnail = self._html_search_regex(
r'poster="([^"]+)"',
webpage, 'thumbnail', fatal=False)
like_count = str_to_int(self._html_search_regex(
r'<span id="rate_likes">\s*<img[^>]+>\s*<span>([\d,\.]+)</span>',
webpage, 'like count', fatal=False))
dislike_count = str_to_int(self._html_search_regex(
r'<span id="rate_dislikes">\s*<img[^>]+>\s*<span>([\d,\.]+)</span>',
webpage, 'like count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'<span class="comments_count">([\d,\.]+)</span>',
webpage, 'comment count', fatal=False))
def extract_count(id_, name):
return str_to_int(self._html_search_regex(
r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_,
webpage, '%s count' % name, fatal=False))
like_count = extract_count('rate_likes', 'like')
dislike_count = extract_count('rate_dislikes', 'dislike')
comment_count = extract_count('comments_count', 'comment')
cats_str = self._search_regex(
r'<span>Categories:</span><div>(.+?)</div>', webpage, 'categories', fatal=False)
r'<div[^>]+class="categories_list">(.+?)</div>', webpage, 'categories', fatal=False)
categories = [] if not cats_str else re.findall(r'<a title="([^"]+)"', cats_str)
return {

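The DrTuber change folds three near-identical regexes into one `extract_count(id_, name)` helper that accepts the value in either a `class` or an `id` attribute. It also fixes the old `'like count'` label on the dislike lookup. The sketch below implements the helper as a standalone function that takes `webpage` explicitly instead of closing over it. `parse_count` is a simplified stand-in for youtube-dl's `str_to_int` that only strips `,` and `.` separators, and the HTML is made up:

```python
import re


def parse_count(s):
    """Simplified stand-in for str_to_int: drop ',' and '.' thousands separators."""
    return None if s is None else int(re.sub(r'[,.]', '', s))


def extract_count(webpage, id_):
    """Find <span class="id_"> or <span id="id_"> holding a number and parse it."""
    m = re.search(r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_, webpage)
    return parse_count(m.group(1)) if m else None


html = '<span id="rate_likes">1,234</span><span class="comments_count">56</span>'
print(extract_count(html, 'rate_likes'))  # 1234
```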
@@ -1,9 +1,7 @@
from __future__ import unicode_literals
from ..compat import (
compat_urllib_parse,
)
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
class EHowIE(InfoExtractor):
@@ -26,7 +24,7 @@ class EHowIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'(?:file|source)=(http[^\'"&]*)', webpage, 'video URL')
final_url = compat_urllib_parse.unquote(video_url)
final_url = compat_urllib_parse_unquote(video_url)
uploader = self._html_search_meta('uploader', webpage)
title = self._og_search_title(webpage).replace(' | eHow', '')

@@ -1,31 +0,0 @@
from __future__ import unicode_literals
from .tnaflix import TNAFlixIE
class EMPFlixIE(TNAFlixIE):
_VALID_URL = r'https?://(?:www\.)?empflix\.com/videos/(?P<display_id>.+?)-(?P<id>[0-9]+)\.html'
_TITLE_REGEX = r'name="title" value="(?P<title>[^"]*)"'
_DESCRIPTION_REGEX = r'name="description" value="([^"]*)"'
_CONFIG_REGEX = r'flashvars\.config\s*=\s*escape\("([^"]+)"'
_TESTS = [
{
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'display_id': 'Amateur-Finger-Fuck',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
}
},
{
'url': 'http://www.empflix.com/videos/[AROMA][ARMD-718]-Aoi-Yoshino-Sawa-25826.html',
'only_matching': True,
}
]

@@ -9,7 +9,7 @@ from ..compat import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request,
)
from ..utils import (
@@ -136,7 +136,7 @@ class FacebookIE(InfoExtractor):
else:
raise ExtractorError('Cannot parse data')
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse.unquote(data['params'])
params_raw = compat_urllib_parse_unquote(data['params'])
params = json.loads(params_raw)
video_data = params['video_data'][0]
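The eHow, Facebook, GameSpot and generic hunks all replace `compat_urllib_parse.unquote` with `compat_urllib_parse_unquote`. On Python 3 that compat name should resolve to `urllib.parse.unquote`. The Facebook path percent-decodes `data['params']` and then parses the result as JSON. The round-trip sketch below calls the standard library directly, with a made-up payload:

```python
import json
from urllib.parse import quote, unquote


def decode_params(data):
    """Percent-decode data['params'] and parse the result as JSON."""
    return json.loads(unquote(data['params']))


payload = {'video_data': [{'hd_src': 'https://example.com/video.mp4'}]}
data = {'params': quote(json.dumps(payload))}  # simulate the percent-encoded page value
video_data = decode_params(data)['video_data'][0]
```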

@@ -6,9 +6,9 @@ from .common import InfoExtractor
class FazIE(InfoExtractor):
IE_NAME = 'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.)?faz\.net/(?:[^/]+/)*.*?-(?P<id>\d+)\.html'
_TEST = {
_TESTS = [{
'url': 'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
'info_dict': {
'id': '12610585',
@@ -16,7 +16,22 @@ class FazIE(InfoExtractor):
'title': 'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
'description': 'md5:1453fbf9a0d041d985a47306192ea253',
},
}
}, {
'url': 'http://www.faz.net/aktuell/politik/berlin-gabriel-besteht-zerreissprobe-ueber-datenspeicherung-13659345.html',
'only_matching': True,
}, {
'url': 'http://www.faz.net/berlin-gabriel-besteht-zerreissprobe-ueber-datenspeicherung-13659345.html',
'only_matching': True,
}, {
'url': 'http://www.faz.net/-13659345.html',
'only_matching': True,
}, {
'url': 'http://www.faz.net/aktuell/politik/-13659345.html',
'only_matching': True,
}, {
'url': 'http://www.faz.net/foobarblafasel-13659345.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
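The broadened FAZ `_VALID_URL` requires only the host and a trailing `-<digits>.html`, with any number of path segments before it. This is what the new `only_matching` tests exercise. The check below runs the pattern against several of those URLs, plus a made-up negative case:

```python
import re

FAZ_VALID_URL = r'https?://(?:www\.)?faz\.net/(?:[^/]+/)*.*?-(?P<id>\d+)\.html'


def faz_id(url):
    """Return the numeric id captured by the FAZ pattern, or None if it does not match."""
    m = re.match(FAZ_VALID_URL, url)
    return m.group('id') if m else None


for url in [
    'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
    'http://www.faz.net/-13659345.html',
    'http://www.faz.net/aktuell/politik/-13659345.html',
]:
    print(faz_id(url))
```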

@@ -6,18 +6,15 @@ import re
import json
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..compat import compat_urlparse
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
float_or_none,
parse_duration,
determine_ext,
)
from .dailymotion import DailymotionCloudIE
class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -58,12 +55,12 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
# See https://github.com/rg3/youtube-dl/issues/3963
# m3u8 urls work fine
continue
video_url_parsed = compat_urllib_parse_urlparse(video_url)
f4m_url = self._download_webpage(
'http://hdfauth.francetv.fr/esi/urltokengen2.html?url=%s' % video_url_parsed.path,
'http://hdfauth.francetv.fr/esi/TA?url=%s' % video_url,
video_id, 'Downloading f4m manifest token', fatal=False)
if f4m_url:
formats.extend(self._extract_f4m_formats(f4m_url, video_id, 1, format_id))
formats.extend(self._extract_f4m_formats(
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, 1, format_id))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id))
elif video_url.startswith('rtmp'):
@@ -86,7 +83,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
'title': info['titre'],
'description': clean_html(info['synopsis']),
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', info['image']),
'duration': float_or_none(info.get('real_duration'), 1000) or parse_duration(info['duree']),
'duration': int_or_none(info.get('real_duration')) or parse_duration(info['duree']),
'timestamp': int_or_none(info['diffusion']['timestamp']),
'formats': formats,
}
@@ -131,12 +128,26 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'skip_download': 'HLS (requires ffmpeg)'
},
'skip': 'Ce direct est terminé et sera disponible en rattrapage dans quelques minutes.',
}, {
'url': 'http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html',
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': '556e03339473995ee145930c',
'ext': 'mp4',
'title': 'Les entreprises familiales : le secret de la réussite',
'thumbnail': 're:^https?://.*\.jpe?g$',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud')
video_id, catalogue = self._search_regex(
r'id-video=([^@]+@[^"]+)', webpage, 'video id').split('@')
return self._extract_video(video_id, catalogue)
@@ -145,11 +156,21 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
class FranceTVIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetv'
IE_DESC = 'France 2, 3, 4, 5 and Ô'
_VALID_URL = r'''(?x)https?://www\.france[2345o]\.fr/
(?:
emissions/.*?/(videos|emissions)/(?P<id>[^/?]+)
| (emissions?|jt)/(?P<key>[^/?]+)
)'''
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?france[2345o]\.fr/
(?:
emissions/[^/]+/(?:videos|diffusions)|
emission/[^/]+|
videos|
jt
)
/|
embed\.francetv\.fr/\?ue=
)
(?P<id>[^/?]+)
'''
_TESTS = [
# france2
@@ -206,24 +227,46 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
},
# franceo
{
'url': 'http://www.franceo.fr/jt/info-afrique/04-12-2013',
'md5': '52f0bfe202848b15915a2f39aaa8981b',
'url': 'http://www.franceo.fr/jt/info-soir/18-07-2015',
'md5': '47d5816d3b24351cdce512ad7ab31da8',
'info_dict': {
'id': '108634970',
'id': '125377621',
'ext': 'flv',
'title': 'Infô Afrique',
'description': 'md5:ebf346da789428841bee0fd2a935ea55',
'upload_date': '20140915',
'timestamp': 1410822000,
'title': 'Infô soir',
'description': 'md5:01b8c6915a3d93d8bbbd692651714309',
'upload_date': '20150718',
'timestamp': 1437241200,
'duration': 414,
},
},
{
# francetv embed
'url': 'http://embed.francetv.fr/?ue=8d7d3da1e3047c42ade5a5d7dfd3fc87',
'info_dict': {
'id': 'EV_30231',
'ext': 'flv',
'title': 'Alcaline, le concert avec Calogero',
'description': 'md5:61f08036dcc8f47e9cfc33aed08ffaff',
'upload_date': '20150226',
'timestamp': 1424989860,
'duration': 5400,
},
},
{
'url': 'http://www.france4.fr/emission/highlander/diffusion-du-17-07-2015-04h05',
'only_matching': True,
},
{
'url': 'http://www.franceo.fr/videos/125377617',
'only_matching': True,
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage = self._download_webpage(url, mobj.group('key') or mobj.group('id'))
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id, catalogue = self._html_search_regex(
r'href="http://videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'href="http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
webpage, 'video ID').split('@')
return self._extract_video(video_id, catalogue)
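The rewritten `FranceTVIE._VALID_URL` drops the separate `key` group, so every supported URL shape now captures into `id`. `_real_extract` uses that value only as a display key before reading the real `video_id@catalogue` pair from the page. The check below runs the verbose pattern against test URLs from the hunk above, plus a made-up negative case:

```python
import re

FRANCETV_VALID_URL = r'''(?x)
    https?://
        (?:
            (?:www\.)?france[2345o]\.fr/
                (?:
                    emissions/[^/]+/(?:videos|diffusions)|
                    emission/[^/]+|
                    videos|
                    jt
                )
            /|
            embed\.francetv\.fr/\?ue=
        )
        (?P<id>[^/?]+)
    '''


def francetv_key(url):
    """Return the value captured in the `id` group, or None if the URL is not supported."""
    m = re.match(FRANCETV_VALID_URL, url)
    return m.group('id') if m else None
```

For the `jt` URL, the captured value is the programme slug (`info-soir`), not the date. That is harmless here because the extractor replaces it with the id scraped from the page.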

@@ -5,7 +5,7 @@ import json
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
@@ -75,7 +75,7 @@ class GameSpotIE(InfoExtractor):
return {
'id': data_video['guid'],
'display_id': page_id,
'title': compat_urllib_parse.unquote(data_video['title']),
'title': compat_urllib_parse_unquote(data_video['title']),
'formats': formats,
'description': self._html_search_meta('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage),

@@ -8,7 +8,6 @@ import re
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request,
compat_urlparse,
@@ -37,12 +36,18 @@ from .rutv import RUTVIE
from .tvc import TVCIE
from .sportbox import SportBoxEmbedIE
from .smotri import SmotriIE
from .myvi import MyviIE
from .condenast import CondeNastIE
from .udn import UDNEmbedIE
from .senateisvp import SenateISVPIE
from .bliptv import BlipTVIE
from .svt import SVTIE
from .pornhub import PornHubIE
from .xhamster import XHamsterEmbedIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE
from .onionstudios import OnionStudiosIE
from .snagfilms import SnagFilmsEmbedIE
class GenericIE(InfoExtractor):
@@ -271,14 +276,6 @@ class GenericIE(InfoExtractor):
'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
},
},
# BBC iPlayer embeds
{
'url': 'http://www.bbc.co.uk/blogs/adamcurtis/posts/BUGGER',
'info_dict': {
'title': 'BBC - Blogs - Adam Curtis - BUGGER',
},
'playlist_mincount': 18,
},
# RUTV embed
{
'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html',
@@ -333,6 +330,26 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# Myvi.ru embed
{
'url': 'http://www.kinomyvi.tv/news/detail/Pervij-dublirovannij-trejler--Uzhastikov-_nOw1',
'info_dict': {
'id': 'f4dafcad-ff21-423d-89b5-146cfd89fa1e',
'ext': 'mp4',
'title': 'Ужастики, русский трейлер (2015)',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 153,
}
},
# XHamster embed
{
'url': 'http://www.numisc.com/forum/showthread.php?11696-FM15-which-pumiscer-was-this-%28-vid-%29-%28-alfa-as-fuck-srx-%29&s=711f5db534502e22260dec8c5e2d66d8',
'info_dict': {
'id': 'showthread',
'title': '[NSFL] [FM15] which pumiscer was this ( vid ) ( alfa as fuck srx )',
},
'playlist_mincount': 7,
},
# Embedded TED video
{
'url': 'http://en.support.wordpress.com/videos/ted-talks/',
@@ -382,6 +399,26 @@ class GenericIE(InfoExtractor):
'skip_download': 'Requires rtmpdump'
}
},
# francetv embed
{
'url': 'http://www.tsprod.com/replay-du-concert-alcaline-de-calogero',
'info_dict': {
'id': 'EV_30231',
'ext': 'mp4',
'title': 'Alcaline, le concert avec Calogero',
'description': 'md5:61f08036dcc8f47e9cfc33aed08ffaff',
'upload_date': '20150226',
'timestamp': 1424989860,
'duration': 5400,
},
'params': {
# m3u8 downloads
'skip_download': True,
},
'expected_warnings': [
'Forbidden'
]
},
# Condé Nast embed
{
'url': 'http://www.wired.com/2014/04/honda-asimo/',
@@ -655,6 +692,18 @@ class GenericIE(InfoExtractor):
'title': 'John Carlson Postgame 2/25/15',
},
},
# Kaltura embed (different embed code)
{
'url': 'http://www.premierchristianradio.com/Shows/Saturday/Unbelievable/Conference-Videos/Os-Guinness-Is-It-Fools-Talk-Unbelievable-Conference-2014',
'info_dict': {
'id': '1_a52wc67y',
'ext': 'flv',
'upload_date': '20150127',
'uploader_id': 'PremierMedia',
'timestamp': int,
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
},
},
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -812,6 +861,50 @@ class GenericIE(InfoExtractor):
'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
'uploader': 'Rogers Sportsnet',
},
},
# Dailymotion Cloud video
{
'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
'md5': '49444254273501a64675a7e68c502681',
'info_dict': {
'id': '5585de919473990de4bee11b',
'ext': 'mp4',
'title': 'Le débat',
'thumbnail': 're:^https?://.*\.jpe?g$',
}
},
# OnionStudios embed
{
'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
'info_dict': {
'id': '2855',
'ext': 'mp4',
'title': 'Dont Understand Bitcoin? This Man Will Mumble An Explanation At You',
'thumbnail': 're:^https?://.*\.jpe?g$',
'uploader': 'ClickHole',
'uploader_id': 'clickhole',
}
},
# SnagFilms embed
{
'url': 'http://whilewewatch.blogspot.ru/2012/06/whilewewatch-whilewewatch-gripping.html',
'info_dict': {
'id': '74849a00-85a9-11e1-9660-123139220831',
'ext': 'mp4',
'title': '#whilewewatch',
}
},
# AdobeTVVideo embed
{
'url': 'https://helpx.adobe.com/acrobat/how-to/new-experience-acrobat-dc.html?set=acrobat--get-started--essential-beginners',
'md5': '43662b577c018ad707a63766462b1e87',
'info_dict': {
'id': '2456',
'ext': 'mp4',
'title': 'New experience with Acrobat DC',
'description': 'New experience with Acrobat DC',
'duration': 248.667,
},
}
]
@@ -979,7 +1072,9 @@ class GenericIE(InfoExtractor):
}
if not self._downloader.params.get('test', False) and not is_intentional:
self._downloader.report_warning('Falling back on generic information extractor.')
force = self._downloader.params.get('force_generic_extractor', False)
self._downloader.report_warning(
'%s on generic information extractor.' % ('Forcing' if force else 'Falling back'))
if not full_response:
request = compat_urllib_request.Request(url)
@@ -1031,7 +1126,7 @@ class GenericIE(InfoExtractor):
# Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/rg3/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse.unquote(webpage)
webpage = compat_urllib_parse_unquote(webpage)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
@@ -1089,18 +1184,15 @@ class GenericIE(InfoExtractor):
if matches:
return _playlist_from_matches(matches, ie='RtlNl')
# Look for embedded (iframe) Vimeo player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
if mobj:
player_url = unescapeHTML(mobj.group('url'))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl)
# Look for embedded (swf embed) Vimeo player
mobj = re.search(
r'<embed[^>]+?src="((?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage)
if mobj:
return self.url_result(mobj.group(1))
vimeo_url = VimeoIE._extract_vimeo_url(url, webpage)
if vimeo_url is not None:
return self.url_result(vimeo_url)
vid_me_embed_url = self._search_regex(
r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
webpage, 'vid.me embed', default=None)
if vid_me_embed_url is not None:
return self.url_result(vid_me_embed_url, 'Vidme')
# Look for embedded YouTube player
matches = re.findall(r'''(?x)
@@ -1294,7 +1386,7 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'))
mobj = re.search(r'class=["\']embedly-embed["\'][^>]src=["\'][^"\']*url=(?P<url>[^&]+)', webpage)
if mobj is not None:
return self.url_result(compat_urllib_parse.unquote(mobj.group('url')))
return self.url_result(compat_urllib_parse_unquote(mobj.group('url')))
# Look for funnyordie embed
matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
@@ -1327,6 +1419,11 @@ class GenericIE(InfoExtractor):
if pornhub_url:
return self.url_result(pornhub_url, 'PornHub')
# Look for embedded XHamster player
xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
if xhamster_urls:
return _playlist_from_matches(xhamster_urls, ie='XHamsterEmbed')
# Look for embedded Tvigle player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -1352,11 +1449,23 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
# Look for embedded francetv player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?://)?embed\.francetv\.fr/\?ue=.+?)\1',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'))
# Look for embedded smotri.com player
smotri_url = SmotriIE._extract_url(webpage)
if smotri_url:
return self.url_result(smotri_url, 'Smotri')
# Look for embedded Myvi.ru player
myvi_url = MyviIE._extract_url(webpage)
if myvi_url:
return self.url_result(myvi_url)
# Look for embedded soundcloud player
mobj = re.search(
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
@@ -1436,8 +1545,8 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
mobj = re.search(
r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_id'\s*:\s*'(?P<id>[^']+)',", webpage)
mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_id'\s*:\s*'(?P<id>[^']+)',", webpage) or
re.search(r'(?s)(["\'])(?:https?:)?//cdnapisec\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?\1.*?entry_id\s*:\s*(["\'])(?P<id>[^\2]+?)\2', webpage))
if mobj is not None:
return self.url_result('kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), 'Kaltura')
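The second Kaltura pattern above closes with `(?P<id>[^\2]+?)\2`. Two rules of Python's `re` make that tail behave differently from what it seems to intend. Named groups are numbered too, so `\2` is a backreference to `partner_id` rather than to the quote captured by the third group. Inside a character class, `\2` is an octal character escape, so `[^\2]` only excludes U+0002. The sketch below runs the pattern as written and an illustrative variant that names the quote group, against made-up markup; the variant is not the upstream fix.

```python
import re

# Second Kaltura pattern, copied as it appears in the diff.
AS_WRITTEN = r'(?s)(["\'])(?:https?:)?//cdnapisec\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?\1.*?entry_id\s*:\s*(["\'])(?P<id>[^\2]+?)\2'

# Illustrative variant: name the entry_id quote and refer back to it by name.
NAMED = (r'(?s)(["\'])(?:https?:)?//cdnapisec\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?\1'
         r'.*?entry_id\s*:\s*(?P<q>["\'])(?P<id>(?:(?!(?P=q)).)+)(?P=q)')

# Made-up embed markup for the demonstration.
SAMPLE = ('<script src="//cdnapisec.kaltura.com/p/1234/embed.js"></script>\n'
          '<script>kWidget.embed({entry_id: "1_abc"})</script>')

# AS_WRITTEN needs the partner id digits right after the entry id, so it
# finds nothing here; NAMED returns partner_id='1234', id='1_abc'.
```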
@@ -1494,6 +1603,30 @@ class GenericIE(InfoExtractor):
if senate_isvp_url:
return self.url_result(senate_isvp_url, 'SenateISVP')
# Look for Dailymotion Cloud videos
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud')
# Look for OnionStudios embeds
onionstudios_url = OnionStudiosIE._extract_url(webpage)
if onionstudios_url:
return self.url_result(onionstudios_url)
# Look for SnagFilms embeds
snagfilms_url = SnagFilmsEmbedIE._extract_url(webpage)
if snagfilms_url:
return self.url_result(snagfilms_url)
# Look for AdobeTVVideo embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
webpage)
if mobj is not None:
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))),
'AdobeTVVideo')
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
@@ -1561,7 +1694,7 @@ class GenericIE(InfoExtractor):
if refresh_header:
found = re.search(REDIRECT_REGEX, refresh_header)
if found:
new_url = compat_urlparse.urljoin(url, found.group(1))
new_url = compat_urlparse.urljoin(url, unescapeHTML(found.group(1)))
self.report_following_redirect(new_url)
return {
'_type': 'url',
@@ -1573,7 +1706,7 @@ class GenericIE(InfoExtractor):
entries = []
for video_url in found:
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
# Sometimes, jwplayer extraction will result in a YouTube URL
if YoutubeIE.suitable(video_url):
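
The meta-refresh hunk now passes the captured target through `unescapeHTML` before `urljoin`. A URL copied out of HTML markup can still contain entities such as `&amp;`, which would otherwise survive into the query string. A minimal sketch, assuming `html.unescape` as a substitute for youtube-dl's `unescapeHTML`; `REFRESH_RE` is a hypothetical stand-in for the extractor's `REDIRECT_REGEX`, whose exact pattern is not part of this hunk.

```python
import re
from html import unescape
from urllib.parse import urljoin

# Hypothetical stand-in for REDIRECT_REGEX: "<delay>; url=<target>".
REFRESH_RE = re.compile(r'[0-9]+\s*;\s*url\s*=\s*[\'"]?([^\'"]+)', re.IGNORECASE)


def follow_refresh(page_url, refresh_value):
    m = REFRESH_RE.search(refresh_value)
    if m is None:
        return None
    # Decode entities such as &amp; first, then resolve a relative target
    # against the page URL (absolute targets pass through urljoin unchanged).
    return urljoin(page_url, unescape(m.group(1)))
```

For example, `follow_refresh('http://example.com/a/b.html', '0; url=next.php?x=1&amp;y=2')` resolves to `http://example.com/a/next.php?x=1&y=2`.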

@@ -6,12 +6,13 @@ from ..utils import (
int_or_none,
float_or_none,
qualities,
ExtractorError,
)
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?P<id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
'id': 'DeadlyDecisiveGermanpinscher',
@@ -27,14 +28,33 @@ class GfycatIE(InfoExtractor):
'categories': list,
'age_limit': 0,
}
}
}, {
'url': 'http://gfycat.com/ifr/JauntyTimelyAmazontreeboa',
'info_dict': {
'id': 'JauntyTimelyAmazontreeboa',
'ext': 'mp4',
'title': 'JauntyTimelyAmazontreeboa',
'timestamp': 1411720126,
'upload_date': '20140926',
'uploader': 'anonymous',
'duration': 3.52,
'view_count': int,
'like_count': int,
'dislike_count': int,
'categories': list,
'age_limit': 0,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
gfy = self._download_json(
'http://gfycat.com/cajax/get/%s' % video_id,
video_id, 'Downloading video info')['gfyItem']
video_id, 'Downloading video info')
if 'error' in gfy:
raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True)
gfy = gfy['gfyItem']
title = gfy.get('title') or gfy['gfyName']
description = gfy.get('description')
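With this change the Gfycat extractor checks the API response for an `error` key before indexing `gfyItem`, so an unavailable clip produces a readable, expected error instead of a `KeyError`. A standalone sketch of that check. `ExtractorError` here is a simplified stand-in for youtube-dl's class, and the JSON payloads in the test are made up.

```python
import json


class ExtractorError(Exception):
    """Simplified stand-in for youtube-dl's ExtractorError."""

    def __init__(self, msg, expected=False):
        super().__init__(msg)
        self.expected = expected  # True: a known site-side condition, not a bug


def gfy_item(payload):
    data = json.loads(payload)
    # Look for the API's error field first; only then index 'gfyItem'.
    if 'error' in data:
        raise ExtractorError('Gfycat said: ' + data['error'], expected=True)
    return data['gfyItem']
```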

@@ -78,12 +78,7 @@ class GorillaVidIE(InfoExtractor):
if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
fields = self._hidden_inputs(webpage)
if fields['op'] == 'download1':
countdown = int_or_none(self._search_regex(

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -19,20 +17,19 @@ class HentaiStigmaIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h2 class="posttitle"><a[^>]*>([^<]+)</a>',
r'<h2[^>]+class="posttitle"[^>]*><a[^>]*>([^<]+)</a>',
webpage, 'title')
wrap_url = self._html_search_regex(
r'<iframe src="([^"]+mp4)"', webpage, 'wrapper url')
r'<iframe[^>]+src="([^"]+mp4)"', webpage, 'wrapper url')
wrap_webpage = self._download_webpage(wrap_url, video_id)
video_url = self._html_search_regex(
r'clip:\s*{\s*url: "([^"]*)"', wrap_webpage, 'video url')
r'file\s*:\s*"([^"]+)"', wrap_webpage, 'video url')
return {
'id': video_id,

@@ -58,11 +58,7 @@ class HostingBulkIE(InfoExtractor):
r'<img src="([^"]+)".+?class="pic"',
webpage, 'thumbnail', fatal=False)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
value="([^"]*)"
''', webpage))
fields = self._hidden_inputs(webpage)
request = compat_urllib_request.Request(url, urlencode_postdata(fields))
request.add_header('Content-type', 'application/x-www-form-urlencoded')
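Both GorillaVid and HostingBulk replace a hand-written regex with `self._hidden_inputs(webpage)`. The removed regex only matched when `type`, `name` and `value` appeared in exactly that order. The sketch below collects hidden inputs with the standard library's `HTMLParser`, which does not care about attribute order. It illustrates the idea only; it is not youtube-dl's `_hidden_inputs`, whose implementation is not part of this diff.

```python
import re
from html.parser import HTMLParser

# The pattern removed from GorillaVid (verbose mode, fixed attribute order).
OLD_RE = r'''(?x)<input\s+
    type="hidden"\s+
    name="([^"]+)"\s+
    (?:id="[^"]+"\s+)?
    value="([^"]*)"
'''


class HiddenInputParser(HTMLParser):
    """Collects name -> value for every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)  # names arrive lower-cased, in any order
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_inputs(html):
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

On a form whose first hidden input lists `value` before `name`, `hidden_inputs` finds both fields while `OLD_RE` finds only the one written in the expected order.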

@@ -1,8 +1,7 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
class HowcastIE(InfoExtractor):
@@ -13,29 +12,31 @@ class HowcastIE(InfoExtractor):
'info_dict': {
'id': '390161',
'ext': 'mp4',
'description': 'The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here\'s the proper way to tie a square knot.',
'title': 'How to Tie a Square Knot Properly',
}
'description': 'md5:dbe792e5f6f1489027027bf2eba188a3',
'timestamp': 1276081287,
'upload_date': '20100609',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
video_url = self._search_regex(r'\'?file\'?: "(http://mobile-media\.howcast\.com/[0-9]+\.mp4)',
webpage, 'video URL')
video_description = self._html_search_regex(r'<meta content=(?:"([^"]+)"|\'([^\']+)\') name=\'description\'',
webpage, 'description', fatal=False)
embed_code = self._search_regex(
r'<iframe[^>]+src="[^"]+\bembed_code=([^\b]+)\b',
webpage, 'ooyala embed code')
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % embed_code,
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': video_description,
'thumbnail': self._og_search_thumbnail(webpage),
'timestamp': parse_iso8601(self._html_search_meta(
'article:published_time', webpage, 'timestamp')),
}
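The new Howcast pattern captures the Ooyala embed code with `([^\b]+)\b`. In Python's `re`, `\b` inside a character class means backspace, not a word boundary, so `[^\b]+` matches nearly everything and the greedy capture runs to the last word boundary it can reach. The sketch shows this on made-up markup, next to a tighter capture that assumes embed codes consist only of word characters and `-` (an assumption; the diff does not state the format).

```python
import re

# Pattern from the Howcast hunk: inside [...] Python reads \b as backspace.
AS_WRITTEN = r'<iframe[^>]+src="[^"]+\bembed_code=([^\b]+)\b'
# Tighter capture, assuming embed codes use only word characters and '-'.
TIGHTER = r'<iframe[^>]+src="[^"]+\bembed_code=([\w-]+)'

SAMPLE = '<iframe src="http://player.example/iframe.html?embed_code=abc123&amp;w=1"></iframe>'

loose = re.search(AS_WRITTEN, SAMPLE).group(1)
tight = re.search(TIGHTER, SAMPLE).group(1)
```

In the sample, `loose` is `abc123&amp;w=1"></iframe` and `tight` is `abc123`. Whether the downstream `ooyala:` handling tolerates the trailing text is not visible in this diff.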

@@ -10,7 +10,7 @@ from ..utils import (
class HowStuffWorksIE(InfoExtractor):
_VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*\d+-(?P<id>.+?)-video\.htm'
_VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm'
_TESTS = [
{
'url': 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm',
@@ -46,6 +46,10 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
},
},
{
'url': 'http://shows.howstuffworks.com/stuff-to-blow-your-mind/optical-illusions-video.htm',
'only_matching': True,
}
]
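The HowStuffWorks `_VALID_URL` change makes the numeric `\d+-` prefix optional, so URLs without one, like the new `only_matching` test, are accepted. Comparing both patterns on the two test URLs:

```python
import re

OLD = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*\d+-(?P<id>.+?)-video\.htm'
NEW = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm'

URLS = [
    'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm',
    'http://shows.howstuffworks.com/stuff-to-blow-your-mind/optical-illusions-video.htm',
]


def video_id(pattern, url):
    m = re.match(pattern, url)
    return m.group('id') if m else None


for u in URLS:
    print(video_id(OLD, u), video_id(NEW, u))
```

Both patterns yield `cool-jobs-iditarod-musher` for the first URL; only the new one matches the second, giving `optical-illusions`.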
def _real_extract(self, url):

@@ -46,7 +46,7 @@ class ImdbIE(InfoExtractor):
format_info = info['videoPlayerObject']['video']
formats.append({
'format_id': f_id,
'url': format_info['url'],
'url': format_info['videoInfoList'][0]['videoUrl'],
})
return {

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class InaIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?ina\.fr/video/(?P<id>I?[A-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?ina\.fr/video/(?P<id>I?[A-Z0-9]+)'
_TEST = {
'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'a667021bf2b41f8dc6049479d9bb38a3',

@@ -4,14 +4,15 @@ import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urlparse,
)
class InfoQIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?infoq\.com/[^/]+/(?P<id>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?infoq\.com/(?:[^/]+/)+(?P<id>[^/]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things',
'md5': 'b5ca0e0a8c1fed93b0e65e48e462f9a2',
'info_dict': {
@@ -20,7 +21,10 @@ class InfoQIE(InfoExtractor):
'description': 'Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.',
'title': 'A Few of My Favorite [Python] Things',
},
}
}, {
'url': 'http://www.infoq.com/fr/presentations/changez-avis-sur-javascript',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -35,14 +39,14 @@ class InfoQIE(InfoExtractor):
# Extract video URL
encoded_id = self._search_regex(
r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id')
real_id = compat_urllib_parse.unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
playpath = 'mp4:' + real_id
video_filename = playpath.split('/')[-1]
video_id, extension = video_filename.split('.')
http_base = self._search_regex(
r'EXPRESSINSTALL_SWF\s*=\s*"(https?://[^/"]+/)', webpage,
r'EXPRESSINSTALL_SWF\s*=\s*[^"]*"((?:https?:)?//[^/"]+/)', webpage,
'HTTP base URL')
formats = [{
@@ -52,7 +56,7 @@ class InfoQIE(InfoExtractor):
'play_path': playpath,
}, {
'format_id': 'http',
'url': http_base + real_id,
'url': compat_urlparse.urljoin(url, http_base) + real_id,
}]
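The InfoQ HTTP base may now be protocol-relative (`//host/`), which is why the format URL is built with `compat_urlparse.urljoin(url, http_base)`. A sketch with `urllib.parse.urljoin`; the `EXPRESSINSTALL_SWF` lines, host names and media path below are made up.

```python
import re
from urllib.parse import urljoin

# Pattern from the InfoQ hunk: the scheme before // is now optional.
HTTP_BASE_RE = r'EXPRESSINSTALL_SWF\s*=\s*[^"]*"((?:https?:)?//[^/"]+/)'


def http_format_url(page_url, webpage, real_id):
    http_base = re.search(HTTP_BASE_RE, webpage).group(1)
    # urljoin borrows the page's scheme for a protocol-relative base
    # ("//host/") and leaves an absolute base untouched.
    return urljoin(page_url, http_base) + real_id
```

With the old `http_base + real_id`, a protocol-relative base would have produced a scheme-less `//host/...` URL.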
self._sort_formats(formats)

@@ -3,23 +3,18 @@ from __future__ import unicode_literals
import hashlib
import math
import os.path
import random
import re
import time
import uuid
import zlib
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
url_basename,
)
from ..utils import ExtractorError
class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺'
_VALID_URL = r'http://(?:www\.)iqiyi.com/v_.+?\.html'
@@ -38,62 +33,57 @@ class IqiyiIE(InfoExtractor):
'title': '名侦探柯南第752集',
},
'playlist': [{
'md5': '7e49376fecaffa115d951634917fe105',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part1',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': '41b75ba13bb7ac0e411131f92bc4f6ca',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part2',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': '0cee1dd0a3d46a83e71e2badeae2aab0',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part3',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': '4f8ad72373b0c491b582e7c196b0b1f9',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part4',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': 'd89ad028bcfad282918e8098e811711d',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part5',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': '9cb1e5c95da25dff0660c32ae50903b7',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part6',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': '155116e0ff1867bbc9b98df294faabc9',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part7',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}, {
'md5': '53f5db77622ae14fa493ed2a278a082b',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part8',
'ext': 'f4v',
'title': '名侦探柯南第752集',
},
}],
'params': {
'skip_download': True,
},
}]
_FORMATS_MAP = [
@@ -211,20 +201,7 @@ class IqiyiIE(InfoExtractor):
return raw_data
def get_enc_key(self, swf_url, video_id):
filename, _ = os.path.splitext(url_basename(swf_url))
enc_key_json = self._downloader.cache.load('iqiyi-enc-key', filename)
if enc_key_json is not None:
return enc_key_json[0]
req = self._request_webpage(
swf_url, video_id, note='download swf content')
cn = req.read()
cn = zlib.decompress(cn[8:])
pt = re.compile(b'MixerRemote\x08(?P<enc_key>.+?)\$&vv')
enc_key = self._search_regex(pt, cn, 'enc_key').decode('utf8')
self._downloader.cache.store('iqiyi-enc-key', filename, [enc_key])
enc_key = '8e29ab5666d041c3a1ea76e06dabdffb'
return enc_key
def _real_extract(self, url):

@@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_start
class Ir90TvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?90tv\.ir/video/(?P<id>[0-9]+)/.*'
_TESTS = [{
'url': 'http://90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218',
'md5': '411dbd94891381960cb9e13daa47a869',
'info_dict': {
'id': '95719',
'ext': 'mp4',
'title': 'شایعات نقل و انتقالات مهم فوتبال اروپا 94/02/18',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_start(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), '90tv.ir :: ')
video_url = self._search_regex(
r'<source[^>]+src="([^"]+)"', webpage, 'video url')
thumbnail = self._search_regex(r'poster="([^"]+)"', webpage, 'thumbnail url', fatal=False)
return {
'url': video_url,
'id': video_id,
'title': title,
'video_url': video_url,
'thumbnail': thumbnail,
}

@@ -8,9 +8,9 @@ from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'
_TEST = {
_TESTS = [{
'url': 'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
'md5': '046e491afb32a8aaac1f44dd4ddd54ee',
'info_dict': {
@@ -19,7 +19,10 @@ class JeuxVideoIE(InfoExtractor):
'title': 'Tearaway : GC 2013 : Tearaway nous présente ses papiers d\'identité',
'description': 'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.',
},
}
}, {
'url': 'http://www.jeuxvideo.com/videos/chroniques/434220/l-histoire-du-jeu-video-la-saturn.htm',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

@@ -2,7 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
js_to_json,
)
@@ -24,7 +24,7 @@ class KaraoketvIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
page_video_url = self._og_search_video_url(webpage, video_id)
config_json = compat_urllib_parse.unquote_plus(self._search_regex(
config_json = compat_urllib_parse_unquote_plus(self._search_regex(
r'config=(.*)', page_video_url, 'configuration'))
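Many hunks in this diff swap `compat_urllib_parse.unquote` / `unquote_plus` for `compat_urllib_parse_unquote` / `compat_urllib_parse_unquote_plus`. On Python 3 these appear to map to `urllib.parse.unquote` and `unquote_plus`, which decode `%XX` escapes as UTF-8 into `str`. Karaoketv uses the `_plus` variant because its `config=` blob is form-encoded. A sketch with a made-up player URL:

```python
import json
import re
from urllib.parse import unquote, unquote_plus


def config_from_player_url(page_video_url):
    """Decode the form-encoded JSON that follows 'config=' in a player URL."""
    raw = re.search(r'config=(.*)', page_video_url).group(1)
    # unquote_plus also turns '+' into a space, as form encoding requires;
    # plain unquote would leave the '+' in place.
    return json.loads(unquote_plus(raw))
```

For `http://example.com/player.swf?config=%7B%22title%22%3A%22Hello+World%22%7D` this returns `{'title': 'Hello World'}`.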
urls_info_json = self._download_json(

@@ -0,0 +1,314 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
clean_html,
ExtractorError,
remove_start,
)
class KuwoBaseIE(InfoExtractor):
_FORMATS = [
{'format': 'ape', 'ext': 'ape', 'preference': 100},
{'format': 'mp3-320', 'ext': 'mp3', 'br': '320kmp3', 'abr': 320, 'preference': 80},
{'format': 'mp3-192', 'ext': 'mp3', 'br': '192kmp3', 'abr': 192, 'preference': 70},
{'format': 'mp3-128', 'ext': 'mp3', 'br': '128kmp3', 'abr': 128, 'preference': 60},
{'format': 'wma', 'ext': 'wma', 'preference': 20},
{'format': 'aac', 'ext': 'aac', 'abr': 48, 'preference': 10}
]
def _get_formats(self, song_id):
formats = []
for file_format in self._FORMATS:
song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s&type=convert_url&response=url' %
(file_format['ext'], file_format.get('br', ''), song_id),
song_id, note='Download %s url info' % file_format['format'],
)
if song_url.startswith('http://') or song_url.startswith('https://'):
formats.append({
'url': song_url,
'format_id': file_format['format'],
'format': file_format['format'],
'preference': file_format['preference'],
'abr': file_format.get('abr'),
})
self._sort_formats(formats)
return formats
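`KuwoBaseIE._get_formats` asks Kuwo's anti-server for a direct URL once per entry in `_FORMATS` and keeps only the answers that look like HTTP(S) URLs. A standalone sketch of that loop. `fetch` is a hypothetical callable standing in for `_download_webpage`, the format table is trimmed, and a plain sort on `preference` stands in for `_sort_formats`, which weighs more fields than this.

```python
# URL template copied from KuwoBaseIE._get_formats.
ANTI_URL = ('http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s'
            '&type=convert_url&response=url')

# Trimmed copy of KuwoBaseIE._FORMATS.
FORMATS = [
    {'format': 'ape', 'ext': 'ape', 'preference': 100},
    {'format': 'mp3-320', 'ext': 'mp3', 'br': '320kmp3', 'abr': 320, 'preference': 80},
    {'format': 'aac', 'ext': 'aac', 'abr': 48, 'preference': 10},
]


def get_formats(song_id, fetch, table=FORMATS):
    """`fetch` is a hypothetical callable: request URL -> response body (str)."""
    formats = []
    for f in table:
        song_url = fetch(ANTI_URL % (f['ext'], f.get('br', ''), song_id))
        # Only responses that look like a direct HTTP(S) link become formats.
        if song_url.startswith(('http://', 'https://')):
            formats.append({
                'url': song_url,
                'format_id': f['format'],
                'preference': f['preference'],
                'abr': f.get('abr'),
            })
    # Ascending preference (best last), standing in for _sort_formats.
    formats.sort(key=lambda fmt: fmt['preference'])
    return formats
```

Because the extractor reads the table through `self._FORMATS`, `KuwoMvIE` reuses the same loop after extending `_FORMATS` with `mkv` and `mp4` entries.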
class KuwoIE(KuwoBaseIE):
IE_NAME = 'kuwo:song'
IE_DESC = '酷我音乐'
_VALID_URL = r'http://www\.kuwo\.cn/yinyue/(?P<id>\d+?)/'
_TESTS = [{
'url': 'http://www.kuwo.cn/yinyue/635632/',
'info_dict': {
'id': '635632',
'ext': 'ape',
'title': '爱我别走',
'creator': '张震岳',
'upload_date': '20080122',
'description': 'md5:ed13f58e3c3bf3f7fd9fbc4e5a7aa75c'
},
}, {
'url': 'http://www.kuwo.cn/yinyue/6446136/',
'info_dict': {
'id': '6446136',
'ext': 'mp3',
'title': '',
'creator': 'IU',
'upload_date': '20150518',
},
'params': {
'format': 'mp3-320'
},
}]
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
url, song_id, note='Download song detail info',
errnote='Unable to get song detail info')
song_name = self._html_search_regex(
r'<h1[^>]+title="([^"]+)">', webpage, 'song name')
singer_name = self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
webpage, 'singer name', fatal=False)
lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics
lrc_content = None
formats = self._get_formats(song_id)
album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False)
publish_time = None
if album_id is not None:
album_info_page = self._download_webpage(
'http://www.kuwo.cn/album/%s/' % album_id, song_id,
note='Download album detail info',
errnote='Unable to get album detail info')
publish_time = self._html_search_regex(
r'发行时间:(\d{4}-\d{2}-\d{2})', album_info_page,
'publish time', fatal=False)
if publish_time:
publish_time = publish_time.replace('-', '')
return {
'id': song_id,
'title': song_name,
'creator': singer_name,
'upload_date': publish_time,
'description': lrc_content,
'formats': formats,
}
class KuwoAlbumIE(InfoExtractor):
IE_NAME = 'kuwo:album'
IE_DESC = '酷我音乐 - 专辑'
_VALID_URL = r'http://www\.kuwo\.cn/album/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/album/502294/',
'info_dict': {
'id': '502294',
'title': 'M',
'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
},
'playlist_count': 2,
}
def _real_extract(self, url):
album_id = self._match_id(url)
webpage = self._download_webpage(
url, album_id, note='Download album info',
errnote='Unable to get album info')
album_name = self._html_search_regex(
r'<div[^>]+class="comm"[^<]+<h1[^>]+title="([^"]+)"', webpage,
'album name')
album_intro = remove_start(
clean_html(get_element_by_id('intro', webpage)),
'%s简介:' % album_name)
entries = [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<p[^>]+class="listen"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+/)"',
webpage)
]
return self.playlist_result(entries, album_id, album_name, album_intro)
class KuwoChartIE(InfoExtractor):
IE_NAME = 'kuwo:chart'
IE_DESC = '酷我音乐 - 排行榜'
_VALID_URL = r'http://yinyue\.kuwo\.cn/billboard_(?P<id>[^.]+).htm'
_TEST = {
'url': 'http://yinyue.kuwo.cn/billboard_香港中文龙虎榜.htm',
'info_dict': {
'id': '香港中文龙虎榜',
'title': '香港中文龙虎榜',
'description': 're:\d{4}\d{2}',
},
'playlist_mincount': 10,
}
def _real_extract(self, url):
chart_id = self._match_id(url)
webpage = self._download_webpage(
url, chart_id, note='Download chart info',
errnote='Unable to get chart info')
chart_name = self._html_search_regex(
r'<h1[^>]+class="unDis">([^<]+)</h1>', webpage, 'chart name')
chart_desc = self._html_search_regex(
r'<p[^>]+class="tabDef">(\d{4}\d{2}期)</p>', webpage, 'chart desc')
entries = [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/"', webpage)
]
return self.playlist_result(entries, chart_id, chart_name, chart_desc)
class KuwoSingerIE(InfoExtractor):
IE_NAME = 'kuwo:singer'
IE_DESC = '酷我音乐 - 歌手'
_VALID_URL = r'http://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
'info_dict': {
'id': 'bruno+mars',
'title': 'Bruno Mars',
},
'playlist_count': 10,
}, {
'url': 'http://www.kuwo.cn/mingxing/Ali/music.htm',
'info_dict': {
'id': 'Ali',
'title': 'Ali',
},
'playlist_mincount': 95,
}]
def _real_extract(self, url):
singer_id = self._match_id(url)
webpage = self._download_webpage(
url, singer_id, note='Download singer info',
errnote='Unable to get singer info')
singer_name = self._html_search_regex(
r'<div class="title clearfix">\s*<h1>([^<]+)<span', webpage, 'singer name'
)
entries = []
first_page_only = False if re.search(r'/music(?:_\d+)?\.htm', url) else True
for page_num in itertools.count(1):
webpage = self._download_webpage(
'http://www.kuwo.cn/mingxing/%s/music_%d.htm' % (singer_id, page_num),
singer_id, note='Download song list page #%d' % page_num,
errnote='Unable to get song list page #%d' % page_num)
entries.extend([
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<p[^>]+class="m_name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/',
webpage)
][:10 if first_page_only else None])
if first_page_only or not re.search(r'<a[^>]+href="[^"]+">下一页</a>', webpage):
break
return self.playlist_result(entries, singer_id, singer_name)
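`KuwoSingerIE` pages through `music_<n>.htm` with `itertools.count(1)` and stops when the page has no "下一页" ("next page") link. For a plain artist URL it reads only the first page and keeps at most 10 songs. A sketch of that loop; `fetch_page` is a hypothetical callable standing in for the per-page download.

```python
import itertools
import re

# Patterns copied from the extractor; "下一页" means "next page".
SONG_RE = r'<p[^>]+class="m_name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/'
NEXT_PAGE_RE = r'<a[^>]+href="[^"]+">下一页</a>'


def collect_songs(fetch_page, first_page_only):
    """fetch_page(n) is a hypothetical callable returning page n's HTML."""
    entries = []
    for page_num in itertools.count(1):
        webpage = fetch_page(page_num)
        songs = re.findall(SONG_RE, webpage)
        # A slice end of None keeps the whole list.
        entries.extend(songs[:10 if first_page_only else None])
        if first_page_only or not re.search(NEXT_PAGE_RE, webpage):
            break
    return entries
```

The diff's `first_page_only = False if re.search(...) else True` is equivalent to `first_page_only = not re.search(...)`.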
class KuwoCategoryIE(InfoExtractor):
IE_NAME = 'kuwo:category'
IE_DESC = '酷我音乐 - 分类'
_VALID_URL = r'http://yinyue\.kuwo\.cn/yy/cinfo_(?P<id>\d+?).htm'
_TEST = {
'url': 'http://yinyue.kuwo.cn/yy/cinfo_86375.htm',
'info_dict': {
'id': '86375',
'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!',
},
'playlist_count': 30,
}
def _real_extract(self, url):
category_id = self._match_id(url)
webpage = self._download_webpage(
url, category_id, note='Download category info',
errnote='Unable to get category info')
category_name = self._html_search_regex(
r'<h1[^>]+title="([^<>]+?)">[^<>]+?</h1>', webpage, 'category name')
category_desc = remove_start(
get_element_by_id('intro', webpage).strip(),
'%s简介:' % category_name)
jsonm = self._parse_json(self._html_search_regex(
r'var\s+jsonm\s*=\s*([^;]+);', webpage, 'category songs'), category_id)
entries = [
self.url_result('http://www.kuwo.cn/yinyue/%s/' % song['musicrid'], 'Kuwo')
for song in jsonm['musiclist']
]
return self.playlist_result(entries, category_id, category_name, category_desc)
class KuwoMvIE(KuwoBaseIE):
IE_NAME = 'kuwo:mv'
IE_DESC = '酷我音乐 - MV'
_VALID_URL = r'http://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/mv/6480076/',
'info_dict': {
'id': '6480076',
'ext': 'mkv',
'title': '我们家MV',
'creator': '2PM',
},
}
_FORMATS = KuwoBaseIE._FORMATS + [
{'format': 'mkv', 'ext': 'mkv', 'preference': 250},
{'format': 'mp4', 'ext': 'mp4', 'preference': 200},
]
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
url, song_id, note='Download mv detail info: %s' % song_id,
errnote='Unable to get mv detail info: %s' % song_id)
mobj = re.search(
r'<h1[^>]+title="(?P<song>[^"]+)">[^<]+<span[^>]+title="(?P<singer>[^"]+)"',
webpage)
if mobj:
song_name = mobj.group('song')
singer_name = mobj.group('singer')
else:
raise ExtractorError('Unable to find song or singer names')
formats = self._get_formats(song_id)
return {
'id': song_id,
'title': song_name,
'creator': singer_name,
'formats': formats,
}

@@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
int_or_none,
)
class Lecture2GoIE(InfoExtractor):
_VALID_URL = r'https?://lecture2go\.uni-hamburg\.de/veranstaltungen/-/v/(?P<id>\d+)'
_TEST = {
'url': 'https://lecture2go.uni-hamburg.de/veranstaltungen/-/v/17473',
'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': {
'id': '17473',
'ext': 'flv',
'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann',
'duration': 5220,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
formats = []
for url in set(re.findall(r'"src","([^"]+)"', webpage)):
ext = determine_ext(url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(url, video_id))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id))
else:
formats.append({
'url': url,
})
self._sort_formats(formats)
creator = self._html_search_regex(
r'<div[^>]+id="description">([^<]+)</div>', webpage, 'creator', fatal=False)
duration = parse_duration(self._html_search_regex(
r'Duration:\s*</em>\s*<em[^>]*>([^<]+)</em>', webpage, 'duration', fatal=False))
view_count = int_or_none(self._html_search_regex(
r'Views:\s*</em>\s*<em[^>]+>(\d+)</em>', webpage, 'view count', fatal=False))
return {
'id': video_id,
'title': title,
'formats': formats,
'creator': creator,
'duration': duration,
'view_count': view_count,
}
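Lecture2Go collects every `"src","<url>"` value from the page and dispatches on the extension: `f4m` manifests and `m3u8` playlists go to their own helpers, and anything else becomes a direct format. The grouping step, sketched with a rough extension guesser; `guess_ext` is an assumption standing in for `determine_ext`, not a copy of it.

```python
import posixpath
from urllib.parse import urlparse


def guess_ext(url):
    """Rough stand-in for determine_ext: extension of the URL path, if any."""
    ext = posixpath.splitext(urlparse(url).path)[1][1:]
    return ext.lower() or None


def classify_sources(urls):
    groups = {'f4m': [], 'm3u8': [], 'direct': []}
    # Deduplicate like the extractor's set(); sort only for stable output.
    for src in sorted(set(urls)):
        ext = guess_ext(src)
        groups[ext if ext in ('f4m', 'm3u8') else 'direct'].append(src)
    return groups
```

The diff's loop reuses the name `url`, shadowing the page URL argument. The page URL is not read after the loop, so this looks harmless, but the sketch uses `src` to keep the two apart.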

@@ -15,10 +15,12 @@ from ..utils import (
determine_ext,
ExtractorError,
parse_iso8601,
int_or_none,
)
class LetvIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'http://www\.letv\.com/ptv/vplay/(?P<id>\d+).html'
_TESTS = [{
@@ -133,7 +135,7 @@ class LetvIE(InfoExtractor):
}
if format_id[-1:] == 'p':
url_info_dict['height'] = format_id[:-1]
url_info_dict['height'] = int_or_none(format_id[:-1])
urls.append(url_info_dict)
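The Letv hunk wraps the height in `int_or_none`. Heights kept as strings compare lexicographically, so `'720'` ranks above `'1080'`; converting to `int` restores numeric ordering. `int_or_none` below is a simplified stand-in for youtube-dl's helper, and `height_from_format_id` is a hypothetical name for the two-line branch in the diff.

```python
def int_or_none(v):
    """Simplified stand-in for youtube-dl's int_or_none helper."""
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def height_from_format_id(format_id):
    # '720p' -> 720; ids without a trailing 'p' carry no height.
    if format_id[-1:] == 'p':
        return int_or_none(format_id[:-1])
    return None
```

`max(['720', '1080'])` is `'720'`, whereas `max([720, 1080])` is `1080`.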

@@ -8,6 +8,7 @@ from ..compat import compat_urlparse
from ..utils import (
determine_ext,
int_or_none,
remove_end,
unified_strdate,
ExtractorError,
)
@@ -39,7 +40,6 @@ class LifeNewsIE(InfoExtractor):
'title': 'В Сети появилось видео захвата «Правым сектором» колхозных полей ',
'description': 'Жители двух поселков Днепропетровской области не простили радикалам угрозу лишения плодородных земель и пошли в лобовую. ',
'upload_date': '20150402',
'uploader': 'embed.life.ru',
}
}, {
'url': 'http://lifenews.ru/news/153461',
@@ -50,7 +50,6 @@ class LifeNewsIE(InfoExtractor):
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'upload_date': '20150505',
'uploader': 'embed.life.ru',
}
}, {
'url': 'http://lifenews.ru/video/13035',
@@ -72,20 +71,20 @@ class LifeNewsIE(InfoExtractor):
if not videos and not iframe_link:
raise ExtractorError('No media links available for %s' % video_id)
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' - Первый по срочным новостям — LIFE | NEWS'
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
title = remove_end(
self._og_search_title(webpage),
' - Первый по срочным новостям — LIFE | NEWS')
description = self._og_search_description(webpage)
view_count = self._html_search_regex(
r'<div class=\'views\'>\s*(\d+)\s*</div>', webpage, 'view count', fatal=False)
comment_count = self._html_search_regex(
r'<div class=\'comments\'>\s*<span class=\'counter\'>\s*(\d+)\s*</span>', webpage, 'comment count', fatal=False)
r'=\'commentCount\'[^>]*>\s*(\d+)\s*<',
webpage, 'comment count', fatal=False)
upload_date = self._html_search_regex(
r'<time datetime=\'([^\']+)\'>', webpage, 'upload date', fatal=False)
r'<time[^>]*datetime=\'([^\']+)\'', webpage, 'upload date', fatal=False)
if upload_date is not None:
upload_date = unified_strdate(upload_date)
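LifeNews now strips its title suffix with `remove_end` instead of an `endswith` check plus slicing, and the new Ir90Tv extractor strips its `90tv.ir :: ` title prefix with `remove_start`. Minimal stand-ins for the two helpers, written from how this diff uses them rather than from their source. The empty-affix guard and the `None` pass-through are assumptions of this sketch.

```python
def remove_start(s, start):
    """Drop `start` from the front of `s` if present (simplified stand-in)."""
    if s is not None and start and s.startswith(start):
        return s[len(start):]
    return s


def remove_end(s, end):
    """Drop `end` from the back of `s` if present (simplified stand-in)."""
    # Guard on `end`: s[:-0] would return '' rather than s.
    if s is not None and end and s.endswith(end):
        return s[:-len(end)]
    return s
```

For a non-empty suffix, `remove_end(title, SUFFIX)` gives the same result as the removed `if title.endswith(SUFFIX): title = title[:-len(SUFFIX)]`.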

@@ -30,13 +30,13 @@ class LyndaBaseIE(InfoExtractor):
return
login_form = {
'username': username,
'password': password,
'username': username.encode('utf-8'),
'password': password.encode('utf-8'),
'remember': 'false',
'stayPut': 'false'
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
request, None, 'Logging in as %s' % username)
@@ -65,7 +65,7 @@ class LyndaBaseIE(InfoExtractor):
'stayPut': 'false',
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form))
self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form).encode('utf-8'))
login_page = self._download_webpage(
request, None,
'Confirming log in and log out from another device')
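The Lynda hunks append `.encode('utf-8')` to the `urlencode` result. On Python 3, `urlencode` returns `str`, while `urlopen` requires POST data as bytes. Encoding the username and password beforehand presumably targets Python 2, where `urlencode` converts values with `str()`. A sketch that builds the request without sending it; the login URL and credentials are made up, and `build_login_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, username, password):
    form = {
        'username': username.encode('utf-8'),
        'password': password.encode('utf-8'),
        'remember': 'false',
        'stayPut': 'false',
    }
    # urlencode accepts the bytes values and returns str; encode the result
    # so the request body is bytes, as urlopen expects.
    data = urlencode(form).encode('utf-8')
    return Request(login_url, data)
```

A `Request` carrying data reports `POST` from `get_method()`.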

@@ -2,9 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
)
from ..compat import compat_urllib_parse_unquote
class MalemotionIE(InfoExtractor):
@@ -24,7 +22,7 @@ class MalemotionIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = compat_urllib_parse.unquote(self._search_regex(
video_url = compat_urllib_parse_unquote(self._search_regex(
r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')

@@ -6,6 +6,7 @@ from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request,
)
from ..utils import (
@@ -155,7 +156,7 @@ class MetacafeIE(InfoExtractor):
video_url = None
mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage)
if mobj is not None:
mediaURL = compat_urllib_parse.unquote(mobj.group(1))
mediaURL = compat_urllib_parse_unquote(mobj.group(1))
video_ext = mediaURL[-3:]
# Extract gdaKey if available

@@ -5,6 +5,7 @@ import json
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
@@ -48,7 +49,7 @@ class MiTeleIE(InfoExtractor):
domain = 'http://' + domain
info_url = compat_urlparse.urljoin(
domain,
compat_urllib_parse.unquote(embed_data['flashvars']['host'])
compat_urllib_parse_unquote(embed_data['flashvars']['host'])
)
info_el = self._download_xml(info_url, episode).find('./video/info')

@@ -3,9 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
)
from ..compat import compat_urllib_parse_unquote
from ..utils import (
ExtractorError,
HEADRequest,
@@ -60,7 +58,7 @@ class MixcloudIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
uploader = mobj.group(1)
cloudcast_name = mobj.group(2)
track_id = compat_urllib_parse.unquote('-'.join((uploader, cloudcast_name)))
track_id = compat_urllib_parse_unquote('-'.join((uploader, cloudcast_name)))
webpage = self._download_webpage(url, track_id)

@@ -5,9 +5,9 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
@@ -34,7 +34,7 @@ class MofosexIE(InfoExtractor):
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, 'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url'))
video_url = compat_urllib_parse_unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]

@@ -35,7 +35,8 @@ class MySpassIE(InfoExtractor):
# get metadata
metadata_url = META_DATA_URL_TEMPLATE % video_id
metadata = self._download_xml(metadata_url, video_id)
metadata = self._download_xml(
metadata_url, video_id, transform_source=lambda s: s.strip())
# extract values from metadata
url_flv_el = metadata.find('url_flv')
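The MySpass change passes `transform_source=lambda s: s.strip()` to `_download_xml`, presumably because the metadata response carried whitespace before the XML declaration. The sketch below, using a made-up response body and a hypothetical `parse_metadata` helper that only mimics the idea of `transform_source`, shows why that matters: expat rejects an XML declaration that is not at the very start of the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body with stray whitespace before the XML declaration.
raw = b'\n  <?xml version="1.0"?><video><url_flv>http://example.com/v.flv</url_flv></video>'


def parse_metadata(source, transform_source=None):
    """Parse XML, optionally cleaning the raw text first (mimics transform_source)."""
    if transform_source is not None:
        source = transform_source(source)
    return ET.fromstring(source)


# Without cleaning, expat reports "XML or text declaration not at start of entity".
try:
    parse_metadata(raw)
    parsed_unstripped = True
except ET.ParseError:
    parsed_unstripped = False

# Stripping the leading whitespace makes the same document parse.
metadata = parse_metadata(raw, transform_source=lambda s: s.strip())
url_flv = metadata.find('url_flv').text
print(parsed_unstripped, url_flv)
```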

@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .vimple import SprutoBaseIE
class MyviIE(SprutoBaseIE):
_VALID_URL = r'''(?x)
https?://
myvi\.(?:ru/player|tv)/
(?:
(?:
embed/html|
flash|
api/Video/Get
)/|
content/preloader\.swf\?.*\bid=
)
(?P<id>[\da-zA-Z_-]+)
'''
_TESTS = [{
'url': 'http://myvi.ru/player/embed/html/oOy4euHA6LVwNNAjhD9_Jq5Ha2Qf0rtVMVFMAZav8wObeRTZaCATzucDQIDph8hQU0',
'md5': '571bbdfba9f9ed229dc6d34cc0f335bf',
'info_dict': {
'id': 'f16b2bbd-cde8-481c-a981-7cd48605df43',
'ext': 'mp4',
'title': 'хозяин жизни',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 25,
},
}, {
'url': 'http://myvi.ru/player/content/preloader.swf?id=oOy4euHA6LVwNNAjhD9_Jq5Ha2Qf0rtVMVFMAZav8wOYf1WFpPfc_bWTKGVf_Zafr0',
'only_matching': True,
}, {
'url': 'http://myvi.ru/player/api/Video/Get/oOy4euHA6LVwNNAjhD9_Jq5Ha2Qf0rtVMVFMAZav8wObeRTZaCATzucDQIDph8hQU0',
'only_matching': True,
}, {
'url': 'http://myvi.tv/embed/html/oTGTNWdyz4Zwy_u1nraolwZ1odenTd9WkTnRfIL9y8VOgHYqOHApE575x4_xxS9Vn0?ap=0',
'only_matching': True,
}, {
'url': 'http://myvi.ru/player/flash/ocp2qZrHI-eZnHKQBK4cZV60hslH8LALnk0uBfKsB-Q4WnY26SeGoYPi8HWHxu0O30',
'only_matching': True,
}]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//myvi\.(?:ru/player|tv)/(?:embed/html|flash)/[^"]+)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
spruto = self._download_json(
'http://myvi.ru/player/api/Video/Get/%s?sig' % video_id, video_id)['sprutoData']
return self._extract_spruto(spruto, video_id)
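The verbose `_VALID_URL` in `MyviIE` accepts several player URL shapes and captures the same `id` group from each. The standalone check below copies that pattern verbatim; the helper name `myvi_id` is hypothetical and only exercises the regex with `re.match`, roughly as `_match_id` would.

```python
import re

# Copied from MyviIE._VALID_URL; (?x) lets the pattern span lines with ignored whitespace.
MYVI_VALID_URL = r'''(?x)
    https?://
    myvi\.(?:ru/player|tv)/
    (?:
        (?:
            embed/html|
            flash|
            api/Video/Get
        )/|
        content/preloader\.swf\?.*\bid=
    )
    (?P<id>[\da-zA-Z_-]+)
'''


def myvi_id(url):
    """Return the captured id, or None if the URL is not a supported myvi player URL."""
    mobj = re.match(MYVI_VALID_URL, url)
    return mobj.group('id') if mobj else None


print(myvi_id('http://myvi.tv/embed/html/abc_DEF-123?ap=0'))
```

The id character class excludes `?`, so trailing query strings such as `?ap=0` are not captured.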

@@ -10,6 +10,7 @@ from .common import InfoExtractor
from ..compat import (
compat_ord,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request,
)
from ..utils import (
@@ -107,7 +108,7 @@ class MyVideoIE(InfoExtractor):
if not a == '_encxml':
params[a] = b
else:
encxml = compat_urllib_parse.unquote(b)
encxml = compat_urllib_parse_unquote(b)
if not params.get('domain'):
params['domain'] = 'www.myvideo.de'
xmldata_url = '%s?%s' % (encxml, compat_urllib_parse.urlencode(params))
@@ -135,7 +136,7 @@ class MyVideoIE(InfoExtractor):
video_url = None
mobj = re.search('connectionurl=\'(.*?)\'', dec_data)
if mobj:
video_url = compat_urllib_parse.unquote(mobj.group(1))
video_url = compat_urllib_parse_unquote(mobj.group(1))
if 'myvideo2flash' in video_url:
self.report_warning(
'Rewriting URL to use unencrypted rtmp:// ...',
@@ -147,10 +148,10 @@ class MyVideoIE(InfoExtractor):
mobj = re.search('path=\'(http.*?)\' source=\'(.*?)\'', dec_data)
if mobj is None:
raise ExtractorError('unable to extract url')
video_url = compat_urllib_parse.unquote(mobj.group(1)) + compat_urllib_parse.unquote(mobj.group(2))
video_url = compat_urllib_parse_unquote(mobj.group(1)) + compat_urllib_parse_unquote(mobj.group(2))
video_file = self._search_regex('source=\'(.*?)\'', dec_data, 'video file')
video_file = compat_urllib_parse.unquote(video_file)
video_file = compat_urllib_parse_unquote(video_file)
if not video_file.endswith('f4m'):
ppath, prefix = video_file.split('.')
@@ -159,7 +160,7 @@ class MyVideoIE(InfoExtractor):
video_playpath = ''
video_swfobj = self._search_regex('swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj')
video_swfobj = compat_urllib_parse.unquote(video_swfobj)
video_swfobj = compat_urllib_parse_unquote(video_swfobj)
video_title = self._html_search_regex("<h1(?: class='globalHd')?>(.*?)</h1>",
webpage, 'title')

@@ -8,25 +8,40 @@ from ..utils import (
class NationalGeographicIE(InfoExtractor):
_VALID_URL = r'http://video\.nationalgeographic\.com/video/.*?'
_VALID_URL = r'http://video\.nationalgeographic\.com/.*?'
_TEST = {
'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
'info_dict': {
'id': '4DmDACA6Qtk_',
'ext': 'flv',
'title': 'Mating Crabs Busted by Sharks',
'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
_TESTS = [
{
'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
'info_dict': {
'id': '4DmDACA6Qtk_',
'ext': 'flv',
'title': 'Mating Crabs Busted by Sharks',
'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
},
'add_ie': ['ThePlatform'],
},
'add_ie': ['ThePlatform'],
}
{
'url': 'http://video.nationalgeographic.com/wild/when-sharks-attack/the-real-jaws',
'info_dict': {
'id': '_JeBD_D7PlS5',
'ext': 'flv',
'title': 'The Real Jaws',
'description': 'md5:8d3e09d9d53a85cd397b4b21b2c77be6',
},
'add_ie': ['ThePlatform'],
},
]
def _real_extract(self, url):
name = url_basename(url)
webpage = self._download_webpage(url, name)
feed_url = self._search_regex(r'data-feed-url="([^"]+)"', webpage, 'feed url')
guid = self._search_regex(r'data-video-guid="([^"]+)"', webpage, 'guid')
feed_url = self._search_regex(
r'data-feed-url="([^"]+)"', webpage, 'feed url')
guid = self._search_regex(
r'id="(?:videoPlayer|player-container)"[^>]+data-guid="([^"]+)"',
webpage, 'guid')
feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name)
content = feed.find('.//{http://search.yahoo.com/mrss/}content')
@@ -34,5 +49,6 @@ class NationalGeographicIE(InfoExtractor):
return self.url_result(smuggle_url(
'http://link.theplatform.com/s/ngs/%s?format=SMIL&formats=MPEG4&manifest=f4m' % theplatform_id,
# For some reason, the normal links don't work and we must force the use of f4m
# For some reason, the normal links don't work and we must force
# the use of f4m
{'force_smil_url': True}))
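The National Geographic change stops reading `data-video-guid` and instead looks for `data-guid` on an element whose `id` is either `videoPlayer` or `player-container`. The sketch below runs that regex against two hypothetical markup snippets (not taken from the real site) and then builds the feed query the same way the extractor does; `extract_guid` and the feed URL are illustrative names only.

```python
import re

# Regex from the new NationalGeographicIE._real_extract.
GUID_RE = r'id="(?:videoPlayer|player-container)"[^>]+data-guid="([^"]+)"'

# Hypothetical markup for the two player container ids the regex covers.
old_layout = '<div id="videoPlayer" class="player" data-guid="guid-old-1"></div>'
new_layout = '<div id="player-container" data-guid="guid-new-2" data-x="1"></div>'


def extract_guid(html):
    mobj = re.search(GUID_RE, html)
    return mobj.group(1) if mobj else None


feed_url = 'http://example.com/feed'  # hypothetical; the real value comes from data-feed-url
query_url = '%s?byGuid=%s' % (feed_url, extract_guid(new_layout))
print(query_url)
```

Because the pattern requires `id=` to come before `data-guid=` inside the same tag, markup with the attributes in the opposite order would not match.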

@@ -0,0 +1,459 @@
# coding: utf-8
from __future__ import unicode_literals
from hashlib import md5
from base64 import b64encode
from datetime import datetime
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
compat_str,
compat_itertools_count,
)
class NetEaseMusicBaseIE(InfoExtractor):
_FORMATS = ['bMusic', 'mMusic', 'hMusic']
_NETEASE_SALT = '3go8&$8*3*3h0k(2)2'
_API_BASE = 'http://music.163.com/api/'
@classmethod
def _encrypt(cls, dfsid):
salt_bytes = bytearray(cls._NETEASE_SALT.encode('utf-8'))
string_bytes = bytearray(compat_str(dfsid).encode('ascii'))
salt_len = len(salt_bytes)
for i in range(len(string_bytes)):
string_bytes[i] = string_bytes[i] ^ salt_bytes[i % salt_len]
m = md5()
m.update(bytes(string_bytes))
result = b64encode(m.digest()).decode('ascii')
return result.replace('/', '_').replace('+', '-')
@classmethod
def extract_formats(cls, info):
formats = []
for song_format in cls._FORMATS:
details = info.get(song_format)
if not details:
continue
formats.append({
'url': 'http://m1.music.126.net/%s/%s.%s' %
(cls._encrypt(details['dfsId']), details['dfsId'],
details['extension']),
'ext': details.get('extension'),
'abr': details.get('bitrate', 0) / 1000,
'format_id': song_format,
'filesize': details.get('size'),
'asr': details.get('sr')
})
return formats
@classmethod
def convert_milliseconds(cls, ms):
return int(round(ms / 1000.0))
def query_api(self, endpoint, video_id, note):
req = compat_urllib_request.Request('%s%s' % (self._API_BASE, endpoint))
req.add_header('Referer', self._API_BASE)
return self._download_json(req, video_id, note)
class NetEaseMusicIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:song'
IE_DESC = '网易云音乐'
_VALID_URL = r'https?://music\.163\.com/(#/)?song\?id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/song?id=32102397',
'md5': 'f2e97280e6345c74ba9d5677dd5dcb45',
'info_dict': {
'id': '32102397',
'ext': 'mp3',
'title': 'Bad Blood (feat. Kendrick Lamar)',
'creator': 'Taylor Swift / Kendrick Lamar',
'upload_date': '20150517',
'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
},
}, {
'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014',
'info_dict': {
'id': '29822014',
'ext': 'mp3',
'title': '听见下雨的声音',
'creator': '周杰伦',
'upload_date': '20141225',
'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
},
}, {
'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424',
'info_dict': {
'id': '17241424',
'ext': 'mp3',
'title': 'Opus 28',
'creator': 'Dustin O\'Halloran',
'upload_date': '20080211',
'timestamp': 1202745600,
},
}, {
'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043',
'info_dict': {
'id': '22735043',
'ext': 'mp3',
'title': '소원을 말해봐 (Genie)',
'creator': '少女时代',
'description': 'md5:79d99cc560e4ca97e0c4d86800ee4184',
'upload_date': '20100127',
'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)',
}
}]
def _process_lyrics(self, lyrics_info):
original = lyrics_info.get('lrc', {}).get('lyric')
translated = lyrics_info.get('tlyric', {}).get('lyric')
if not translated:
return original
lyrics_expr = r'(\[[0-9]{2}:[0-9]{2}\.[0-9]{2,}\])([^\n]+)'
original_ts_texts = re.findall(lyrics_expr, original)
translation_ts_dict = dict(
(time_stamp, text) for time_stamp, text in re.findall(lyrics_expr, translated)
)
lyrics = '\n'.join([
'%s%s / %s' % (time_stamp, text, translation_ts_dict.get(time_stamp, ''))
for time_stamp, text in original_ts_texts
])
return lyrics
def _real_extract(self, url):
song_id = self._match_id(url)
params = {
'id': song_id,
'ids': '[%s]' % song_id
}
info = self.query_api(
'song/detail?' + compat_urllib_parse.urlencode(params),
song_id, 'Downloading song info')['songs'][0]
formats = self.extract_formats(info)
self._sort_formats(formats)
lyrics_info = self.query_api(
'song/lyric?id=%s&lv=-1&tv=-1' % song_id,
song_id, 'Downloading lyrics data')
lyrics = self._process_lyrics(lyrics_info)
alt_title = None
if info.get('transNames'):
alt_title = '/'.join(info.get('transNames'))
return {
'id': song_id,
'title': info['name'],
'alt_title': alt_title,
'creator': ' / '.join([artist['name'] for artist in info.get('artists', [])]),
'timestamp': self.convert_milliseconds(info.get('album', {}).get('publishTime')),
'thumbnail': info.get('album', {}).get('picUrl'),
'duration': self.convert_milliseconds(info.get('duration', 0)),
'description': lyrics,
'formats': formats,
}
class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:album'
IE_DESC = '网易云音乐 - 专辑'
_VALID_URL = r'https?://music\.163\.com/(#/)?album\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://music.163.com/#/album?id=220780',
'info_dict': {
'id': '220780',
'title': 'B\'day',
},
'playlist_count': 23,
}
def _real_extract(self, url):
album_id = self._match_id(url)
info = self.query_api(
'album/%s?id=%s' % (album_id, album_id),
album_id, 'Downloading album data')['album']
name = info['name']
desc = info.get('description')
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song['id'],
'NetEaseMusic', song['id'])
for song in info['songs']
]
return self.playlist_result(entries, album_id, name, desc)
class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:singer'
IE_DESC = '网易云音乐 - 歌手'
_VALID_URL = r'https?://music\.163\.com/(#/)?artist\?id=(?P<id>[0-9]+)'
_TESTS = [{
'note': 'Singer has aliases.',
'url': 'http://music.163.com/#/artist?id=10559',
'info_dict': {
'id': '10559',
'title': '张惠妹 - aMEI;阿密特',
},
'playlist_count': 50,
}, {
'note': 'Singer has translated name.',
'url': 'http://music.163.com/#/artist?id=124098',
'info_dict': {
'id': '124098',
'title': '李昇基 - 이승기',
},
'playlist_count': 50,
}]
def _real_extract(self, url):
singer_id = self._match_id(url)
info = self.query_api(
'artist/%s?id=%s' % (singer_id, singer_id),
singer_id, 'Downloading singer data')
name = info['artist']['name']
if info['artist']['trans']:
name = '%s - %s' % (name, info['artist']['trans'])
if info['artist']['alias']:
name = '%s - %s' % (name, ';'.join(info['artist']['alias']))
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song['id'],
'NetEaseMusic', song['id'])
for song in info['hotSongs']
]
return self.playlist_result(entries, singer_id, name)
class NetEaseMusicListIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:playlist'
IE_DESC = '网易云音乐 - 歌单'
_VALID_URL = r'https?://music\.163\.com/(#/)?(playlist|discover/toplist)\?id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/playlist?id=79177352',
'info_dict': {
'id': '79177352',
'title': 'Billboard 2007 Top 100',
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
},
'playlist_count': 99,
}, {
'note': 'Toplist/Charts sample',
'url': 'http://music.163.com/#/discover/toplist?id=3733003',
'info_dict': {
'id': '3733003',
'title': 're:韩国Melon排行榜周榜 [0-9]{4}-[0-9]{2}-[0-9]{2}',
'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
},
'playlist_count': 50,
}]
def _real_extract(self, url):
list_id = self._match_id(url)
info = self.query_api(
'playlist/detail?id=%s&lv=-1&tv=-1' % list_id,
list_id, 'Downloading playlist data')['result']
name = info['name']
desc = info.get('description')
if info.get('specialType') == 10: # is a chart/toplist
datestamp = datetime.fromtimestamp(
self.convert_milliseconds(info['updateTime'])).strftime('%Y-%m-%d')
name = '%s %s' % (name, datestamp)
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song['id'],
'NetEaseMusic', song['id'])
for song in info['tracks']
]
return self.playlist_result(entries, list_id, name, desc)
class NetEaseMusicMvIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:mv'
IE_DESC = '网易云音乐 - MV'
_VALID_URL = r'https?://music\.163\.com/(#/)?mv\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://music.163.com/#/mv?id=415350',
'info_dict': {
'id': '415350',
'ext': 'mp4',
'title': '이럴거면 그러지말지',
'description': '白雅言自作曲唱甜蜜爱情',
'creator': '白雅言',
'upload_date': '20150520',
},
}
def _real_extract(self, url):
mv_id = self._match_id(url)
info = self.query_api(
'mv/detail?id=%s&type=mp4' % mv_id,
mv_id, 'Downloading mv info')['data']
formats = [
{'url': mv_url, 'ext': 'mp4', 'format_id': '%sp' % brs, 'height': int(brs)}
for brs, mv_url in info['brs'].items()
]
self._sort_formats(formats)
return {
'id': mv_id,
'title': info['name'],
'description': info.get('desc') or info.get('briefDesc'),
'creator': info['artistName'],
'upload_date': info['publishTime'].replace('-', ''),
'formats': formats,
'thumbnail': info.get('cover'),
'duration': self.convert_milliseconds(info.get('duration', 0)),
}
class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:program'
IE_DESC = '网易云音乐 - 电台节目'
_VALID_URL = r'https?://music\.163\.com/(#/?)program\?id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/program?id=10109055',
'info_dict': {
'id': '10109055',
'ext': 'mp3',
'title': '不丹足球背后的故事',
'description': '喜马拉雅人的足球梦 ...',
'creator': '大话西藏',
'timestamp': 1434179342,
'upload_date': '20150613',
'duration': 900,
},
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
'info_dict': {
'id': '10141022',
'title': '25岁你是自在如风的少年<27°C>',
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
},
'playlist_count': 4,
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
'info_dict': {
'id': '10141022',
'ext': 'mp3',
'title': '25岁你是自在如风的少年<27°C>',
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
'timestamp': 1434450841,
'upload_date': '20150616',
},
'params': {
'noplaylist': True
}
}]
def _real_extract(self, url):
program_id = self._match_id(url)
info = self.query_api(
'dj/program/detail?id=%s' % program_id,
program_id, 'Downloading program info')['program']
name = info['name']
description = info['description']
if not info['songs'] or self._downloader.params.get('noplaylist'):
if info['songs']:
self.to_screen(
'Downloading just the main audio %s because of --no-playlist'
% info['mainSong']['id'])
formats = self.extract_formats(info['mainSong'])
self._sort_formats(formats)
return {
'id': program_id,
'title': name,
'description': description,
'creator': info['dj']['brand'],
'timestamp': self.convert_milliseconds(info['createTime']),
'thumbnail': info['coverUrl'],
'duration': self.convert_milliseconds(info.get('duration', 0)),
'formats': formats,
}
self.to_screen(
'Downloading playlist %s - add --no-playlist to just download the main audio %s'
% (program_id, info['mainSong']['id']))
song_ids = [info['mainSong']['id']]
song_ids.extend([song['id'] for song in info['songs']])
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song_id,
'NetEaseMusic', song_id)
for song_id in song_ids
]
return self.playlist_result(entries, program_id, name, description)
class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:djradio'
IE_DESC = '网易云音乐 - 电台'
_VALID_URL = r'https?://music\.163\.com/(#/)?djradio\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://music.163.com/#/djradio?id=42',
'info_dict': {
'id': '42',
'title': '声音蔓延',
'description': 'md5:766220985cbd16fdd552f64c578a6b15'
},
'playlist_mincount': 40,
}
_PAGE_SIZE = 1000
def _real_extract(self, url):
dj_id = self._match_id(url)
name = None
desc = None
entries = []
for offset in compat_itertools_count(start=0, step=self._PAGE_SIZE):
info = self.query_api(
'dj/program/byradio?asc=false&limit=%d&radioId=%s&offset=%d'
% (self._PAGE_SIZE, dj_id, offset),
dj_id, 'Downloading dj programs - %d' % offset)
entries.extend([
self.url_result(
'http://music.163.com/#/program?id=%s' % program['id'],
'NetEaseMusicProgram', program['id'])
for program in info['programs']
])
if name is None:
radio = info['programs'][0]['radio']
name = radio['name']
desc = radio['desc']
if not info['more']:
break
return self.playlist_result(entries, dj_id, name, desc)
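`NetEaseMusicBaseIE._encrypt` derives the path segment of a song URL from its `dfsId`: each ASCII byte of the id is XORed with the repeating salt, the result is MD5-hashed, and the digest is Base64-encoded with `/` and `+` replaced by `_` and `-`. The standalone sketch below mirrors that logic; the function names are hypothetical, and the replacement step yields the same alphabet as `base64.urlsafe_b64encode`.

```python
import base64
import hashlib

NETEASE_SALT = '3go8&$8*3*3h0k(2)2'  # salt copied from NetEaseMusicBaseIE


def netease_encrypt(dfsid):
    """XOR the decimal dfsId with the repeating salt, MD5 it, then URL-safe Base64 it."""
    salt = bytearray(NETEASE_SALT.encode('utf-8'))
    data = bytearray(str(dfsid).encode('ascii'))
    for i in range(len(data)):
        data[i] ^= salt[i % len(salt)]
    digest = hashlib.md5(bytes(data)).digest()
    return base64.b64encode(digest).decode('ascii').replace('/', '_').replace('+', '-')


def netease_song_url(dfsid, extension):
    """Assemble a URL in the same shape as extract_formats builds."""
    return 'http://m1.music.126.net/%s/%s.%s' % (netease_encrypt(dfsid), dfsid, extension)


print(netease_song_url(1234567890, 'mp3'))
```

A 16-byte MD5 digest always encodes to 24 Base64 characters, including `==` padding.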

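`NetEaseMusicIE._process_lyrics` pairs each timestamped line of the original LRC lyrics with the translated line that carries the same timestamp, and falls back to an empty string when no translation exists. The sketch below isolates that merge as a hypothetical `merge_lyrics` function operating on plain strings, using the same regex.

```python
import re

# Same expression as _process_lyrics: an [mm:ss.xx] timestamp followed by the line text.
LYRICS_EXPR = r'(\[[0-9]{2}:[0-9]{2}\.[0-9]{2,}\])([^\n]+)'


def merge_lyrics(original, translated):
    """Join each original line with the translation that shares its timestamp."""
    if not translated:
        return original
    translations = dict(re.findall(LYRICS_EXPR, translated))
    return '\n'.join(
        '%s%s / %s' % (ts, text, translations.get(ts, ''))
        for ts, text in re.findall(LYRICS_EXPR, original))


print(merge_lyrics('[00:01.00]Hello\n[00:02.00]World', '[00:01.00]你好'))
```

Lines whose timestamp has no translation still appear, with an empty string after the ` / ` separator.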
@@ -31,7 +31,7 @@ class NewstubeIE(InfoExtractor):
page = self._download_webpage(url, video_id, 'Downloading page')
video_guid = self._html_search_regex(
r'<meta property="og:video" content="https?://(?:www\.)?newstube\.ru/freshplayer\.swf\?guid=(?P<guid>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
r'<meta property="og:video:url" content="https?://(?:www\.)?newstube\.ru/freshplayer\.swf\?guid=(?P<guid>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
page, 'video GUID')
player = self._download_xml(

@@ -6,6 +6,7 @@ from ..utils import parse_iso8601
class NextMediaIE(InfoExtractor):
IE_DESC = '蘋果日報'
_VALID_URL = r'http://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
@@ -66,6 +67,7 @@ class NextMediaIE(InfoExtractor):
class NextMediaActionNewsIE(NextMediaIE):
IE_DESC = '蘋果日報 - 動新聞'
_VALID_URL = r'http://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
_TESTS = [{
'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
@@ -90,6 +92,7 @@ class NextMediaActionNewsIE(NextMediaIE):
class AppleDailyIE(NextMediaIE):
IE_DESC = '臺灣蘋果日報'
_VALID_URL = r'http://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',

@@ -195,7 +195,7 @@ class NocoIE(InfoExtractor):
if episode_number:
title += ' #' + compat_str(episode_number)
if episode:
title += ' - ' + episode
title += ' - ' + compat_str(episode)
description = show.get('show_resume') or show.get('family_resume')

@@ -133,7 +133,7 @@ class NowTVIE(InfoExtractor):
station = mobj.group('station')
info = self._download_json(
'https://api.nowtv.de/v3/movies/%s?fields=*,format,files' % display_id,
'https://api.nowtv.de/v3/movies/%s?fields=id,title,free,geoblocked,articleLong,articleShort,broadcastStartDate,seoUrl,duration,format,files' % display_id,
display_id)
video_id = compat_str(info['id'])

@@ -1,5 +1,7 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
fix_xml_ampersands,
@@ -7,7 +9,6 @@ from ..utils import (
qualities,
strip_jsonp,
unified_strdate,
url_basename,
)
@@ -16,13 +17,42 @@ class NPOBaseIE(InfoExtractor):
token_page = self._download_webpage(
'http://ida.omroep.nl/npoplayer/i.js',
video_id, note='Downloading token')
return self._search_regex(
token = self._search_regex(
r'npoplayer\.token = "(.+?)"', token_page, 'token')
# Decryption algorithm extracted from http://npoplayer.omroep.nl/csjs/npoplayer-min.js
token_l = list(token)
first = second = None
for i in range(5, len(token_l) - 4):
if token_l[i].isdigit():
if first is None:
first = i
elif second is None:
second = i
if first is None or second is None:
first = 12
second = 13
token_l[first], token_l[second] = token_l[second], token_l[first]
return ''.join(token_l)
class NPOIE(NPOBaseIE):
IE_NAME = 'npo.nl'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/(?!live|radio)[^/]+/[^/]+/(?P<id>[^/?]+)'
IE_NAME = 'npo'
IE_DESC = 'npo.nl and ntr.nl'
_VALID_URL = r'''(?x)
(?:
npo:|
https?://
(?:www\.)?
(?:
npo\.nl/(?!live|radio)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__
)
)
(?P<id>[^/?#]+)
'''
_TESTS = [
{
@@ -42,7 +72,7 @@ class NPOIE(NPOBaseIE):
'info_dict': {
'id': 'VARA_101191800',
'ext': 'm4v',
'title': 'De Mega Mike & Mega Thomas show',
'title': 'De Mega Mike & Mega Thomas show: The best of.',
'description': 'md5:3b74c97fc9d6901d5a665aac0e5400f4',
'upload_date': '20090227',
'duration': 2400,
@@ -54,8 +84,8 @@ class NPOIE(NPOBaseIE):
'info_dict': {
'id': 'VPWON_1169289',
'ext': 'm4v',
'title': 'Tegenlicht',
'description': 'md5:d6476bceb17a8c103c76c3b708f05dd1',
'title': 'Tegenlicht: De toekomst komt uit Afrika',
'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225',
'duration': 3000,
},
@@ -84,6 +114,30 @@ class NPOIE(NPOBaseIE):
'title': 'Hoe gaat Europa verder na Parijs?',
},
},
{
'url': 'http://www.ntr.nl/Aap-Poot-Pies/27/detail/Aap-poot-pies/VPWON_1233944#content',
'md5': '01c6a2841675995da1f0cf776f03a9c3',
'info_dict': {
'id': 'VPWON_1233944',
'ext': 'm4v',
'title': 'Aap, poot, pies',
'description': 'md5:c9c8005d1869ae65b858e82c01a91fde',
'upload_date': '20150508',
'duration': 599,
},
},
{
'url': 'http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698',
'md5': 'd30cd8417b8b9bca1fdff27428860d08',
'info_dict': {
'id': 'POW_00996502',
'ext': 'm4v',
'title': '''"Dit is wel een 'landslide'..."''',
'description': 'md5:f8d66d537dfb641380226e31ca57b8e8',
'upload_date': '20150508',
'duration': 462,
},
}
]
def _real_extract(self, url):
@@ -92,12 +146,24 @@ class NPOIE(NPOBaseIE):
def _get_info(self, video_id):
metadata = self._download_json(
'http://e.omroep.nl/metadata/aflevering/%s' % video_id,
'http://e.omroep.nl/metadata/%s' % video_id,
video_id,
# We have to remove the javascript callback
transform_source=strip_jsonp,
)
# For some videos actual video id (prid) is different (e.g. for
# http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698
# video id is POMS_WNL_853698 but prid is POW_00996502)
video_id = metadata.get('prid') or video_id
# titel is too generic in some cases so utilize aflevering_titel as well
# when available (e.g. http://tegenlicht.vpro.nl/afleveringen/2014-2015/access-to-africa.html)
title = metadata['titel']
sub_title = metadata.get('aflevering_titel')
if sub_title and sub_title != title:
title += ': %s' % sub_title
token = self._get_token(video_id)
formats = []
@@ -170,8 +236,8 @@ class NPOIE(NPOBaseIE):
return {
'id': video_id,
'title': metadata['titel'],
'description': metadata['info'],
'title': title,
'description': metadata.get('info'),
'thumbnail': metadata.get('images', [{'url': None}])[-1]['url'],
'upload_date': unified_strdate(metadata.get('gidsdatum')),
'duration': parse_duration(metadata.get('tijdsduur')),
@@ -340,9 +406,8 @@ class NPORadioFragmentIE(InfoExtractor):
}
class TegenlichtVproIE(NPOIE):
IE_NAME = 'tegenlicht.vpro.nl'
_VALID_URL = r'https?://tegenlicht\.vpro\.nl/afleveringen/.*?'
class VPROIE(NPOIE):
_VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
_TESTS = [
{
@@ -351,17 +416,72 @@ class TegenlichtVproIE(NPOIE):
'info_dict': {
'id': 'VPWON_1169289',
'ext': 'm4v',
'title': 'Tegenlicht',
'description': 'md5:d6476bceb17a8c103c76c3b708f05dd1',
'title': 'De toekomst komt uit Afrika',
'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225',
},
},
{
'url': 'http://www.vpro.nl/programmas/2doc/2015/sergio-herman.html',
'info_dict': {
'id': 'sergio-herman',
'title': 'Sergio Herman: Fucking perfect',
},
'playlist_count': 2,
},
{
# playlist with youtube embed
'url': 'http://www.vpro.nl/programmas/2doc/2015/education-education.html',
'info_dict': {
'id': 'education-education',
'title': '2Doc',
},
'playlist_count': 2,
}
]
def _real_extract(self, url):
name = url_basename(url)
webpage = self._download_webpage(url, name)
urn = self._html_search_meta('mediaurn', webpage)
info_page = self._download_json(
'http://rs.vpro.nl/v2/api/media/%s.json' % urn, name)
return self._get_info(info_page['mid'])
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
for video_id in re.findall(r'data-media-id="([^"]+)"', webpage)
]
playlist_title = self._search_regex(
r'<title>\s*([^>]+?)\s*-\s*Teledoc\s*-\s*VPRO\s*</title>',
webpage, 'playlist title', default=None) or self._og_search_title(webpage)
return self.playlist_result(entries, playlist_id, playlist_title)
class WNLIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?omroepwnl\.nl/video/detail/(?P<id>[^/]+)__\d+'
_TEST = {
'url': 'http://www.omroepwnl.nl/video/detail/vandaag-de-dag-6-mei__060515',
'info_dict': {
'id': 'vandaag-de-dag-6-mei',
'title': 'Vandaag de Dag 6 mei',
},
'playlist_count': 4,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id, 'NPO')
for video_id, part in re.findall(
r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>(Deel \d+)', webpage)
]
playlist_title = self._html_search_regex(
r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>',
webpage, 'playlist title')
return self.playlist_result(entries, playlist_id, playlist_title)
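The new `NPOBaseIE._get_token` no longer returns the raw `npoplayer.token`. It scans positions 5 through `len - 5` for the first two digit characters and swaps them, falling back to swapping positions 12 and 13 when fewer than two digits are found. The standalone sketch below mirrors that loop; the function name is hypothetical, and like the original it assumes the token is at least 14 characters long whenever the fallback applies.

```python
def swap_token_digits(token):
    """Swap the first two digits found in token[5:len-4], else swap indices 12 and 13."""
    chars = list(token)
    first = second = None
    for i in range(5, len(chars) - 4):
        if chars[i].isdigit():
            if first is None:
                first = i
            elif second is None:
                second = i
    if first is None or second is None:
        # Fallback used by the extractor when fewer than two digits are found.
        first, second = 12, 13
    chars[first], chars[second] = chars[second], chars[first]
    return ''.join(chars)


print(swap_token_digits('aaaaa1b2cccc'))
```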

@@ -13,7 +13,7 @@ from ..utils import (
class NRKIE(InfoExtractor):
_VALID_URL = r'(?:nrk:|http://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_TESTS = [
{
@@ -76,7 +76,7 @@ class NRKIE(InfoExtractor):
class NRKPlaylistIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?nrk\.no/(?!video)(?:[^/]+/)+(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video)(?:[^/]+/)+(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
@@ -116,11 +116,12 @@ class NRKPlaylistIE(InfoExtractor):
class NRKTVIE(InfoExtractor):
_VALID_URL = r'(?P<baseurl>http://tv\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
IE_DESC = 'NRK TV and NRK Radio'
_VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_TESTS = [
{
'url': 'http://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': 'adf2c5454fa2bf032f47a9f8fb351342',
'info_dict': {
'id': 'MUHH48000314',
@@ -132,7 +133,7 @@ class NRKTVIE(InfoExtractor):
},
},
{
'url': 'http://tv.nrk.no/program/mdfp15000514',
'url': 'https://tv.nrk.no/program/mdfp15000514',
'md5': '383650ece2b25ecec996ad7b5bb2a384',
'info_dict': {
'id': 'mdfp15000514',
@@ -145,7 +146,7 @@ class NRKTVIE(InfoExtractor):
},
{
# single playlist video
'url': 'http://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
@@ -157,7 +158,7 @@ class NRKTVIE(InfoExtractor):
'skip': 'Only works from Norway',
},
{
'url': 'http://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [
{
'md5': '9480285eff92d64f06e02a5367970a7a',
@@ -188,6 +189,10 @@ class NRKTVIE(InfoExtractor):
'duration': 6947.5199999999995,
},
'skip': 'Only works from Norway',
},
{
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
}
]
@@ -206,7 +211,8 @@ class NRKTVIE(InfoExtractor):
]}
def _extract_f4m(self, manifest_url, video_id):
return self._extract_f4m_formats(manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id)
return self._extract_f4m_formats(
manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id, f4m_id='hds')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -268,7 +274,7 @@ class NRKTVIE(InfoExtractor):
m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage)
if m3u8_url:
formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4'))
formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4', m3u8_id='hls'))
self._sort_formats(formats)
subtitles_url = self._html_search_regex(

@@ -2,7 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_unquote
from ..utils import (
unified_strdate,
int_or_none,
@@ -62,7 +62,7 @@ class OdnoklassnikiIE(InfoExtractor):
metadata = self._parse_json(metadata, video_id)
else:
metadata = self._download_json(
compat_urllib_parse.unquote(flashvars['metadataUrl']),
compat_urllib_parse_unquote(flashvars['metadataUrl']),
video_id, 'Downloading metadata JSON')
movie = metadata['movie']

@@ -0,0 +1,76 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
class OnionStudiosIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?onionstudios\.com/(?:videos/[^/]+-|embed\?.*\bid=)(?P<id>\d+)(?!-)'
_TESTS = [{
'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937',
'md5': 'd4851405d31adfadf71cd7a487b765bb',
'info_dict': {
'id': '2937',
'ext': 'mp4',
'title': 'Hannibal charges forward, stops for a cocktail',
'description': 'md5:545299bda6abf87e5ec666548c6a9448',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'The A.V. Club',
'uploader_id': 'TheAVClub',
},
}, {
'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?onionstudios\.com/embed.+?)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.onionstudios.com/embed?id=%s' % video_id, video_id)
formats = []
for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
if determine_ext(src) != 'm3u8': # m3u8 always results in 403
formats.append({
'url': src,
})
self._sort_formats(formats)
title = self._search_regex(
r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
webpage, 'title', group='title')
description = self._search_regex(
r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1',
webpage, 'description', default=None, group='description')
thumbnail = self._search_regex(
r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',
webpage, 'thumbnail', default=False, group='thumbnail')
uploader_id = self._search_regex(
r'twitter_handle\s*=\s*(["\'])(?P<uploader_id>[^\1]+?)\1',
webpage, 'uploader id', fatal=False, group='uploader_id')
uploader = self._search_regex(
r'window\.channelName\s*=\s*(["\'])Embedded:(?P<uploader>[^\1]+?)\1',
webpage, 'uploader', default=False, group='uploader')
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
}
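The new `OnionStudiosIE` relies on two regexes: `_VALID_URL` pulls the numeric id out of both `videos/<slug>-<id>` and `embed?...id=<id>` URLs, and `_extract_url` finds an embedded player in a third-party page, using the backreference `\1` to close on the same quote character that opened `src=`. The patterns below are copied from the diff. The sample HTML is hypothetical.

```python
import re

# Patterns copied from OnionStudiosIE in the diff above.
_VALID_URL = r'https?://(?:www\.)?onionstudios\.com/(?:videos/[^/]+-|embed\?.*\bid=)(?P<id>\d+)(?!-)'
_EMBED_RE = r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?onionstudios\.com/embed.+?)\1'


def match_id(url):
    """Return the numeric video id, or None if the URL is not supported."""
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None


def extract_embed_url(webpage):
    """Return the src of the first Onion Studios iframe, or None."""
    # Group 1 captures the opening quote; the trailing backreference must match it.
    mobj = re.search(_EMBED_RE, webpage)
    return mobj.group('url') if mobj else None


# Hypothetical host page embedding the player with a protocol-relative URL.
page = '<p>Review</p><iframe width="600" src="//www.onionstudios.com/embed?id=2855&autoplay=true"></iframe>'
embed_url = extract_embed_url(page)
print(embed_url)  # //www.onionstudios.com/embed?id=2855&autoplay=true
```

Because the embed URL is protocol-relative, a caller has to add a scheme (for example `'http:' + embed_url`) before matching it against `_VALID_URL`.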


@@ -3,9 +3,9 @@ from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
parse_iso8601,
compat_urllib_parse,
parse_age_limit,
int_or_none,
)
@@ -37,7 +37,7 @@ class OpenFilmIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
player = compat_urllib_parse.unquote_plus(
player = compat_urllib_parse_unquote_plus(
self._og_search_video_url(webpage))
video = json.loads(self._search_regex(
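`OpenFilmIE` now uses `compat_urllib_parse_unquote_plus` on the og:video URL. On Python 3 the underlying functions are `urllib.parse.unquote` and `urllib.parse.unquote_plus`. The only difference is that the `_plus` variant also turns `+` into a space, as in form encoding. A minimal comparison on a hypothetical value:

```python
try:
    from urllib.parse import unquote, unquote_plus
except ImportError:  # Python 2
    from urllib import unquote, unquote_plus

# Hypothetical form-style encoded fragment: '+' and '%20' both stand for spaces.
encoded = 'title%3DMy+Short+Film%20Cut'
plain = unquote(encoded)        # '+' is left untouched
plus = unquote_plus(encoded)    # '+' becomes a space as well
print(plain)  # title=My+Short+Film Cut
print(plus)   # title=My Short Film Cut
```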


@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -31,10 +32,13 @@ class PBSIE(InfoExtractor):
'info_dict': {
'id': '2365006249',
'ext': 'mp4',
'title': 'A More Perfect Union',
'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'duration': 3190,
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://www.pbs.org/wgbh/pages/frontline/losing-iraq/',
@@ -42,10 +46,13 @@ class PBSIE(InfoExtractor):
'info_dict': {
'id': '2365297690',
'ext': 'mp4',
'title': 'Losing Iraq',
'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'duration': 5050,
},
'params': {
'skip_download': True, # requires ffmpeg
}
},
{
'url': 'http://www.pbs.org/newshour/bb/education-jan-june12-cyberschools_02-23/',
@@ -53,7 +60,7 @@ class PBSIE(InfoExtractor):
'info_dict': {
'id': '2201174722',
'ext': 'mp4',
'title': 'Cyber Schools Gain Popularity, but Quality Questions Persist',
'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28',
'duration': 801,
},
@@ -65,10 +72,13 @@ class PBSIE(InfoExtractor):
'id': '2365297708',
'ext': 'mp4',
'description': 'md5:68d87ef760660eb564455eb30ca464fe',
'title': 'Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'duration': 6559,
'thumbnail': 're:^https?://.*\.jpg$',
}
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://www.pbs.org/wgbh/nova/earth/killer-typhoon.html',
@@ -78,11 +88,14 @@ class PBSIE(InfoExtractor):
'display_id': 'killer-typhoon',
'ext': 'mp4',
'description': 'md5:c741d14e979fc53228c575894094f157',
'title': 'Killer Typhoon',
'title': 'NOVA - Killer Typhoon',
'duration': 3172,
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20140122',
}
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://www.pbs.org/wgbh/pages/frontline/united-states-of-secrets/',
@@ -90,6 +103,36 @@ class PBSIE(InfoExtractor):
'id': 'united-states-of-secrets',
},
'playlist_count': 2,
},
{
'url': 'http://www.pbs.org/wgbh/americanexperience/films/death/player/',
'info_dict': {
'id': '2280706814',
'display_id': 'player',
'ext': 'mp4',
'title': 'American Experience - Death and the Civil War',
'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
'duration': 6705,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://video.pbs.org/video/2365367186/',
'info_dict': {
'id': '2365367186',
'display_id': '2365367186',
'ext': 'mp4',
'title': 'To Catch A Comet - Full Episode',
'description': 'On November 12, 2014, billions of kilometers from Earth, spacecraft orbiter Rosetta and lander Philae did what no other had dared to attempt \u2014 land on the volatile surface of a comet as it zooms around the sun at 67,000 km/hr. The European Space Agency hopes this mission can help peer into our past and unlock secrets of our origins.',
'duration': 3342,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
}
]
@@ -123,7 +166,7 @@ class PBSIE(InfoExtractor):
return media_id, presumptive_id, upload_date
url = self._search_regex(
r'<iframe\s+(?:class|id)=["\']partnerPlayer["\'].*?\s+src=["\'](.*?)["\']>',
r'<iframe\s+[^>]*\s+src=["\']([^\'"]+partnerplayer[^\'"]+)["\']',
webpage, 'player URL')
mobj = re.match(self._VALID_URL, url)
@@ -196,6 +239,20 @@ class PBSIE(InfoExtractor):
rating_str = rating_str.rpartition('-')[2]
age_limit = US_RATINGS.get(rating_str)
subtitles = {}
closed_captions_url = info.get('closed_captions_url')
if closed_captions_url:
subtitles['en'] = [{
'ext': 'ttml',
'url': closed_captions_url,
}]
# info['title'] is often incomplete (e.g. 'Full Episode', 'Episode 5', etc.)
# Try converting it to the 'program - title' naming scheme if possible
alt_title = info.get('program', {}).get('title')
if alt_title:
info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + '[\s\-:]+', '', info['title'])
return {
'id': video_id,
'display_id': display_id,
@@ -206,4 +263,5 @@ class PBSIE(InfoExtractor):
'age_limit': age_limit,
'upload_date': upload_date,
'formats': formats,
'subtitles': subtitles,
}
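The PBS hunk rewrites `info['title']` into a `program - title` form and strips any program prefix the title already carries, which is why the test titles above change from `'Losing Iraq'` to `'FRONTLINE - Losing Iraq'`. The sketch below reproduces that logic with two deliberate deviations, both assumptions of this example rather than part of the diff. First, it passes the program title through `re.escape()`, so characters such as `(` are matched literally; the diff concatenates the raw title into the pattern. Second, it tolerates a `'program'` key whose value is `None`. The program names in the usage lines are hypothetical inputs.

```python
import re


def pbs_style_title(info):
    """Return 'program - title', removing a program prefix the title already has."""
    title = info['title']
    alt_title = (info.get('program') or {}).get('title')
    if not alt_title:
        return title
    # Strip a leading "<program>" followed by spaces, dashes or colons.
    rest = re.sub(r'^' + re.escape(alt_title) + r'[\s\-:]+', '', title)
    return alt_title + ' - ' + rest


print(pbs_style_title({'title': 'Killer Typhoon', 'program': {'title': 'NOVA'}}))
# NOVA - Killer Typhoon
print(pbs_style_title({'title': 'NOVA - Killer Typhoon', 'program': {'title': 'NOVA'}}))
# NOVA - Killer Typhoon
```

Without `re.escape()`, a program title such as `Frontline (Special)` would be read as a regex group, the prefix would not be stripped, and the program name would appear twice in the result.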


@@ -4,7 +4,7 @@ import json
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_unquote
class PhotobucketIE(InfoExtractor):
@@ -34,7 +34,7 @@ class PhotobucketIE(InfoExtractor):
info_json = self._search_regex(r'Pb\.Data\.Shared\.put\(Pb\.Data\.Shared\.MEDIA, (.*?)\);',
webpage, 'info json')
info = json.loads(info_json)
url = compat_urllib_parse.unquote(self._html_search_regex(r'file=(.+\.mp4)', info['linkcodes']['html'], 'url'))
url = compat_urllib_parse_unquote(self._html_search_regex(r'file=(.+\.mp4)', info['linkcodes']['html'], 'url'))
return {
'id': video_id,
'url': url,


@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
remove_end,
remove_start,
str_to_int,
unified_strdate,
)
class PinkbikeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?pinkbike\.com/video/|es\.pinkbike\.org/i/kvid/kvid-y5\.swf\?id=)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.pinkbike.com/video/402811/',
'md5': '4814b8ca7651034cd87e3361d5c2155a',
'info_dict': {
'id': '402811',
'ext': 'mp4',
'title': 'Brandon Semenuk - RAW 100',
'description': 'Official release: www.redbull.ca/rupertwalker',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 100,
'upload_date': '20150406',
'uploader': 'revelco',
'location': 'Victoria, British Columbia, Canada',
'view_count': int,
'comment_count': int,
}
}, {
'url': 'http://es.pinkbike.org/i/kvid/kvid-y5.swf?id=406629',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.pinkbike.com/video/%s' % video_id, video_id)
formats = []
for _, format_id, src in re.findall(
r'data-quality=((?:\\)?["\'])(.+?)\1[^>]+src=\1(.+?)\1', webpage):
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None))
formats.append({
'url': src,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
title = remove_end(self._og_search_title(webpage), ' Video - Pinkbike')
description = self._html_search_regex(
r'(?s)id="media-description"[^>]*>(.+?)<',
webpage, 'description', default=None) or remove_start(
self._og_search_description(webpage), title + '. ')
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'duration'))
uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False))
location = self._html_search_regex(
r'(?s)<dt>Location</dt>\s*<dd>(.+?)<',
webpage, 'location', fatal=False)
def extract_count(webpage, label):
return str_to_int(self._search_regex(
r'<span[^>]+class="stat-num"[^>]*>([\d,.]+)</span>\s*<span[^>]+class="stat-label"[^>]*>%s' % label,
webpage, label, fatal=False))
view_count = extract_count(webpage, 'Views')
comment_count = extract_count(webpage, 'Comments')
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'upload_date': upload_date,
'uploader': uploader,
'location': location,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats
}
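`PinkbikeIE` reads formats from `data-quality`/`src` attribute pairs. Group 1 of its regex captures the quote, optionally preceded by a backslash, so the same pattern matches both plain HTML attributes and markup embedded in a JavaScript string with `\"` quotes. The `\1` backreferences then require the same quoting for `src`. The sketch below runs the pattern from the diff on hypothetical markup and derives `height` from ids like `480p`, as the extractor does.

```python
import re

# Pattern copied from PinkbikeIE; group 1 is the (optionally backslash-escaped) quote.
QUALITY_RE = r'data-quality=((?:\\)?["\'])(.+?)\1[^>]+src=\1(.+?)\1'


def parse_formats(webpage):
    formats = []
    for _, format_id, src in re.findall(QUALITY_RE, webpage):
        # Only ids such as '720p' or '480P' carry a height; others (e.g. 'HD') do not.
        m = re.match(r'^(\d+)[pP]$', format_id)
        formats.append({
            'url': src,
            'format_id': format_id,
            'height': int(m.group(1)) if m else None,
        })
    return formats


# Hypothetical markup: one plain tag and one inside a JS string with escaped quotes.
page = ('<source data-quality="480p" type="video/mp4" src="http://example.com/v480.mp4">'
        '<script>var s = "<source data-quality=\\"HD\\" type=\\"video/mp4\\" '
        'src=\\"http://example.com/vhd.mp4\\">";</script>')
formats = parse_formats(page)
```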


@@ -18,7 +18,8 @@ class PlanetaPlayIE(InfoExtractor):
'id': '3586',
'ext': 'flv',
'title': 'md5:e829428ee28b1deed00de90de49d1da1',
}
},
'skip': 'Not accessible from Travis CI server',
}
_SONG_FORMATS = {


@@ -38,9 +38,7 @@ class PlayedIE(InfoExtractor):
if m_error:
raise ExtractorError(m_error.group('msg'), expected=True)
fields = re.findall(
r'type="hidden" name="([^"]+)"\s+value="([^"]+)">', orig_webpage)
data = dict(fields)
data = self._hidden_inputs(orig_webpage)
self._sleep(2, video_id)
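This hunk replaces a hand-written `type="hidden" name=... value=...` regex with the base class helper `self._hidden_inputs(...)`. The same refactor appears in the PrimeShareTV, PromptFile and Shared hunks. The sketch below shows what such a helper has to produce, a `name -> value` dict of hidden form fields. It is a hypothetical stand-in built on the standard library's `html.parser`, not youtube-dl's implementation. The sample form is also hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name -> value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_inputs(html):
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


form = ('<form method="POST"><input type="hidden" name="op" value="download1">'
        '<input type="hidden" name="id" id="x" value="abc123"/>'
        '<input type="text" name="q" value="ignored">'
        '<input type="HIDDEN" name="empty" value=""></form>')
fields = hidden_inputs(form)
```

Unlike the removed regexes, which expected `type`, `name` and `value` in a fixed order, a parser-based approach does not depend on attribute order or on extra attributes such as `id`.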


@@ -4,7 +4,8 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
)
from ..utils import (
clean_html,
@@ -44,7 +45,7 @@ class PlayvidIE(InfoExtractor):
flashvars = self._html_search_regex(
r'flashvars="(.+?)"', webpage, 'flashvars')
infos = compat_urllib_parse.unquote(flashvars).split(r'&')
infos = compat_urllib_parse_unquote(flashvars).split(r'&')
for info in infos:
videovars_match = re.match(r'^video_vars\[(.+?)\]=(.+?)$', info)
if videovars_match:
@@ -52,7 +53,7 @@ class PlayvidIE(InfoExtractor):
val = videovars_match.group(2)
if key == 'title':
video_title = compat_urllib_parse.unquote_plus(val)
video_title = compat_urllib_parse_unquote_plus(val)
if key == 'duration':
try:
duration = int(val)
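`PlayvidIE` percent-decodes the whole `flashvars` attribute, splits it on `&`, and keeps `video_vars[key]=value` pairs. The title is then decoded once more with `unquote_plus`. The sketch below mirrors that flow using the Python 3 `urllib.parse` functions that the compat helpers map to. The `flashvars` value is hypothetical.

```python
import re
from urllib.parse import unquote, unquote_plus


def parse_video_vars(flashvars):
    """Decode flashvars and collect video_vars[key]=value pairs, as PlayvidIE does."""
    video_vars = {}
    for info in unquote(flashvars).split('&'):
        m = re.match(r'^video_vars\[(.+?)\]=(.+?)$', info)
        if m:
            video_vars[m.group(1)] = m.group(2)
    return video_vars


# Hypothetical flashvars attribute value (percent-encoded once).
flashvars = 'video_vars%5Btitle%5D=Some+Clip%2C+Part+1&video_vars%5Bduration%5D=95&autoplay=1'
video_vars = parse_video_vars(flashvars)
title = unquote_plus(video_vars['title'])  # '+' -> ' '
duration = int(video_vars['duration'])
```

Because the string is decoded before it is split, a value containing an encoded `%26` would be split at the resulting `&`. This behavior comes from the extractor's approach, and the sketch reproduces it unchanged.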


@@ -5,7 +5,8 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
compat_urllib_request,
)
@@ -19,8 +20,8 @@ from ..aes import (
class PornHubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-f]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': '882f488fa1f0026f023f33576004a2ed',
'info_dict': {
@@ -30,7 +31,10 @@ class PornHubIE(InfoExtractor):
"title": "Seductive Indian beauty strips down and fingers her pink pussy",
"age_limit": 18
}
}
}, {
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
'only_matching': True,
}]
@classmethod
def _extract_url(cls, webpage):
@@ -66,7 +70,7 @@ class PornHubIE(InfoExtractor):
webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = compat_urllib_parse.unquote(thumbnail)
thumbnail = compat_urllib_parse_unquote(thumbnail)
view_count = self._extract_count(
r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
@@ -77,9 +81,9 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_urls = list(map(compat_urllib_parse.unquote, re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
video_urls = list(map(compat_urllib_parse_unquote, re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse.unquote_plus(
password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
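The `_VALID_URL` change widens the id character class from `[0-9a-f]` to `[0-9a-z]` so that newer `ph...` view keys are accepted, which the added `only_matching` test exercises. The check below compares both patterns, copied from the diff, on the two test URLs.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-f]+)'
NEW_VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'


def match_id(pattern, url):
    m = re.match(pattern, url)
    return m.group('id') if m else None


legacy = 'http://www.pornhub.com/view_video.php?viewkey=648719015'
new_style = 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d'
```

The old class rejects the new key outright because its first character, `p`, is not a hex digit.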


@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
@@ -31,12 +29,7 @@ class PrimeShareTVIE(InfoExtractor):
if '>File not exist<' in webpage:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
fields = self._hidden_inputs(webpage)
headers = {
'Referer': url,


@@ -35,10 +35,7 @@ class PromptFileIE(InfoExtractor):
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
fields = dict(re.findall(r'''(?x)type="hidden"\s+
name="(.+?)"\s+
value="(.*?)"
''', webpage))
fields = self._hidden_inputs(webpage)
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
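After collecting the hidden fields, `PromptFileIE` URL-encodes them and re-posts the form. On Python 3 the compat names correspond to `urllib.parse.urlencode` and `urllib.request.Request`. The sketch below builds the same request without sending it. It encodes the body to bytes because Python 3's `urllib` expects bytes for POST data. The URL and field values are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical hidden fields, shaped like the dict _hidden_inputs() returns.
fields = {'op': 'download1', 'id': 'abc123'}
url = 'http://www.promptfile.com/l/HYPOTHETICAL-ID'  # hypothetical page URL

post = urlencode(fields).encode('utf-8')  # Python 3 wants a bytes body
req = Request(url, post)                  # supplying data makes this a POST
req.add_header('Content-type', 'application/x-www-form-urlencoded')
```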


@@ -9,8 +9,11 @@ from ..compat import (
compat_urllib_parse,
)
from ..utils import (
unified_strdate,
ExtractorError,
determine_ext,
float_or_none,
int_or_none,
unified_strdate,
)
@@ -21,6 +24,11 @@ class ProSiebenSat1IE(InfoExtractor):
_TESTS = [
{
# Tests changes introduced in https://github.com/rg3/youtube-dl/pull/6242
# in response to fixing https://github.com/rg3/youtube-dl/issues/6215:
# - malformed f4m manifest support
# - proper handling of URLs starting with `https?://` in 2.0 manifests
# - recursive child f4m manifests extraction
'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
'info_dict': {
'id': '2104602',
@@ -208,7 +216,7 @@ class ProSiebenSat1IE(InfoExtractor):
clip_id = self._html_search_regex(self._CLIPID_REGEXES, webpage, 'clip id')
access_token = 'prosieben'
client_name = 'kolibri-1.12.6'
client_name = 'kolibri-2.0.19-splec4'
client_location = url
videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse.urlencode({
@@ -218,10 +226,13 @@ class ProSiebenSat1IE(InfoExtractor):
'ids': clip_id,
})
videos = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON')
video = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON')[0]
duration = float(videos[0]['duration'])
source_ids = [source['id'] for source in videos[0]['sources']]
if video.get('is_protected') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
duration = float_or_none(video.get('duration'))
source_ids = [source['id'] for source in video['sources']]
source_ids_str = ','.join(map(str, source_ids))
g = '01!8d8F_)r9]4s[qeuXfP%'
@@ -275,8 +286,9 @@ class ProSiebenSat1IE(InfoExtractor):
for source in urls_sources:
protocol = source['protocol']
source_url = source['url']
if protocol == 'rtmp' or protocol == 'rtmpe':
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source['url'])
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
@@ -293,9 +305,11 @@ class ProSiebenSat1IE(InfoExtractor):
'ext': 'mp4',
'format_id': '%s_%s' % (source['cdn'], source['bitrate']),
})
elif 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(source_url, clip_id))
else:
formats.append({
'url': source['url'],
'url': source_url,
'vbr': fix_bitrate(source['bitrate']),
})
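After this change, each ProSiebenSat1 source falls into one of three branches. RTMP(E) URLs are split into server and play path, f4m manifests (detected by `f4mgenerator` in the URL or an `.f4m` extension) are expanded via `_extract_f4m_formats`, and everything else becomes a plain HTTP format. The sketch below reproduces that dispatch. `guess_ext` is a hypothetical, simplified stand-in for youtube-dl's `determine_ext`, and the sample sources are invented.

```python
import re


def guess_ext(url):
    """Crude stand-in for determine_ext(): extension of the last path segment."""
    tail = url.partition('?')[0].rpartition('/')[2]
    return tail.rpartition('.')[2] if '.' in tail else None


def classify_source(source):
    """Return (kind, details) for one entry of urls_sources, mirroring the branch order."""
    protocol = source['protocol']
    source_url = source['url']
    if protocol in ('rtmp', 'rtmpe'):
        mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
        if not mobj:
            return ('skip', None)  # the extractor simply `continue`s here
        return ('rtmp', {'url': mobj.group('url'), 'path': mobj.group('path')})
    if 'f4mgenerator' in source_url or guess_ext(source_url) == 'f4m':
        return ('f4m', source_url)
    return ('http', source_url)
```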


@@ -9,12 +9,14 @@ from .common import InfoExtractor
from ..utils import (
strip_jsonp,
unescapeHTML,
clean_html,
)
from ..compat import compat_urllib_request
class QQMusicIE(InfoExtractor):
IE_NAME = 'qqmusic'
IE_DESC = 'QQ音乐'
_VALID_URL = r'http://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
@@ -26,6 +28,20 @@ class QQMusicIE(InfoExtractor):
'upload_date': '20141227',
'creator': '林俊杰',
'description': 'md5:d327722d0361576fde558f1ac68a7065',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'note': 'There is no mp3-320 version of this song.',
'url': 'http://y.qq.com/#type=song&mid=004MsGEo3DdNxV',
'md5': 'fa3926f0c585cda0af8fa4f796482e3e',
'info_dict': {
'id': '004MsGEo3DdNxV',
'ext': 'mp3',
'title': '如果',
'upload_date': '20050626',
'creator': '李季美',
'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
'thumbnail': 're:^https?://.*\.jpg$',
}
}]
@@ -68,6 +84,14 @@ class QQMusicIE(InfoExtractor):
if lrc_content:
lrc_content = lrc_content.replace('\\n', '\n')
thumbnail_url = None
albummid = self._search_regex(
[r'albummid:\'([0-9a-zA-Z]+)\'', r'"albummid":"([0-9a-zA-Z]+)"'],
detail_info_page, 'album mid', default=None)
if albummid:
thumbnail_url = "http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg" \
% (albummid[-2:-1], albummid[-1], albummid)
guid = self.m_r_get_ruin()
vkey = self._download_json(
@@ -85,6 +109,7 @@ class QQMusicIE(InfoExtractor):
'preference': details['preference'],
'abr': details.get('abr'),
})
self._check_formats(formats, mid)
self._sort_formats(formats)
return {
@@ -94,6 +119,7 @@ class QQMusicIE(InfoExtractor):
'upload_date': publish_time,
'creator': singer,
'description': lrc_content,
'thumbnail': thumbnail_url,
}
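The new thumbnail logic in `QQMusicIE` looks up the album mid in either a JS-object form (`albummid:'...'`) or a JSON form (`"albummid":"..."`). It then builds a URL whose two directory levels are the second-to-last and last characters of that mid. The sketch below tries the two patterns from the diff in turn and applies the same template. The page snippets are hypothetical.

```python
import re

# Patterns copied from the QQMusicIE hunk; tried in order.
ALBUM_MID_PATTERNS = [r'albummid:\'([0-9a-zA-Z]+)\'', r'"albummid":"([0-9a-zA-Z]+)"']


def album_thumbnail(detail_info_page):
    for pattern in ALBUM_MID_PATTERNS:
        m = re.search(pattern, detail_info_page)
        if m:
            albummid = m.group(1)
            # albummid[-2:-1] is the second-to-last character, albummid[-1] the last.
            return 'http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg' % (
                albummid[-2:-1], albummid[-1], albummid)
    return None


js_page = "var g_SongData = {albummid:'000gXCTb2AhRR1', songid: 1};"   # hypothetical
json_page = '{"albummid":"002Y5a3b3AlCu3"}'                            # hypothetical
```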
@@ -117,6 +143,7 @@ class QQPlaylistBaseIE(InfoExtractor):
class QQMusicSingerIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:singer'
IE_DESC = 'QQ音乐 - 歌手'
_VALID_URL = r'http://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = {
'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
@@ -161,39 +188,50 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album'
IE_DESC = 'QQ音乐 - 专辑'
_VALID_URL = r'http://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = {
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1&play=0',
_TESTS = [{
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',
'info_dict': {
'id': '000gXCTb2AhRR1',
'title': '我们都是这样长大的',
'description': 'md5:d216c55a2d4b3537fe4415b8767d74d6',
'description': 'md5:179c5dce203a5931970d306aa9607ea6',
},
'playlist_count': 4,
}
}, {
'url': 'http://y.qq.com/#type=album&mid=002Y5a3b3AlCu3',
'info_dict': {
'id': '002Y5a3b3AlCu3',
'title': '그리고...',
'description': 'md5:a48823755615508a95080e81b51ba729',
},
'playlist_count': 8,
}]
def _real_extract(self, url):
mid = self._match_id(url)
album_page = self._download_webpage(
self.qq_static_url('album', mid), mid, 'Download album page')
album = self._download_json(
'http://i.y.qq.com/v8/fcg-bin/fcg_v8_album_info_cp.fcg?albummid=%s&format=json' % mid,
mid, 'Download album page')['data']
entries = self.get_entries_from_page(album_page)
album_name = self._html_search_regex(
r"albumname\s*:\s*'([^']+)',", album_page, 'album name',
default=None)
album_detail = self._html_search_regex(
r'<div class="album_detail close_detail">\s*<p>((?:[^<>]+(?:<br />)?)+)</p>',
album_page, 'album details', default=None)
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
) for song in album['list']
]
album_name = album.get('name')
album_detail = album.get('desc')
if album_detail is not None:
album_detail = album_detail.strip()
return self.playlist_result(entries, mid, album_name, album_detail)
class QQMusicToplistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:toplist'
IE_DESC = 'QQ音乐 - 排行榜'
_VALID_URL = r'http://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)'
_TESTS = [{
@@ -243,3 +281,37 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
list_name = topinfo.get('ListName')
list_description = topinfo.get('info')
return self.playlist_result(entries, list_id, list_name, list_description)
class QQMusicPlaylistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:playlist'
IE_DESC = 'QQ音乐 - 歌单'
_VALID_URL = r'http://y\.qq\.com/#type=taoge&id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://y.qq.com/#type=taoge&id=3462654915',
'info_dict': {
'id': '3462654915',
'title': '韩国5月新歌精选下旬',
'description': 'md5:d2c9d758a96b9888cf4fe82f603121d4',
},
'playlist_count': 40,
}
def _real_extract(self, url):
list_id = self._match_id(url)
list_json = self._download_json(
'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg?type=1&json=1&utf8=1&onlysong=0&disstid=%s'
% list_id, list_id, 'Download list page',
transform_source=strip_jsonp)['cdlist'][0]
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
) for song in list_json['songlist']
]
list_name = list_json.get('dissname')
list_description = clean_html(unescapeHTML(list_json.get('desc')))
return self.playlist_result(entries, list_id, list_name, list_description)
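`QQMusicPlaylistIE` downloads a JSONP response, unwraps it with `strip_jsonp`, and turns each `songmid` in `cdlist[0]['songlist']` into a `QQMusic` song URL. The sketch below follows the same flow. `strip_jsonp_sketch` is a simplified, hypothetical unwrapper, not youtube-dl's `strip_jsonp`, and the response body is invented.

```python
import json
import re


def strip_jsonp_sketch(code):
    """Simplified JSONP unwrapper: 'callback({...});' -> '{...}'."""
    m = re.match(r'(?s)^\s*[a-zA-Z0-9_.$]+\s*\(\s*(.*?)\s*\)\s*;?\s*$', code)
    return m.group(1) if m else code


def playlist_entries(jsonp):
    cd = json.loads(strip_jsonp_sketch(jsonp))['cdlist'][0]
    entries = ['http://y.qq.com/#type=song&mid=' + song['songmid'] for song in cd['songlist']]
    return cd.get('dissname'), entries


# Hypothetical JSONP body shaped like the fields the extractor reads.
body = 'jsonCallback({"cdlist": [{"dissname": "demo", "songlist": [{"songmid": "004295Et37taLD"}]}]});'
name, entries = playlist_entries(body)
```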


@@ -24,6 +24,7 @@ class QuickVidIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
'view_count': int,
},
'skip': 'Not accessible from Travis CI server',
}
def _real_extract(self, url):


@@ -0,0 +1,73 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
class RDSIE(InfoExtractor):
IE_DESC = 'RDS.ca'
_VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<display_id>[^/]+)-(?P<id>\d+\.\d+)'
_TESTS = [{
'url': 'http://www.rds.ca/videos/football/nfl/fowler-jr-prend-la-direction-de-jacksonville-3.1132799',
'info_dict': {
'id': '3.1132799',
'display_id': 'fowler-jr-prend-la-direction-de-jacksonville',
'ext': 'mp4',
'title': 'Fowler Jr. prend la direction de Jacksonville',
'description': 'Dante Fowler Jr. est le troisième choix du repêchage 2015 de la NFL. ',
'timestamp': 1430397346,
'upload_date': '20150430',
'duration': 154.354,
'age_limit': 0,
}
}, {
'url': 'http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
# TODO: extract f4m from 9c9media.com
video_url = self._search_regex(
r'<span[^>]+itemprop="contentURL"[^>]+content="([^"]+)"',
webpage, 'video url')
title = self._og_search_title(webpage) or self._html_search_meta(
'title', webpage, 'title', fatal=True)
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage) or self._search_regex(
[r'<link[^>]+itemprop="thumbnailUrl"[^>]+href="([^"]+)"',
r'<span[^>]+itemprop="thumbnailUrl"[^>]+content="([^"]+)"'],
webpage, 'thumbnail', fatal=False)
timestamp = parse_iso8601(self._search_regex(
r'<span[^>]+itemprop="uploadDate"[^>]+content="([^"]+)"',
webpage, 'upload date', fatal=False))
duration = parse_duration(self._search_regex(
r'<span[^>]+itemprop="duration"[^>]+content="([^"]+)"',
webpage, 'duration', fatal=False))
age_limit = self._family_friendly_search(webpage)
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'age_limit': age_limit,
}
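`RDSIE`'s `_VALID_URL` accepts the path segment `vidéos` written literally, without the accent, or percent-encoded as `vid%C3%A9os`. It allows any number of category segments, and it splits the last segment at the final `-` before a dotted numeric id. The check below uses the pattern from the diff on the test URLs.

```python
import re

# _VALID_URL copied from RDSIE.
_VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<display_id>[^/]+)-(?P<id>\d+\.\d+)'


def parse_rds_url(url):
    """Return (display_id, id) or None."""
    m = re.match(_VALID_URL, url)
    return (m.group('display_id'), m.group('id')) if m else None
```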


@@ -43,6 +43,25 @@ class RtlNlIE(InfoExtractor):
'upload_date': '20150215',
'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
}
}, {
# empty synopsis and missing episodes (see https://github.com/rg3/youtube-dl/issues/6275)
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a/autoplay=false',
'info_dict': {
'id': 'f536aac0-1dc3-4314-920e-3bd1c5b3811a',
'ext': 'mp4',
'title': 'RTL Nieuws - Meer beelden van overval juwelier',
'thumbnail': 're:^https?://screenshots\.rtl\.nl/system/thumb/sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
'timestamp': 1437233400,
'upload_date': '20150718',
'duration': 30.474,
},
'params': {
'skip_download': True,
},
}, {
# encrypted m3u8 streams, georestricted
'url': 'http://www.rtlxl.nl/#!/afl-2-257632/52a74543-c504-4cde-8aa8-ec66fe8d68a7',
'only_matching': True,
}, {
'url': 'http://www.rtl.nl/system/videoplayer/derden/embed.html#!/uuid=bb0353b0-d6a4-1dad-90e9-18fe75b8d1f0',
'only_matching': True,
@@ -51,21 +70,33 @@ class RtlNlIE(InfoExtractor):
def _real_extract(self, url):
uuid = self._match_id(url)
info = self._download_json(
'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=flash/' % uuid,
'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=adaptive/' % uuid,
uuid)
material = info['material'][0]
progname = info['abstracts'][0]['name']
subtitle = material['title'] or info['episodes'][0]['name']
description = material.get('synopsis') or info['episodes'][0]['synopsis']
title = info['abstracts'][0]['name']
subtitle = material.get('title')
if subtitle:
title += ' - %s' % subtitle
description = material.get('synopsis')
# Use unencrypted m3u8 streams (See https://github.com/rg3/youtube-dl/issues/4118)
videopath = material['videopath'].replace('.f4m', '.m3u8')
m3u8_url = 'http://manifest.us.rtl.nl' + videopath
meta = info.get('meta', {})
# m3u8 streams are encrypted and may not be handled properly by older ffmpeg/avconv.
# To work around this, an adaptive -> flash trick was previously used to obtain
# unencrypted m3u8 streams (see https://github.com/rg3/youtube-dl/issues/4118)
# and to bypass georestrictions as well.
# Currently, unencrypted m3u8 playlists are (intentionally?) invalid and therefore
# unusable, although they can be fixed by a simple string replacement (see
# https://github.com/rg3/youtube-dl/pull/6337).
# Since recent ffmpeg and avconv handle encrypted streams just fine, encrypted
# streams are used now.
videopath = material['videopath']
m3u8_url = meta.get('videohost', 'http://manifest.us.rtl.nl') + videopath
formats = self._extract_m3u8_formats(m3u8_url, uuid, ext='mp4')
video_urlpart = videopath.split('/flash/')[1][:-5]
video_urlpart = videopath.split('/adaptive/')[1][:-5]
PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4'
formats.extend([
@@ -82,7 +113,7 @@ class RtlNlIE(InfoExtractor):
self._sort_formats(formats)
thumbnails = []
meta = info.get('meta', {})
for p in ('poster_base_url', '"thumb_base_url"'):
if not meta.get(p):
continue
@@ -98,7 +129,7 @@ class RtlNlIE(InfoExtractor):
return {
'id': uuid,
'title': '%s - %s' % (progname, subtitle),
'title': title,
'formats': formats,
'timestamp': material['original_date'],
'description': description,
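After this change, `RtlNlIE` builds the title from the programme name plus an optional material title, which handles the empty-synopsis case in the new test. It takes the HLS manifest host from `meta['videohost']` when present, and derives progressive MP4 URLs from the part of `videopath` after `/adaptive/`. The sketch below reproduces those steps. The `videopath`, the alternative host, and the quality label `'sd'` are hypothetical, since the real quality identifiers sit in the elided part of the hunk. The path is assumed to end in `.m3u8`, which is what the `[:-5]` slice strips.

```python
def rtl_title_and_urls(info, quality):
    """Build the title, HLS manifest URL and one progressive MP4 URL (sketch)."""
    material = info['material'][0]
    title = info['abstracts'][0]['name']
    subtitle = material.get('title')
    if subtitle:  # a missing or empty subtitle leaves the programme name alone
        title += ' - %s' % subtitle
    meta = info.get('meta', {})
    videopath = material['videopath']
    m3u8_url = meta.get('videohost', 'http://manifest.us.rtl.nl') + videopath
    # Keep what follows '/adaptive/' and drop the trailing '.m3u8' (5 characters).
    video_urlpart = videopath.split('/adaptive/')[1][:-5]
    progressive_url = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4' % (
        quality, video_urlpart)
    return title, m3u8_url, progressive_url


# Hypothetical API response fragment.
info = {
    'abstracts': [{'name': 'RTL Nieuws'}],
    'material': [{'title': 'Meer beelden van overval juwelier',
                  'videopath': '/rtlnieuws/adaptive/components/2015/07/18/f536aac0.m3u8'}],
    'meta': {},
}
title, m3u8_url, progressive_url = rtl_title_and_urls(info, 'sd')
```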


@@ -19,7 +19,16 @@ from ..utils import (
class RTSIE(InfoExtractor):
IE_DESC = 'RTS.ch'
_VALID_URL = r'https?://(?:www\.)?rts\.ch/(?:(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+))'
_VALID_URL = r'''(?x)
(?:
rts:(?P<rts_id>\d+)|
https?://
(?:www\.)?rts\.ch/
(?:
(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|
play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+)
)
)'''
_TESTS = [
{
@@ -122,6 +131,15 @@ class RTSIE(InfoExtractor):
'view_count': int,
},
},
{
# article with videos on rhs
'url': 'http://www.rts.ch/sport/hockey/6693917-hockey-davos-decroche-son-31e-titre-de-champion-de-suisse.html',
'info_dict': {
'id': '6693917',
'title': 'Hockey: Davos décroche son 31e titre de champion de Suisse',
},
'playlist_mincount': 5,
},
{
'url': 'http://www.rts.ch/play/tv/le-19h30/video/le-chantier-du-nouveau-parlement-vaudois-a-permis-une-trouvaille-historique?id=6348280',
'only_matching': True,
@@ -130,7 +148,7 @@ class RTSIE(InfoExtractor):
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id') or m.group('id_new')
video_id = m.group('rts_id') or m.group('id') or m.group('id_new')
display_id = m.group('display_id') or m.group('display_id_new')
def download_json(internal_id):
@@ -143,6 +161,15 @@ class RTSIE(InfoExtractor):
# video_id extracted out of URL is not always a real id
if 'video' not in all_info and 'audio' not in all_info:
page = self._download_webpage(url, display_id)
# article with videos on rhs
videos = re.findall(
r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:rts:video:(\d+)"',
page)
if videos:
entries = [self.url_result('rts:%s' % video_urn, 'RTS') for video_urn in videos]
return self.playlist_result(entries, video_id, self._og_search_title(page))
internal_id = self._html_search_regex(
r'<(?:video|audio) data-id="([0-9]+)"', page,
'internal video id')
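The RTS changes add an internal `rts:<id>` URL form to `_VALID_URL`. They also let article pages become playlists: each `data-video-urn="urn:rts:video:<id>"` found in a `content-item` article is turned into an `rts:<id>` URL that `RTSIE` itself can resolve. The sketch below round-trips that flow using the verbose pattern and the article regex from the diff. The article markup is hypothetical.

```python
import re

# _VALID_URL copied from RTSIE after this change (verbose mode ignores whitespace).
_VALID_URL = r'''(?x)
    (?:
        rts:(?P<rts_id>\d+)|
        https?://
            (?:www\.)?rts\.ch/
            (?:
                (?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|
                play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+)
            )
    )'''


def rts_video_id(url):
    m = re.match(_VALID_URL, url)
    if not m:
        return None
    return m.group('rts_id') or m.group('id') or m.group('id_new')


def article_entries(page):
    """Turn data-video-urn attributes of content-item articles into rts:<id> URLs."""
    urns = re.findall(
        r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:rts:video:(\d+)"',
        page)
    return ['rts:%s' % urn for urn in urns]


# Hypothetical article markup with two linked videos.
page = ('<article class="content-item"><a href="#" data-video-urn="urn:rts:video:6693918">A</a></article>\n'
        '<article class="content-item" data-x="1">\n  <a class="x" data-video-urn="urn:rts:video:6693920">B</a></article>')
entries = article_entries(page)
```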


@@ -83,7 +83,7 @@ class SafariIE(SafariBaseIE):
library/view/[^/]+|
api/v1/book
)/
(?P<course_id>\d+)/
(?P<course_id>[^/]+)/
(?:chapter(?:-content)?/)?
(?P<part>part\d+)\.html
'''
@@ -100,6 +100,10 @@ class SafariIE(SafariBaseIE):
}, {
'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
'only_matching': True,
}, {
# non-digits in course id
'url': 'https://www.safaribooksonline.com/library/view/create-a-nodejs/100000006A0210/part00.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -122,7 +126,7 @@ class SafariCourseIE(SafariBaseIE):
IE_NAME = 'safari:course'
IE_DESC = 'safaribooksonline.com online courses'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)/(?P<id>\d+)/?(?:[#?]|$)'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)/(?P<id>[^/]+)/?(?:[#?]|$)'
_TESTS = [{
'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/',
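The Safari hunks relax the course id from `\d+` to `[^/]+` so that ids such as `100000006A0210` are accepted. The check below uses `SafariCourseIE`'s old and new `_VALID_URL` from the diff. It also confirms that the course pattern still does not match a `partNN.html` URL, because the `/?(?:[#?]|$)` tail requires the URL to end right after the id.

```python
import re

OLD_COURSE_URL = r'https?://(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)/(?P<id>\d+)/?(?:[#?]|$)'
NEW_COURSE_URL = r'https?://(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)/(?P<id>[^/]+)/?(?:[#?]|$)'


def course_id(pattern, url):
    m = re.match(pattern, url)
    return m.group('id') if m else None


numeric = 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/'
alnum = 'https://www.safaribooksonline.com/library/view/create-a-nodejs/100000006A0210/'
part = 'https://www.safaribooksonline.com/library/view/create-a-nodejs/100000006A0210/part00.html'
```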


@@ -1,17 +1,12 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
remove_end,
)
class SBSIE(InfoExtractor):
IE_DESC = 'sbs.com.au'
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/ondemand/video/(?:single/)?(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand|news)/video/(?:single/)?(?P<id>[0-9]+)'
_TESTS = [{
# Original URL is handled by the generic IE which finds the iframe:
@@ -21,39 +16,36 @@ class SBSIE(InfoExtractor):
'info_dict': {
'id': '320403011771',
'ext': 'mp4',
'title': 'Dingo Conservation',
'description': 'Dingoes are on the brink of extinction; most of the animals we think are dingoes are in fact crossbred with wild dogs. This family run a dingo conservation park to prevent their extinction',
'title': 'Dingo Conservation (The Feed)',
'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
'thumbnail': 're:http://.*\.jpg',
'duration': 308,
},
'add_ies': ['generic'],
}, {
'url': 'http://www.sbs.com.au/ondemand/video/320403011771/Dingo-Conservation-The-Feed',
'only_matching': True,
}, {
'url': 'http://www.sbs.com.au/news/video/471395907773/The-Feed-July-9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'http://www.sbs.com.au/ondemand/video/single/%s?context=web' % video_id, video_id)
player = self._search_regex(
r'(?s)playerParams\.releaseUrls\s*=\s*(\{.*?\n\});\n',
webpage, 'player')
player = re.sub(r"'\s*\+\s*[\da-zA-Z_]+\s*\+\s*'", '', player)
player_params = self._parse_json(
self._search_regex(
r'(?s)var\s+playerParams\s*=\s*({.+?});', webpage, 'playerParams'),
video_id)
release_urls = self._parse_json(js_to_json(player), video_id)
theplatform_url = release_urls.get('progressive') or release_urls['standard']
title = remove_end(self._og_search_title(webpage), ' (The Feed)')
description = self._html_search_meta('description', webpage)
thumbnail = self._og_search_thumbnail(webpage)
urls = player_params['releaseUrls']
theplatform_url = (urls.get('progressive') or urls.get('standard') or
urls.get('html') or player_params['relatedItemsURL'])
return {
'_type': 'url_transparent',
'id': video_id,
'url': theplatform_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}
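The new SBSIE code reads the `playerParams` object from the page and takes the first release URL that is set. It falls back from `progressive` to `standard`, then `html`, and finally to `relatedItemsURL`. Below is a minimal standalone sketch of that selection. It assumes the embedded object is strict JSON; the extractor itself goes through `self._parse_json`. `extract_player_params` and `pick_release_url` are hypothetical helper names, not part of youtube-dl.

```python
import json
import re


def extract_player_params(webpage):
    # Same pattern as the diff; assumes the object literal is valid JSON.
    mobj = re.search(r'(?s)var\s+playerParams\s*=\s*({.+?});', webpage)
    if mobj is None:
        raise ValueError('playerParams not found')
    return json.loads(mobj.group(1))


def pick_release_url(player_params):
    # The first non-empty release URL wins; relatedItemsURL is the last resort.
    urls = player_params.get('releaseUrls') or {}
    return (urls.get('progressive') or urls.get('standard') or
            urls.get('html') or player_params['relatedItemsURL'])
```

The pattern is non-greedy and stops at the first `};`, so a `}` followed by `;` inside the object would cut the match short. The sketch shares that limitation with the regex in the diff.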

@@ -1,6 +1,5 @@
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
@@ -35,8 +34,7 @@ class SharedIE(InfoExtractor):
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
download_form = dict(re.findall(
r'<input type="hidden" name="([^"]+)" value="([^"]*)"', webpage))
download_form = self._hidden_inputs(webpage)
request = compat_urllib_request.Request(
url, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
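SharedIE now calls the base-class helper `_hidden_inputs` instead of its own regex. The old regex only matched double-quoted inputs whose attributes appeared in exactly the order `type`, `name`, `value`. The sketch below is not youtube-dl's implementation. It is a hypothetical stdlib equivalent built on `html.parser`, which does not depend on attribute order, followed by the same URL-encoding step used to build the POST body.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name -> value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input/> tags are routed here by the default
        # handle_startendtag implementation.
        if tag != 'input':
            return
        attributes = dict(attrs)
        if (attributes.get('type') or '').lower() == 'hidden' and attributes.get('name'):
            self.fields[attributes['name']] = attributes.get('value') or ''


def hidden_inputs(html):
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The resulting dict can be passed straight to `urlencode` to produce the `application/x-www-form-urlencoded` body that SharedIE posts back.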

@@ -53,7 +53,7 @@ class SmotriIE(InfoExtractor):
'thumbnail': 'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
},
},
# video-password
# video-password, not approved by moderator
{
'url': 'http://smotri.com/video/view/?id=v1390466a13c',
'md5': 'f6331cef33cad65a0815ee482a54440b',
@@ -71,7 +71,24 @@ class SmotriIE(InfoExtractor):
},
'skip': 'Video is not approved by moderator',
},
# age limit + video-password
# video-password
{
'url': 'http://smotri.com/video/view/?id=v6984858774#',
'md5': 'f11e01d13ac676370fc3b95b9bda11b0',
'info_dict': {
'id': 'v6984858774',
'ext': 'mp4',
'title': 'Дача Солженицина ПАРОЛЬ 223322',
'uploader': 'psavari1',
'uploader_id': 'psavari1',
'upload_date': '20081103',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'videopassword': '223322',
},
},
# age limit + video-password, not approved by moderator
{
'url': 'http://smotri.com/video/view/?id=v15408898bcf',
'md5': '91e909c9f0521adf5ee86fbe073aad70',
@@ -90,19 +107,22 @@ class SmotriIE(InfoExtractor):
},
'skip': 'Video is not approved by moderator',
},
# not approved by moderator, but available
# age limit + video-password
{
'url': 'http://smotri.com/video/view/?id=v28888533b73',
'md5': 'f44bc7adac90af518ef1ecf04893bb34',
'url': 'http://smotri.com/video/view/?id=v7780025814',
'md5': 'b4599b068422559374a59300c5337d72',
'info_dict': {
'id': 'v28888533b73',
'id': 'v7780025814',
'ext': 'mp4',
'title': 'Russian Spies Killed By ISIL Child Soldier',
'uploader': 'Mopeder',
'uploader_id': 'mopeder',
'duration': 71,
'thumbnail': 'http://frame9.loadup.ru/d7/32/2888853.2.3.jpg',
'upload_date': '20150114',
'title': 'Sexy Beach (пароль 123)',
'uploader': 'вАся',
'uploader_id': 'asya_prosto',
'upload_date': '20081218',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18,
},
'params': {
'videopassword': '123'
},
},
# swf player
@@ -152,6 +172,10 @@ class SmotriIE(InfoExtractor):
'getvideoinfo': '1',
}
video_password = self._downloader.params.get('videopassword', None)
if video_password:
video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
request = compat_urllib_request.Request(
'http://smotri.com/video/view/url/bot/', compat_urllib_parse.urlencode(video_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
@@ -161,13 +185,18 @@ class SmotriIE(InfoExtractor):
video_url = video.get('_vidURL') or video.get('_vidURL_mp4')
if not video_url:
if video.get('_moderate_no') or not video.get('moderated'):
if video.get('_moderate_no'):
raise ExtractorError(
'Video %s has not been approved by moderator' % video_id, expected=True)
if video.get('error'):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
if video.get('_pass_protected') == 1:
msg = ('Invalid video password' if video_password
else 'This video is protected by a password, use the --video-password option')
raise ExtractorError(msg, expected=True)
title = video['title']
thumbnail = video['_imgURL']
upload_date = unified_strdate(video['added'])
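For password-protected videos, SmotriIE sends the MD5 hex digest of the UTF-8 encoded password as the `pass` form field. When no video URL comes back, it checks moderation, existence and password protection in that order. The sketch below shows both steps as plain functions. It follows the `_moderate_no` variant of the moderation check. The function names are hypothetical, and the base form fields other than `getvideoinfo` are elided in the diff, so they are not reproduced here.

```python
import hashlib


def add_video_password(form, video_password):
    """Return a copy of form with the hashed password added, if one is given."""
    form = dict(form)
    if video_password:
        form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
    return form


def failure_message(video, video_id, video_password):
    """Mirror the order of the error checks when no video URL is returned."""
    if video.get('_moderate_no'):
        return 'Video %s has not been approved by moderator' % video_id
    if video.get('error'):
        return 'Video %s does not exist' % video_id
    if video.get('_pass_protected') == 1:
        return ('Invalid video password' if video_password
                else 'This video is protected by a password, '
                     'use the --video-password option')
    return None
```

For example, with `--video-password 123` (as in the new age-limit test) the `pass` field is `202cb962ac59075b964b07152d234b70`.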

@@ -0,0 +1,181 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
int_or_none,
js_to_json,
parse_duration,
)
class SnagFilmsEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|embed)\.)?snagfilms\.com/embed/player\?.*\bfilmId=(?P<id>[\da-f-]{36})'
_TESTS = [{
'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
'md5': '2924e9215c6eff7a55ed35b72276bd93',
'info_dict': {
'id': '74849a00-85a9-11e1-9660-123139220831',
'ext': 'mp4',
'title': '#whilewewatch',
}
}, {
# invalid labels, 360p is better than 480p
'url': 'http://www.snagfilms.com/embed/player?filmId=17ca0950-a74a-11e0-a92a-0026bb61d036',
'md5': '882fca19b9eb27ef865efeeaed376a48',
'info_dict': {
'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
'ext': 'mp4',
'title': 'Life in Limbo',
}
}, {
'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?snagfilms\.com/embed/player.+?)\1',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if '>This film is not playable in your area.<' in webpage:
raise ExtractorError(
'Film %s is not playable in your area.' % video_id, expected=True)
formats = []
for source in self._parse_json(js_to_json(self._search_regex(
r'(?s)sources:\s*(\[.+?\]),', webpage, 'json')), video_id):
file_ = source.get('file')
if not file_:
continue
type_ = source.get('type')
ext = determine_ext(file_)
format_id = source.get('label') or ext
if all(v == 'm3u8' for v in (type_, ext)):
formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', m3u8_id='hls'))
else:
bitrate = int_or_none(self._search_regex(
[r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
file_, 'bitrate', default=None))
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None))
formats.append({
'url': file_,
'format_id': format_id,
'tbr': bitrate,
'height': height,
})
self._sort_formats(formats)
title = self._search_regex(
[r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
webpage, 'title')
return {
'id': video_id,
'title': title,
'formats': formats,
}
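SnagFilmsEmbedIE builds formats from each entry of the player's `sources` array. HLS entries are expanded through `_extract_m3u8_formats`. Progressive files instead get a bitrate guessed from the file name and a height parsed from labels such as `360p`. The sketch below reproduces that per-source logic outside youtube-dl. `guess_ext` is a simplified, assumed stand-in for `determine_ext`, and the other function names are hypothetical.

```python
import re


def guess_ext(url):
    # Simplified stand-in for youtube-dl's determine_ext (assumption).
    guess = url.partition('?')[0].rpartition('.')[2]
    return guess if re.match(r'^[A-Za-z0-9]+$', guess) else None


def is_hls(source):
    # As in the diff, both the declared type and the URL extension must be m3u8.
    ext = guess_ext(source.get('file') or '')
    return all(v == 'm3u8' for v in (source.get('type'), ext))


def progressive_format(source):
    """Build a format dict for a non-HLS source, or None if it has no file."""
    file_ = source.get('file')
    if not file_:
        return None
    ext = guess_ext(file_)
    format_id = source.get('label') or ext
    # Bitrate: either "...800kbps..." or "..._16x9_1200.<ext>" in the file name.
    mobj = re.search(r'(\d+)kbps', file_)
    if mobj is None and ext:
        mobj = re.search(r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % re.escape(ext), file_)
    height = re.search(r'^(\d+)[pP]$', format_id or '')
    return {
        'url': file_,
        'format_id': format_id,
        'tbr': int(mobj.group(1)) if mobj else None,
        'height': int(height.group(1)) if height else None,
    }
```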
class SnagFilmsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?snagfilms\.com/(?:films/title|show)/(?P<id>[^?#]+)'
_TESTS = [{
'url': 'http://www.snagfilms.com/films/title/lost_for_life',
'md5': '19844f897b35af219773fd63bdec2942',
'info_dict': {
'id': '0000014c-de2f-d5d6-abcf-ffef58af0017',
'display_id': 'lost_for_life',
'ext': 'mp4',
'title': 'Lost for Life',
'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 4489,
'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals']
}
}, {
'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
'md5': 'e6292e5b837642bbda82d7f8bf3fbdfd',
'info_dict': {
'id': '00000145-d75c-d96e-a9c7-ff5c67b20000',
'display_id': 'the_world_cut_project/india',
'ext': 'mp4',
'title': 'India',
'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 979,
'categories': ['Documentary', 'Sports', 'Politics']
}
}, {
# Film is not playable in your area.
'url': 'http://www.snagfilms.com/films/title/inside_mecca',
'only_matching': True,
}, {
# Film is not available.
'url': 'http://www.snagfilms.com/show/augie_alone/flirting',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if ">Sorry, the Film you're looking for is not available.<" in webpage:
raise ExtractorError(
'Film %s is not available.' % display_id, expected=True)
film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
snag = self._parse_json(
self._search_regex(
'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
display_id)
for item in snag:
if item.get('data', {}).get('film', {}).get('id') == film_id:
data = item['data']['film']
title = data['title']
description = clean_html(data.get('synopsis'))
thumbnail = data.get('image')
duration = int_or_none(data.get('duration') or data.get('runtime'))
categories = [
category['title'] for category in data.get('categories', [])
if category.get('title')]
break
else:
title = self._search_regex(
r'itemprop="title">([^<]+)<', webpage, 'title')
description = self._html_search_regex(
r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
webpage, 'description', default=None) or self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
duration = parse_duration(self._search_regex(
r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
webpage, 'duration', fatal=False))
categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
return {
'_type': 'url_transparent',
'url': 'http://embed.snagfilms.com/embed/player?filmId=%s' % film_id,
'id': film_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'categories': categories,
}
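SnagFilmsIE looks up the film in the `Snag.page.data` array by its 36-character film id. It only scrapes the HTML, in the `for ... else` branch, when no entry matches. Below is a hypothetical standalone version of the lookup. It returns `None` where the extractor would fall through to the `else` branch, and it omits the `clean_html` step applied to the synopsis.

```python
def film_metadata(snag_items, film_id):
    """Return metadata for film_id from Snag.page.data, or None if absent."""
    for item in snag_items:
        film = (item.get('data') or {}).get('film') or {}
        if film.get('id') != film_id:
            continue
        duration = film.get('duration') or film.get('runtime')
        return {
            'title': film['title'],
            'thumbnail': film.get('image'),
            'duration': int(duration) if duration is not None else None,
            'categories': [c['title'] for c in film.get('categories', [])
                           if c.get('title')],
        }
    # The extractor falls back to scraping the HTML in this case.
    return None
```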

@@ -6,9 +6,12 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_request
compat_urllib_request,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
)
from ..utils import ExtractorError
class SohuIE(InfoExtractor):
@@ -26,7 +29,7 @@ class SohuIE(InfoExtractor):
'skip': 'Only available in China',
}, {
'url': 'http://tv.sohu.com/20150305/n409385080.shtml',
'md5': 'ac9a5d322b4bf9ae184d53e4711e4f1a',
'md5': '699060e75cf58858dd47fb9c03c42cfb',
'info_dict': {
'id': '409385080',
'ext': 'mp4',
@@ -34,7 +37,7 @@ class SohuIE(InfoExtractor):
}
}, {
'url': 'http://my.tv.sohu.com/us/232799889/78693464.shtml',
'md5': '49308ff6dafde5ece51137d04aec311e',
'md5': '9bf34be48f2f4dadcb226c74127e203c',
'info_dict': {
'id': '78693464',
'ext': 'mp4',
@@ -48,7 +51,7 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
},
'playlist': [{
'md5': '492923eac023ba2f13ff69617c32754a',
'md5': 'bdbfb8f39924725e6589c146bc1883ad',
'info_dict': {
'id': '78910339_part1',
'ext': 'mp4',
@@ -56,7 +59,7 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': 'de604848c0e8e9c4a4dde7e1347c0637',
'md5': '3e1f46aaeb95354fd10e7fca9fc1804e',
'info_dict': {
'id': '78910339_part2',
'ext': 'mp4',
@@ -64,7 +67,7 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '93584716ee0657c0b205b8aa3d27aa13',
'md5': '8407e634175fdac706766481b9443450',
'info_dict': {
'id': '78910339_part3',
'ext': 'mp4',
@@ -139,21 +142,42 @@ class SohuIE(InfoExtractor):
for i in range(part_count):
formats = []
for format_id, format_data in formats_json.items():
data = format_data['data']
allot = format_data['allot']
data = format_data['data']
clips_url = data['clipsURL']
su = data['su']
# URLs starting with http://newflv.sohu.ccgslb.net/ are not usable,
# so retry until a working URL is obtained
video_url = 'newflv.sohu.ccgslb.net'
cdnId = None
retries = 0
while 'newflv.sohu.ccgslb.net' in video_url and retries < 5:
download_note = 'Download information from CDN gateway for format ' + format_id
while 'newflv.sohu.ccgslb.net' in video_url:
params = {
'prot': 9,
'file': clips_url[i],
'new': su[i],
'prod': 'flash',
}
if cdnId is not None:
params['idc'] = cdnId
download_note = 'Downloading %s video URL part %d of %d' % (
format_id, i + 1, part_count)
if retries > 0:
download_note += ' (retry #%d)' % retries
part_info = self._parse_json(self._download_webpage(
'http://%s/?%s' % (allot, compat_urllib_parse.urlencode(params)),
video_id, download_note), video_id)
video_url = part_info['url']
cdnId = part_info.get('nid')
retries += 1
cdn_info = self._download_json(
'http://data.vod.itc.cn/cdnList?new=' + data['su'][i],
video_id, download_note)
video_url = cdn_info['url']
if retries > 5:
raise ExtractorError('Failed to get video URL')
formats.append({
'url': video_url,

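The new Sohu code asks the CDN gateway (`allot`) for each clip. It passes the previous response's `nid` back as `idc` and retries while the returned URL still points at the unusable `newflv.sohu.ccgslb.net` host. The sketch below is a rough stand-in for that loop. The HTTP call is injected as `fetch_json`, a hypothetical callable, so the sketch can run offline. The attempt budget (`max_attempts=6`) is an assumption chosen to roughly match the `retries > 5` check in the diff.

```python
from urllib.parse import urlencode

# Host whose URLs the diff's comment describes as unusable.
BAD_HOST = 'newflv.sohu.ccgslb.net'


def resolve_part_url(fetch_json, allot, clip_url, su, max_attempts=6):
    """Ask the CDN gateway for a clip URL until it is not on BAD_HOST."""
    cdn_id = None
    for _ in range(max_attempts):
        params = {'prot': 9, 'file': clip_url, 'new': su, 'prod': 'flash'}
        if cdn_id is not None:
            # Pass the previous node id back, as the diff does with 'idc'.
            params['idc'] = cdn_id
        part_info = fetch_json('http://%s/?%s' % (allot, urlencode(params)))
        video_url = part_info['url']
        if BAD_HOST not in video_url:
            return video_url
        cdn_id = part_info.get('nid')
    raise RuntimeError('Failed to get video URL')
```

Injecting the fetch function keeps the retry policy separate from the network layer. Inside an extractor, the `self._download_webpage` plus `self._parse_json` pair from the diff would take that role.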