161 Commits

Author SHA1 Message Date
Sergey M․
6c152ce20f release 2016.10.02 2016-10-02 15:58:00 +07:00
Sergey M․
26406d33c7 [ChangeLog] Actualize 2016-10-02 15:56:33 +07:00
Yen Chi Hsuan
703b3afa93 [amcnetworks] Skip a restricted _TEST 2016-10-02 14:25:06 +08:00
Yen Chi Hsuan
99ed78c79e [jwplatform] Support DASH streams 2016-10-02 14:07:49 +08:00
Yen Chi Hsuan
fd15264172 [jwplatform] Support old-style jwplayer playlists 2016-10-02 13:47:06 +08:00
Yen Chi Hsuan
bd26441205 [utils] Fix xattr error handling 2016-10-02 03:03:41 +08:00
Yen Chi Hsuan
b19e275d99 [__init__] Fix lost xattr if --embed-thumbnail used
Reported at
https://github.com/rg3/youtube-dl/issues/9054#issuecomment-250451823
2016-10-02 02:12:14 +08:00
Sergey M․
f6ba581f89 [byutv:event] Add extractor 2016-10-02 00:50:07 +07:00
Sergey M․
6d2549fb4f [byutv] Fix id and display id 2016-10-02 00:44:54 +07:00
Déstin Reed
4da4516973 [byutv] Rely on _match_id and _parse_json 2016-10-02 00:41:18 +07:00
Sergey M․
e1e97c2446 [periscope:user] Fix extraction (Closes #10820) 2016-10-01 22:50:47 +07:00
Yen Chi Hsuan
53a7e3d287 [utils] Support xattr as well as pyxattr
Closes #9054

There are two xattr packages in Python, pyxattr [1] and xattr [2]. They
have different APIs.

In the old days pyxattr supported Linux only, while xattr supported
Linux, Mac, FreeBSD and Solaris. Recently pyxattr added support for
Mac OS X. [3]

An old version of [2] is shipped with Mac OS X. However, some Linux
distributions have pyxattr only, for example PLD-Linux [4] and old Arch
Linux. [5] As a result, supporting both is the way to go.

[1] https://github.com/iustin/pyxattr
[2] https://github.com/xattr/xattr
[3] https://github.com/iustin/pyxattr/pull/9
[4] https://github.com/rg3/youtube-dl/issues/5498
[5] https://git.archlinux.org/svntogit/community.git/commit/?id=427c4c76401e386d865ccddea4fbfdc74df80492
    https://git.archlinux.org/svntogit/community.git/commit/?id=59b40da7b69622a6761d364a8b07909e9cccaa56
    python-xattr was added on 2016/06/29, while pyxattr has been there
    for more than 6 years
2016-10-01 20:13:04 +08:00
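Both packages install under the import name `xattr` but expose different functions, so a caller has to probe which one it got. The following is a minimal sketch of that probing, assuming pyxattr is recognized by its module-level `set` function and python-xattr by `setxattr`. `pick_xattr_setter` and the two stand-in modules are hypothetical and are not the code added in this commit.

```python
from types import SimpleNamespace


def pick_xattr_setter(module):
    """Return a setter(path, key, value) for whichever xattr package `module` is."""
    if hasattr(module, 'set'):
        # pyxattr exposes a module-level set(item, name, value)
        return module.set
    if hasattr(module, 'setxattr'):
        # python-xattr exposes setxattr(f, attr, value)
        return module.setxattr
    raise ImportError('unsupported xattr module')


# Stand-ins for the two packages so the sketch runs without either installed.
calls = []
fake_pyxattr = SimpleNamespace(
    set=lambda path, key, value: calls.append(('pyxattr', key)))
fake_xattr = SimpleNamespace(
    setxattr=lambda path, key, value: calls.append(('xattr', key)))

# Values are passed as bytes, matching the "xattr values should be bytes" commit.
pick_xattr_setter(fake_pyxattr)('video.mp4', 'user.xdg.comment', b'demo')
pick_xattr_setter(fake_xattr)('video.mp4', 'user.xdg.comment', b'demo')
```

The check for `set` comes first because pyxattr also keeps a deprecated `setxattr` for backward compatibility, so testing for `setxattr` alone would not tell the two packages apart.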
Yen Chi Hsuan
d54739a2e6 [downloader/http] xattr values should be bytes 2016-10-01 19:58:13 +08:00
Yen Chi Hsuan
63e0fd5bcc Merge pull request #10818 from TRox1972/criterion_match_id
[criterion] Rely on _match_id, improve regex and add thumbnail to test
2016-10-01 19:49:18 +08:00
Déstin Reed
9c51a24642 [criterion] Rely on _match_id, improve regex and add thumbnail to test 2016-10-01 13:46:48 +02:00
Yen Chi Hsuan
9bd7bd0b80 [twitch] Skip a 404 test 2016-10-01 16:38:47 +08:00
Yen Chi Hsuan
4a76b73c6c Merge pull request #10817 from TRox1972/clubic_match_id
[clubic] Rely on _match_id and _parse_json
2016-10-01 16:20:12 +08:00
Yen Chi Hsuan
e295618f9e [dctp] Fix extraction (closes #10734) 2016-10-01 15:22:48 +08:00
Yen Chi Hsuan
d7753d1948 [downloader/http] Use write_xattr function for --xattr-set-filesize 2016-10-01 14:47:20 +08:00
Déstin Reed
eaf9b22f94 [clubic] Rely on _match_id and _parse_json 2016-09-30 20:03:25 +02:00
Sergey M․
a1001f47fc [instagram] PEP 8 2016-10-01 00:16:08 +07:00
Déstin Reed
1609782258 [Instagram] Extract video dimensions 2016-10-01 00:13:34 +07:00
Sergey M․
de6babf922 [tvland] Extend _VALID_URL (Closes #10812) 2016-09-30 22:30:34 +07:00
Sergey M․
b0582fc806 [vgtv] Add support for tv.aftonbladet.se (Closes #10800) 2016-09-30 00:15:09 +07:00
Sergey M․
af33dd8ee7 [aftonbladet] Remove extractor 2016-09-30 00:13:03 +07:00
Sergey M․
70d7b323b6 [vk] Improve view count extraction 2016-09-29 23:52:29 +07:00
Sergey M․
a7ee8a00f4 [vk] Extract timestamp (Closes #10760) 2016-09-29 23:52:29 +07:00
Sergey M․
c6eed6b8c0 [utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
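Why the order of date formats matters can be illustrated with a small sketch: formats are tried one by one, so placing rare formats last keeps them from shadowing common ones. The format list and the name `parse_timestamp` below are assumptions made for illustration and do not reproduce the list in `utils.py`.

```python
import calendar
import datetime

# Common formats first, rarer ones last, so that an ambiguous string is
# matched by the more likely interpretation. Illustrative list only.
DATE_FORMATS = (
    '%Y-%m-%d',
    '%d.%m.%Y %H:%M',
    '%B %d, %Y',
    '%b %d, %Y at %I:%M %p',  # rare: "Feb 7, 2016 at 6:35 pm"
)


def parse_timestamp(date_str):
    """Return a UTC Unix timestamp, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.datetime.strptime(date_str, fmt)
        except ValueError:
            continue  # try the next, lower-priority format
        return calendar.timegm(parsed.timetuple())
    return None
```

Month names in `strptime` are locale-dependent, so this sketch assumes an English or C locale.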
Kacper Michajłow
3aa3953d28 [vk] Fix date and view count extraction. 2016-09-29 23:52:29 +07:00
Yen Chi Hsuan
efa97bdcf1 Move write_xattr to utils.py
There are some other places that use xattr functions. It's better to
move it to a common place so that others can use it.
2016-09-30 00:28:32 +08:00
Sergey M․
475f8a4580 [vk] Add support for running live streams (Closes #10799) 2016-09-29 23:21:39 +07:00
Sergey M․
93aa0b6318 [vk] Add support for finished live streams (#10799) 2016-09-29 23:04:10 +07:00
Yen Chi Hsuan
0ce26ef228 Merge pull request #10788 from TRox1972/instagram_comments
[Instagram] Extract comments
2016-09-29 21:54:39 +08:00
Yen Chi Hsuan
0d72ff9c51 [leeco] Recognize more Le Sports URLs (#10794) 2016-09-29 21:39:35 +08:00
Déstin Reed
a56e74e271 [Instagram] Extract comments 2016-09-28 19:32:40 +02:00
Sergey M․
f533490bb7 [ketnet] Extract mzsource formats (#10770) 2016-09-28 22:58:25 +07:00
Remita Amine
8bfda726c2 [limelight:media] improve http formats extraction 2016-09-28 16:34:27 +01:00
Sergey M․
8f0cf20ab9 release 2016.09.27 2016-09-27 23:09:46 +07:00
Sergey M․
c8f45f763c [ChangeLog] Remove duplicate 2016-09-27 23:03:00 +07:00
Sergey M․
dd2cffeeec [ChangeLog] Actualize 2016-09-27 22:43:35 +07:00
Sergey M․
cdfcc4ce95 [mtv] Improve _VALID_URL 2016-09-27 22:27:10 +07:00
Kacper Michajłow
e384552590 [vk] Add support for dailymotion embeds
Fixes #10661
2016-09-27 21:58:14 +07:00
Sergey M․
1a2fbe322e [periscope] Treat timed_out state as finished stream 2016-09-27 21:55:51 +07:00
Sergey M․
f9dd86a112 [npo] Clarify IE_NAMEs (Closes #10775) 2016-09-27 21:37:33 +07:00
Remita Amine
2342733f85 fix tests related to 1978540a5122c53012e17a78841f3da0df77fd34(closes #10774) 2016-09-27 15:31:25 +01:00
Remita Amine
93933c9819 [awaan:video] fix test(closes #10773) 2016-09-27 15:31:25 +01:00
Yen Chi Hsuan
d75d9e343e [einthusan] Fix extraction (closes #10714) 2016-09-27 14:38:41 +08:00
Sergey M․
72c3d02d29 [promptfile] Improve and modernize 2016-09-26 23:39:54 +07:00
Ondřej Bárta
d3dbb46330 [promptfile] Fix extraction (Closes #10634) 2016-09-26 23:20:58 +07:00
Sergey M․
fffb9cff94 [kaltura] Speed up embed regexes (#10764) 2016-09-26 22:15:58 +07:00
Yen Chi Hsuan
d3c97bad61 Ignore and cleanup 3gp files 2016-09-26 14:14:37 +08:00
Sergey M․
2d5b4af007 [extractors] Add import for anderetijden extractor 2016-09-25 23:30:57 +07:00
Sergey M․
f1ee462c82 [PULL_REQUEST_TEMPLATE.md] Fix typo 2016-09-25 22:38:36 +07:00
Sergey M․
5742c18bc1 [npo] Add support for anderetijden.nl (Closes #10754) 2016-09-25 22:26:14 +07:00
Sergey M․
ddb19772d5 [vpro] Fix playlist title extraction and update tests 2016-09-25 22:26:06 +07:00
Sergey M․
a3d8b38168 [npo] Generalize playlist extractors 2016-09-25 22:26:00 +07:00
Sergey M․
e590b7ff9e [PULL_REQUEST_TEMPLATE.md] Add checkable Improvement options PR's purpose 2016-09-25 18:09:46 +07:00
Sergey M․
f3625cc4ca [PULL_REQUEST_TEMPLATE.md] Add Unlicense notice 2016-09-25 18:08:35 +07:00
stepshal
2d3d29976b [youtube] Change test URLs from http to https 2016-09-25 17:45:24 +07:00
Sergey M․
493353c7fd [prosiebensat1] Add support for advopedia 2016-09-25 06:25:57 +07:00
Sergey M․
0a078550b9 [prosiebensat1] Improve _VALID_URL 2016-09-25 06:19:17 +07:00
Sergey M․
f92bb612c6 [mwave] Relax _VALID_URLs (Closes #10735, closes #10748) 2016-09-25 06:14:32 +07:00
Sergey M․
ddde91952f [prosiebensat1] Fix playlist support (Closes #10745) 2016-09-25 05:36:18 +07:00
Sergey M․
63c583eb2c [prosiebensat1] Add support for sat1gold (#10745) 2016-09-25 04:43:10 +07:00
Remita Amine
7fd57de6fb [cbsnews:livevideo] fix extraction and extract m3u8 formats 2016-09-24 22:01:33 +01:00
Remita Amine
e71a450956 [common] add hdcore sign to akamai f4m formats 2016-09-24 21:55:53 +01:00
Remita Amine
27e99078d3 [brightcove:new] add support for live streams 2016-09-24 15:39:48 +01:00
Remita Amine
6f126d903f [download/hls] Delegate downloading to ffmpeg for live streams 2016-09-24 15:39:47 +01:00
Sergey M․
7518a61d41 [soundcloud] Fix typo in playlist base class name 2016-09-24 19:29:49 +07:00
Sergey M․
8e45e1cc4d [soundcloud] Generalize playlist entries extraction (#10733) 2016-09-24 19:18:01 +07:00
Yen Chi Hsuan
f0bc5a8609 [twitter] Support Periscope embeds (closes #10737)
Also update _TESTS
2016-09-24 20:00:29 +08:00
Remita Amine
a54ffb8aa7 [mtv] add common IE_NAME prefix for MTVIE and MTVVideoIE 2016-09-24 10:50:14 +01:00
Remita Amine
8add4bfecb [mtv] add support for new website urls(closes #8169)(closes #9808) 2016-09-24 10:42:20 +01:00
Yen Chi Hsuan
0711995bca [openload] Support subtitles (closes #10625) 2016-09-24 14:27:08 +08:00
Yen Chi Hsuan
5968d7d2fe [extractor/common] Improved support for HTML5 subtitles
Ref: #10625

In a strict sense, <track>s with kind=captions are not subtitles. [1]
openload misuses this attribute, and I guess there will be more
examples, so I add it to common.py.

Also allow extracting information for subtitles-only <video> or <audio>
tags, which is the case of openload.

[1] https://www.w3.org/TR/html5/embedded-content-0.html#attr-track-kind
2016-09-24 14:20:42 +08:00
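The idea of accepting `kind=captions` tracks as subtitles can be sketched with the standard library HTML parser. This is an illustration under stated assumptions, not the code in `common.py`: the class name `TrackCollector`, the output layout and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TrackCollector(HTMLParser):
    """Collect <track> elements that should be treated as subtitles."""

    def __init__(self):
        super().__init__()
        self.subtitles = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'track':
            return
        attrs = dict(attrs)
        # A missing kind defaults to "subtitles" per the HTML spec;
        # "captions" is accepted too because some sites misuse it.
        if attrs.get('kind', 'subtitles') not in ('subtitles', 'captions'):
            return
        src = attrs.get('src')
        if not src:
            return
        lang = attrs.get('srclang') or attrs.get('label')
        self.subtitles.setdefault(lang, []).append({'url': src})


# A <video> tag with tracks but no media source, as in the openload case.
page = '''
<video>
  <track kind="captions" src="/subs/en.vtt" srclang="en">
  <track kind="chapters" src="/chapters.vtt" srclang="en">
</video>
'''
collector = TrackCollector()
collector.feed(page)
```

After `feed`, `collector.subtitles` holds only the captions track; the chapters track is skipped.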
Sergey M․
e6332059ac release 2016.09.24 2016-09-24 02:16:47 +07:00
Sergey M․
8eec691e8a [ChangeLog] Actualize 2016-09-24 02:12:49 +07:00
Sergey M․
24628cf7db [soundcloud:playlist] Provide video id for playlist entries (Closes #10733) 2016-09-24 02:01:01 +07:00
Sergey M․
71ad00c09f [prosiebensat1] Add support for kabeleinsdoku (Closes #10732) 2016-09-23 21:08:16 +07:00
Remita Amine
45cae3b021 [cbs] extract info from thunder videoPlayerService(closes #10728) 2016-09-22 19:28:22 +01:00
Yen Chi Hsuan
4ddcb5999d [openload] Fix extraction (closes #10408, closes #10727)
Thanks to @daniel100097 for providing a working version
2016-09-23 01:47:51 +08:00
Yen Chi Hsuan
628406db96 [Makefile] Cleanup files from fragment-based downloaders 2016-09-23 01:13:56 +08:00
Yen Chi Hsuan
e3d6bdc8fc [ustream] Support HLS streams (closes #10698) 2016-09-23 01:11:13 +08:00
Sergey M․
0a439c5c4c [udemy] Stringify video id 2016-09-22 21:48:53 +07:00
Remita Amine
1978540a51 [ooyala] extract all hls formats 2016-09-21 21:49:52 +01:00
Sergey M․
12f211d0cb [videomore] Fix embed regex 2016-09-21 22:51:36 +07:00
Remita Amine
3a5a18705f [adobepass] add support MSO that depend on watchTVeverywhere(closes #10709) 2016-09-21 15:57:27 +01:00
Remita Amine
1ae0ae5db0 [cartoonnetwork] add support Adobe Pass auth 2016-09-20 18:52:00 +01:00
Sergey M․
f62a77b99a [soundcloud] Modernize 2016-09-20 21:56:57 +07:00
coolsa
4bfd294e2f [soundcloud] Extract license metadata 2016-09-20 21:56:57 +07:00
Remita Amine
e33a7253b2 [fox] add support for Adobe Pass auth(closes #8584) 2016-09-20 15:52:23 +01:00
Remita Amine
c38f06818d add support for Adobe Pass auth in tbs,tnt and trutv extractors(fixes #10642)(closes #10222)(closes #10519) 2016-09-20 11:55:30 +01:00
Sergey M․
cb57386873 release 2016.09.19 2016-09-19 02:58:32 +07:00
Sergey M․
59fd8f931d [ChangeLog] Actualize 2016-09-19 02:57:14 +07:00
Sergey M․
70b4cf9b1b [crunchyroll] Check if already logged in (Closes #10700) 2016-09-19 02:50:06 +07:00
Sergey M․
cc764a6da8 [twitch:stream] Remove fallback to profile extraction when stream is offline
Main page does not contain profile videos anymore
2016-09-18 19:10:18 +07:00
Yen Chi Hsuan
d8dbf8707d [thisav] Improve title extraction (closes #10682)
I didn't add a test case as the one in #10682 looks like a copyrighted
product.
2016-09-18 18:35:38 +08:00
Sergey M․
a1da888d0c [vyborymos] Improve station info extraction 2016-09-18 17:30:55 +07:00
Sergey M․
3acff9423d release 2016.09.18 2016-09-18 17:16:55 +07:00
Sergey M․
9ca93b99d1 [ChangeLog] Actualize 2016-09-18 17:15:22 +07:00
Sergey M․
14ae11efab [vyborymos] Add extractor (Closes #10692) 2016-09-18 16:56:40 +07:00
Sergey M․
190d2027d0 [xfileshare] Add title regex for streamin.to and fallback to video id (Closes #10646) 2016-09-18 07:22:06 +07:00
Sergey M․
26394d021d [globo:article] Add support for multiple videos (Closes #10653) 2016-09-17 23:34:10 +07:00
Sergey M․
30d0b549be [extractor/common] Add manifest_url for hls and hds formats 2016-09-17 21:33:38 +07:00
Sergey M․
86f4d14f81 Refactor fragments interface and dash segments downloader
- Eliminate segment_urls and initialization_url
+ Introduce manifest_url (manifest may contain unfragmented data in this case url will be used for direct media URL and manifest_url for manifest itself correspondingly)
* Rewrite dashsegments downloader to use fragments data
* Improve generic mpd extraction
2016-09-17 20:35:22 +07:00
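The split between `url`, `manifest_url` and `fragments` described in this commit message can be pictured with two format dictionaries. This is a sketch only: the `url` key inside each fragment entry, the helper `plan_download` and all example URLs are assumptions made for illustration.

```python
def plan_download(fmt):
    """Return ('fragments', urls) or ('direct', [url]) for a format dict."""
    fragments = fmt.get('fragments')
    if fragments:
        # Fragmented media: fetch each fragment in order.
        return 'fragments', [fragment['url'] for fragment in fragments]
    # Unfragmented media listed in a manifest: url is the media itself,
    # while manifest_url still points at the manifest.
    return 'direct', [fmt['url']]


fragmented = {
    'format_id': 'dash-video',
    'manifest_url': 'https://example.com/manifest.mpd',
    'fragments': [
        {'url': 'https://example.com/seg-1.m4s'},
        {'url': 'https://example.com/seg-2.m4s'},
    ],
}
unfragmented = {
    'format_id': 'dash-audio',
    'manifest_url': 'https://example.com/manifest.mpd',
    'url': 'https://example.com/audio.mp4',
}
```

In both cases `manifest_url` records where the format was discovered, which is what lets `url` be reserved for the media itself.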
Sergey M․
21d21b0c72 [svt] Fix DASH formats extraction 2016-09-17 19:25:31 +07:00
Sergey M․
b4c1d6e800 [extractor/common] Expose fragments interface for dashsegments formats 2016-09-17 18:31:18 +07:00
Sergey M․
a0d5077c8d [extractor/common] Introduce fragments interface 2016-09-17 18:31:09 +07:00
Yen Chi Hsuan
584d6f3457 [thisav] Recognize jwplayers (closes #10447) 2016-09-17 18:46:43 +08:00
Yen Chi Hsuan
e14c82bd6b [jwplatform] Use js_to_json to detect more JWPlayers 2016-09-17 18:45:08 +08:00
Sergey M․
c51a7f0b2f [franceinter] Fix upload date extraction 2016-09-17 15:44:37 +07:00
Remita Amine
d05ef09d9d [mangomolo] fix domain regex 2016-09-17 08:11:01 +01:00
Remita Amine
30d9e20938 [postprocessor/ffmpeg] apply FFmpegFixupM3u8PP only for videos with aac codec(#5591) 2016-09-16 22:06:55 +01:00
Remita Amine
fc86d4eed0 [mangomolo] fix typo 2016-09-16 20:10:47 +01:00
Remita Amine
7d273a387a [mangomolo] add support for Mangomolo embeds 2016-09-16 19:31:39 +01:00
Remita Amine
6ad0219556 [common] add helper method for Wowza Streaming Engine format extraction 2016-09-16 19:30:38 +01:00
Remita Amine
98b7506e96 [toutv] add support for authentication(closes #10669) 2016-09-16 17:40:15 +01:00
Sergey M․
52dc8a9b3f [franceinter] Fix upload date extraction 2016-09-16 22:02:59 +07:00
Sergey M․
9d8985a165 [tv4] Fix hls and hds formats (Closes #10659) 2016-09-16 00:54:34 +07:00
Sergey M․
f5e008d134 release 2016.09.15 2016-09-15 23:46:11 +07:00
Sergey M․
e6bf3621e7 [ChangeLog] Actualize 2016-09-15 23:31:16 +07:00
stepshal
490b755769 Improve some id regexes 2016-09-15 23:12:58 +07:00
Sergey M․
1dec2c8a0e [adobepass] Change mvpd cache section name
In order to better emphasize its relation to Adobe Pass
2016-09-15 22:47:45 +07:00
Sergey M․
dcce092e0a [extractor/common] Simplify _get_netrc_login_info and carry long lines 2016-09-15 22:35:12 +07:00
Sergey M․
32443dd346 [extractor/common] Update _get_login_info's comment 2016-09-15 22:34:29 +07:00
Sergey M․
2133565cec [extractor/common] Simplify _get_login_info 2016-09-15 22:26:37 +07:00
Sergey M․
1da50aa34e [YoutubeDL] Improve Adobe Pass options' wording 2016-09-15 22:24:55 +07:00
Sergey M․
d2522b86ac [options] Actually print Adobe Pass options sections in --help 2016-09-15 22:18:31 +07:00
Sergey M․
537f753399 [options] Improve Adobe Pass wording 2016-09-15 22:17:17 +07:00
Sergey M․
c849836854 [utils] Improve _hidden_inputs 2016-09-15 21:54:48 +07:00
Sergey M․
eb5b1fc021 [crunchyroll] Fix authentication (Closes #10655) 2016-09-15 21:53:35 +07:00
Sergey M․
95be29e1c6 [twitch] Fix api calls (Closes #10654, closes #10660) 2016-09-15 20:58:02 +07:00
Remita Amine
c035dba19e [bellmedia] add support for more sites 2016-09-15 08:12:12 +01:00
Remita Amine
87148bb711 [adobepass] rename --ap-mso-list option to --ap-list-mso 2016-09-14 20:21:09 +01:00
Remita Amine
797c636bcb [ap] improve adobe pass names and parse error handling 2016-09-14 18:58:47 +01:00
Sergey M․
0002962f3f [franceinter] Improve extraction (Closes #10538) 2016-09-14 23:59:38 +07:00
Sergey M․
3e4185c396 [utils] Use native french month names 2016-09-14 23:59:38 +07:00
Sergey M․
f6717dec8a [utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
renalid
a942d6cb48 [utils,franceinter] Add french months' names and fix extraction
Update the "FranceInter" radio extractor: the webpages' HTML structure
had changed and the extractor no longer worked, so I updated it to
get the mp3 URL and all details.
2016-09-14 23:59:38 +07:00
Yen Chi Hsuan
961516bfd1 [kuwo:song] Improve error detection (closes #10650) 2016-09-15 00:56:15 +08:00
Yen Chi Hsuan
6db354a9f4 [kuwo] Update _TESTS 2016-09-15 00:53:04 +08:00
Remita Amine
353f340e11 [go] fix typo 2016-09-14 17:22:42 +01:00
Remita Amine
014b7e6b25 [go] add support for free full episodes(#10439) 2016-09-14 17:08:25 +01:00
stepshal
925194022c Improve some _VALID_URLs 2016-09-14 22:47:21 +07:00
Sergey M․
b690ea15eb [viafree] Fix test 2016-09-14 22:45:23 +07:00
Remita Amine
5712c0f426 [adobepass] remove unnecessary option 2016-09-14 16:37:21 +01:00
Yen Chi Hsuan
86d68f906e [bilibili] Fix extraction for videos without backup_url (#10647) 2016-09-14 22:11:49 +08:00
Yen Chi Hsuan
4875ff6847 [bilibili] Remove copyrighted test cases
I can't find any English or Chinese material that claims BiliBili has
bought legal redistribution permissions for copyrighted products from
copyright holders.

References for removed test cases:
"刀语": https://en.wikipedia.org/wiki/Katanagatari, by White Fox
"哆啦A梦": https://en.wikipedia.org/wiki/Doraemon, by Shin-Ei Animation
"岳父岳母真难当": https://en.wikipedia.org/wiki/Serial_(Bad)_Weddings, by Les films du 24
"混沌武士": https://en.wikipedia.org/wiki/Samurai_Champloo, by Manglobe

I shouldn't have added them to _TESTS
2016-09-14 22:09:43 +08:00
Remita Amine
1b6712ab23 [adobepass] add specific options for adobe pass authentication
- add --ap-username and --ap-password option to specify
TV provider username and password in the cmd line
- add --ap-retries option to limit the number of retries
- add --list-ap-msi-ids to list the supported TV Providers
2016-09-13 22:16:01 +01:00
Sergey M․
8414c2da31 [adobepass] PEP 8 2016-09-13 23:22:16 +07:00
Sergey M․
45396dd2ed [nhk] Fix extraction (Closes #10633) 2016-09-13 23:20:25 +07:00
Remita Amine
7a7309219c [adobepass] add an option to specify mso_id and support for ROGERS TV Provider(closes #10606) 2016-09-12 23:39:35 +01:00
Sergey M․
fcba157e80 [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 23:29:43 +07:00
Sergey M․
a6ccc3e518 [safari] Improve ids regexes (#10617) 2016-09-12 23:05:52 +07:00
Sergey M․
1d16035bb4 [kaltura] Improve audio detection 2016-09-12 22:43:45 +07:00
Sergey M․
e8bcd982cc [kaltura] Skip chun format 2016-09-12 22:33:00 +07:00
Sergey M․
a5ff05df1a [extractor/generic] Add vimeo embed that requires Referer passed 2016-09-12 21:49:31 +07:00
Sergey M․
d002e91986 [vimeo:ondemand] Pass Referer along with embed URL (#10624) 2016-09-12 21:48:45 +07:00
Sergey M․
546edb2efa [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 21:01:31 +07:00
Yen Chi Hsuan
be45730226 [nbc] Add new extractor for NBC Olympics (#10295, #10361) 2016-09-12 02:55:15 +08:00
Sergey M․
ee7e672eb0 [tube8] Remove proxy settings from test 2016-09-11 23:46:50 +07:00
148 changed files with 3496 additions and 1128 deletions

View File

@@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.11.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.11.1**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.02**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.09.11.1
+[debug] youtube-dl version 2016.10.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
@@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@@ -10,8 +10,13 @@
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*?
- [ ] Bug fix
- [ ] Improvement
- [ ] New extractor
- [ ] New feature

1
.gitignore vendored
View File

@@ -29,6 +29,7 @@ updates_key.pem
*.m4a
*.m4v
*.mp3
*.3gp
*.part
*.swp
test/testdata

128
ChangeLog
View File

@@ -1,3 +1,131 @@
version 2016.10.02
Core
* Fix possibly lost extended attributes during post-processing
+ Support pyxattr as well as python-xattr for --xattrs and
--xattr-set-filesize (#9054)
Extractors
+ [jwplatform] Support DASH streams in JWPlayer
+ [jwplatform] Support old-style JWPlayer playlists
+ [byutv:event] Add extractor
* [periscope:user] Fix extraction (#10820)
* [dctp] Fix extraction (#10734)
+ [instagram] Extract video dimensions (#10790)
+ [tvland] Extend URL regular expression (#10812)
+ [vgtv] Add support for tv.aftonbladet.se (#10800)
- [aftonbladet] Remove extractor
* [vk] Fix timestamp and view count extraction (#10760)
+ [vk] Add support for running and finished live streams (#10799)
+ [leeco] Recognize more Le Sports URLs (#10794)
+ [instagram] Extract comments (#10788)
+ [ketnet] Extract mzsource formats (#10770)
* [limelight:media] Improve HTTP formats extraction
version 2016.09.27
Core
+ Add hdcore query parameter to akamai f4m formats
+ Delegate HLS live streams downloading to ffmpeg
+ Improved support for HTML5 subtitles
Extractors
+ [vk] Add support for dailymotion embeds (#10661)
* [promptfile] Fix extraction (#10634)
* [kaltura] Speed up embed regular expressions (#10764)
+ [npo] Add support for anderetijden.nl (#10754)
+ [prosiebensat1] Add support for advopedia sites
* [mwave] Relax URL regular expression (#10735, #10748)
* [prosiebensat1] Fix playlist support (#10745)
+ [prosiebensat1] Add support for sat1gold sites (#10745)
+ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
+ [brightcove:new] Add support for live streams
* [soundcloud] Generalize playlist entries extraction (#10733)
+ [mtv] Add support for new URL schema (#8169, #9808)
* [einthusan] Fix extraction (#10714)
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core
* Improve _hidden_inputs
+ Introduce improved explicit Adobe Pass support
+ Add --ap-mso to provide multiple-system operator identifier
+ Add --ap-username to provide MSO account username
+ Add --ap-password to provide MSO account password
+ Add --ap-list-mso to list all supported MSOs
+ Add support for Rogers Cable multiple-system operator (#10606)
Extractors
* [crunchyroll] Fix authentication (#10655)
* [twitch] Fix API calls (#10654, #10660)
+ [bellmedia] Add support for more Bell Media Television sites
* [franceinter] Fix extraction (#10538, #2105)
* [kuwo] Improve error detection (#10650)
+ [go] Add support for free full episodes (#10439)
* [bilibili] Fix extraction for specific videos (#10647)
* [nhk] Fix extraction (#10633)
* [kaltura] Improve audio detection
* [kaltura] Skip chun format
+ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
+ [nbc] Add support for NBC Olympics (#10361)
version 2016.09.11.1
Extractors

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
-	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
+	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete

View File

@@ -358,6 +358,17 @@ which means you can modify it, redistribute it or use it however you like.
-n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku)
## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV
provider) identifier, use --ap-list-mso for
a list of available MSOs
--ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password.
If this option is left out, youtube-dl will
ask interactively.
--ap-list-mso List all supported multiple-system
operators
## Post-processing Options:
-x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or

View File

@@ -34,12 +34,12 @@
- **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **AfreecaTV**: afreecatv.com
- **Aftonbladet**
- **AirMozilla**
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **AnimeOnDemand**
- **anitube.se**
- **AnySex**
@@ -89,6 +89,7 @@
- **BeatportPro**
- **Beeg**
- **BehindKink**
- **BellMedia**
- **Bet**
- **Bigflix**
- **Bild**: Bild.de
@@ -110,6 +111,7 @@
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed**
- **BYUtv**
- **BYUtvEvent**
- **Camdemy**
- **CamdemyFolder**
- **CamWithHer**
@@ -126,8 +128,8 @@
- **CBS**
- **CBSInteractive**
- **CBSLocal**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **cbsnews**: CBS News
- **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports**
- **CCTV**
- **CDA**
@@ -169,7 +171,6 @@
- **CSNNE**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **CTV**
- **CTVNews**
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
@@ -388,6 +389,8 @@
- **mailru**: Видео@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
@@ -422,8 +425,9 @@
- **MPORA**
- **MSN**
- **mtg**: MTG services
- **MTV**
- **mtv**
- **mtv.de**
- **mtv:video**
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
@@ -445,6 +449,7 @@
- **NBA**
- **NBC**
- **NBCNews**
- **NBCOlympics**
- **NBCSports**
- **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk
@@ -848,6 +853,7 @@
- **VRT**
- **vube**: Vube.com
- **VuClip**
- **VyboryMos**
- **Walla**
- **washingtonpost**
- **washingtonpost:article**
@@ -861,7 +867,7 @@
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **WNL**
- **wnl**: npo.nl and ntr.nl
- **WorldStarHipHop**
- **wrzuta.pl**
- **wrzuta.pl:playlist**

View File

@@ -40,6 +40,7 @@ from youtube_dl.utils import (
js_to_json,
limit_length,
mimetype2ext,
month_by_name,
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
@@ -291,6 +292,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@@ -311,6 +313,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -634,6 +637,14 @@ class TestUtil(unittest.TestCase):
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
self.assertEqual(month_by_name('December', 'en'), 12)
self.assertEqual(month_by_name('décembre', 'fr'), 12)
self.assertEqual(month_by_name('December'), 12)
self.assertEqual(month_by_name('décembre'), None)
self.assertEqual(month_by_name('Unknown', 'unknown'), None)
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
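
Reviewer note: the new test_month_by_name assertions pin down the expected behaviour fairly tightly. A minimal sketch that would satisfy them is shown here; the month tables and the fallback to English for an unknown language are assumptions inferred from the assertions, not necessarily what youtube_dl.utils does.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: tables and fallback rule are assumptions
# derived from the assertions in test_month_by_name.
ENGLISH_MONTH_NAMES = [
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December']

FRENCH_MONTH_NAMES = [
    'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
    'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre']

MONTH_NAMES = {
    'en': ENGLISH_MONTH_NAMES,
    'fr': FRENCH_MONTH_NAMES,
}


def month_by_name(name, lang='en'):
    """Return the 1-based month number for name, or None if it is unknown."""
    # An unknown language code falls back to the English table (assumption).
    month_names = MONTH_NAMES.get(lang, MONTH_NAMES['en'])
    try:
        return month_names.index(name) + 1
    except ValueError:
        # Covers both unknown names and name=None.
        return None
```

With this shape, a French name only resolves when lang='fr' is passed, which matches the month_by_name('décembre') assertion returning None.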

View File

@@ -131,6 +131,9 @@ class YoutubeDL(object):
username: Username for authentication purposes.
password: Password for authentication purposes.
videopassword: Password for accessing a video.
ap_mso: Adobe Pass multiple-system operator identifier.
ap_username: Multiple-system operator account username.
ap_password: Multiple-system operator account password.
usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.

View File

@@ -34,12 +34,14 @@ from .utils import (
setproctitle,
std_headers,
write_string,
render_table,
)
from .update import update_self
from .downloader import (
FileDownloader,
)
from .extractor import gen_extractors, list_extractors
from .extractor.adobepass import MSO_INFO
from .YoutubeDL import YoutubeDL
@@ -118,18 +120,26 @@ def _real_main(argv=None):
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
write_string(desc + '\n', out=sys.stdout)
sys.exit(0)
if opts.ap_list_mso:
table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
sys.exit(0)
# Conflicting, missing and erroneous options
if opts.usenetrc and (opts.username is not None or opts.password is not None):
parser.error('using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None:
parser.error('account username missing\n')
if opts.ap_password is not None and opts.ap_username is None:
parser.error('TV Provider account username missing\n')
if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
parser.error('using output template conflicts with using title, video ID or auto number')
if opts.usetitle and opts.useid:
parser.error('using title conflicts with using video ID')
if opts.username is not None and opts.password is None:
opts.password = compat_getpass('Type account password and press [Return]: ')
if opts.ap_username is not None and opts.ap_password is None:
opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None:
@@ -155,6 +165,8 @@ def _real_main(argv=None):
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@@ -254,8 +266,6 @@ def _real_main(argv=None):
postprocessors.append({
'key': 'FFmpegEmbedSubtitle',
})
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({
@@ -264,6 +274,10 @@ def _real_main(argv=None):
})
if not already_have_thumbnail:
opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd:
@@ -271,12 +285,6 @@ def _real_main(argv=None):
'key': 'ExecAfterDownload',
'exec_cmd': opts.exec_cmd,
})
if opts.xattr_set_filesize:
try:
import xattr
xattr # Confuse flake8
except ImportError:
parser.error('setting filesize xattr requested but python-xattr is not available')
external_downloader_args = None
if opts.external_downloader_args:
external_downloader_args = compat_shlex_split(opts.external_downloader_args)
@@ -293,6 +301,9 @@ def _real_main(argv=None):
'password': opts.password,
'twofactor': opts.twofactor,
'videopassword': opts.videopassword,
'ap_mso': opts.ap_mso,
'ap_username': opts.ap_username,
'ap_password': opts.ap_password,
'quiet': (opts.quiet or any_getting or any_printing),
'no_warnings': opts.no_warnings,
'forceurl': opts.geturl,
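
Reviewer note: the point of moving the XAttrMetadata block is ordering. Post-processors that rewrite the file (such as thumbnail embedding) must run before extended attributes are written, and ExecAfterDownload stays last. A rough sketch of the resulting order, using a hypothetical build_postprocessors helper; the 'EmbedThumbnail' key is an assumption because that part of the hunk is not shown here.

```python
def build_postprocessors(embedthumbnail=False, xattrs=False, exec_cmd=None):
    """Return post-processor descriptors in the order they should run.

    Hypothetical helper for illustration; the real code builds this list
    inline in _real_main.
    """
    postprocessors = []
    if embedthumbnail:
        # Key name assumed; this step may change the file contents.
        postprocessors.append({'key': 'EmbedThumbnail'})
    if xattrs:
        # Must come after anything that may rewrite the file, otherwise
        # the attributes written earlier can be lost.
        postprocessors.append({'key': 'XAttrMetadata'})
    if exec_cmd:
        # Kept last because the user command may move or delete the file.
        postprocessors.append({'key': 'ExecAfterDownload', 'exec_cmd': exec_cmd})
    return postprocessors
```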

View File

@@ -1,7 +1,6 @@
from __future__ import unicode_literals
import os
import re
from .fragment import FragmentFD
from ..compat import compat_urllib_error
@@ -19,34 +18,32 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
base_url = info_dict['url']
segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
initialization_url = info_dict.get('initialization_url')
segments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
ctx = {
'filename': filename,
'total_frags': len(segment_urls) + (1 if initialization_url else 0),
'total_frags': len(segments),
}
self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def process_segment(segment, tmp_filename, fatal):
target_url, segment_name = segment
def process_segment(segment, tmp_filename, num):
segment_url = segment['url']
segment_name = 'Frag%d' % num
target_filename = '%s-%s' % (tmp_filename, segment_name)
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = num == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
@@ -72,16 +69,8 @@ class DashSegmentsFD(FragmentFD):
return False
return True
segments_to_download = [(initialization_url, 'Init')] if initialization_url else []
segments_to_download.extend([
(segment_url, 'Seg%d' % i)
for i, segment_url in enumerate(segment_urls)])
for i, segment in enumerate(segments_to_download):
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
if not process_segment(segment, ctx['tmpfilename'], fatal):
for i, segment in enumerate(segments):
if not process_segment(segment, ctx['tmpfilename'], i):
return False
self._finish_frag_download(ctx)
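
Reviewer note: after this change the downloader reads a single info_dict['fragments'] list of dicts instead of segment_urls plus initialization_url. The fragment selection and the "first fragment is always fatal" rule can be sketched in isolation as follows; plan_downloads and the example URLs in the usage note are hypothetical and only mirror the logic visible in the hunk.

```python
def is_fatal(num, skip_unavailable_fragments=True):
    """The first fragment carries the MP4 headers, so its failure always aborts."""
    return num == 0 or not skip_unavailable_fragments


def plan_downloads(info_dict, tmp_filename, skip_unavailable_fragments=True, test=False):
    """Hypothetical helper: list what would be fetched, without downloading."""
    # In test mode only the first fragment is downloaded.
    fragments = info_dict['fragments'][:1] if test else info_dict['fragments']
    plan = []
    for num, fragment in enumerate(fragments):
        plan.append({
            'url': fragment['url'],
            'target': '%s-Frag%d' % (tmp_filename, num),
            'fatal': is_fatal(num, skip_unavailable_fragments),
        })
    return plan
```

For example, plan_downloads({'fragments': [{'url': 'http://example.com/init.mp4'}, {'url': 'http://example.com/seg1.m4s'}]}, 'video.part') marks only the first entry as fatal when unavailable fragments may be skipped.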

View File

@@ -31,7 +31,7 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative'
@staticmethod
def can_download(manifest):
def can_download(manifest, info_dict):
UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
@@ -53,6 +53,7 @@ class HlsFD(FragmentFD):
)
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
check_results.append(not info_dict.get('is_live'))
return all(check_results)
def real_download(self, filename, info_dict):
@@ -62,7 +63,7 @@ class HlsFD(FragmentFD):
s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s):
if not self.can_download(s, info_dict):
self.report_warning(
'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg')
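
Reviewer note: can_download now also rejects live streams, so they are delegated to ffmpeg. A self-contained sketch of the check is below. It lists only the two feature patterns visible in this hunk (the real tuple is longer), and can_decrypt_frag is passed as a parameter here purely for illustration.

```python
import re

# Subset of the patterns shown in the hunk; the real list has more entries.
UNSUPPORTED_FEATURES = (
    r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams
    r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files
)


def can_download(manifest, info_dict, can_decrypt_frag=False):
    """Return True if the native HLS downloader could handle this manifest."""
    checks = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
    # AES-128 is only acceptable when fragments can be decrypted.
    checks.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
    # New in this change: live streams are never handled natively.
    checks.append(not info_dict.get('is_live'))
    return all(checks)
```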

View File

@@ -13,6 +13,9 @@ from ..utils import (
encodeFilename,
sanitize_open,
sanitized_Request,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
)
@@ -179,9 +182,8 @@ class HttpFD(FileDownloader):
if self.params.get('xattr_set_filesize', False) and data_len is not None:
try:
import xattr
xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len))
except(OSError, IOError, ImportError) as err:
write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
except (XAttrUnavailableError, XAttrMetadataError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err))
try:

View File

@@ -13,7 +13,7 @@ from ..utils import (
class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au'
_VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',

File diff suppressed because it is too large

View File

@@ -1,64 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {
'id': '36015',
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# find internal video meta data
meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
player_config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
internal_meta_id = player_config['aptomaVideoId']
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
formats = []
for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
p = fmt['paths'][0]
formats.append({
'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
'ext': 'mp4',
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'tbr': int_or_none(fmt.get('bitrate')),
'protocol': 'http',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json.get('imageUrl'),
'description': internal_meta_json.get('shortPreamble'),
'timestamp': int_or_none(internal_meta_json.get('timePublished')),
'duration': int_or_none(internal_meta_json.get('duration')),
'view_count': int_or_none(internal_meta_json.get('views')),
}

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

View File

@@ -28,6 +28,7 @@ class AMCNetworksIE(ThePlatformIE):
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,

View File

@@ -50,25 +50,6 @@ class AWAANBaseIE(InfoExtractor):
'is_live': is_live,
}
def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
formats = []
format_url_base = 'http' + self._html_search_regex(
[
r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
r'<a[^>]+href="rtsp(://[^"]+)"'
], webpage, 'format url')
formats.extend(self._extract_mpd_formats(
format_url_base + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_m3u8_formats(
format_url_base + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
format_url_base + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return formats
class AWAANVideoIE(AWAANBaseIE):
IE_NAME = 'awaan:video'
@@ -85,6 +66,7 @@ class AWAANVideoIE(AWAANBaseIE):
'duration': 2041,
'timestamp': 1227504126,
'upload_date': '20081124',
'uploader_id': '71',
},
}, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
@@ -99,16 +81,18 @@ class AWAANVideoIE(AWAANBaseIE):
video_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(video_data, video_id, False)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), video_id)
info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloVideo',
})
return info
@@ -138,16 +122,18 @@ class AWAANLiveIE(AWAANBaseIE):
channel_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(channel_data, channel_id, True)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), channel_id)
info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloLive',
})
return info

View File

@@ -103,7 +103,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
_TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen',
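
Reviewer note: besides making www. optional, this hunk escapes the dots, which were previously regex wildcards. A short check of the difference, using a made-up hostname that the old pattern wrongly accepts:

```python
import re

OLD_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
NEW_VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'

REAL_URL = 'http://www.azubu.tv/MarsTVMDLen'
# Made-up host: every '.' in the real hostname is replaced by '-'.
BOGUS_URL = 'http://www-azubu-tv/MarsTVMDLen'


def matches(pattern, url):
    """Return True if the URL matches the pattern from its start."""
    return re.match(pattern, url) is not None
```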

View File

@@ -1028,7 +1028,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles'

View File

@@ -6,8 +6,25 @@ import re
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>ctv|tsn|bnn|thecomedynetwork)\.ca/.*?(?:\bvid=|-vid|~|%7E)(?P<id>[0-9.]+)'
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@@ -32,15 +49,27 @@ class CTVIE(InfoExtractor):
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
if domain == 'thecomedynetwork':
domain = 'comedy'
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:%s_web:%s' % (domain, video_id),
'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
'ie_key': 'NineCNineMedia',
}
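
Reviewer note: the domain group now captures the full host suffix (for example much.com), so the code takes the first label and then looks it up in _DOMAINS. A standalone sketch of that mapping; nine_c_nine_url is a hypothetical helper name, the table is copied from the hunk.

```python
_DOMAINS = {
    'thecomedynetwork': 'comedy',
    'discoveryvelocity': 'discvel',
    'sciencechannel': 'discsci',
    'investigationdiscovery': 'invdisc',
    'animalplanet': 'aniplan',
}


def nine_c_nine_url(domain, video_id):
    """Build the internal 9c9media URL from the matched domain and video id."""
    # 'ctv.ca' -> 'ctv', 'much.com' -> 'much'
    domain = domain.split('.')[0]
    # Fall back to the bare domain label when no alias is defined.
    return '9c9media:%s_web:%s' % (_DOMAINS.get(domain, domain), video_id)
```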

View File

@@ -17,7 +17,7 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
_TESTS = [{
_TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': {
@@ -32,64 +32,7 @@ class BiliBiliIE(InfoExtractor):
'uploader': '菊子桑',
'uploader_id': '156160',
},
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'duration': 3382.259,
'timestamp': 1396530060,
'upload_date': '20140403',
'thumbnail': 're:^https?://.+\.jpg',
'uploader': '枫叶逝去',
'uploader_id': '520116',
},
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'duration': 1493.995,
'timestamp': 1464564180,
'upload_date': '20160529',
'thumbnail': 're:^https?://.+\.jpg',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '1867637',
'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'duration': 5760.0,
'uploader': '黑夜为猫',
'uploader_id': '610729',
'thumbnail': 're:^https?://.+\.jpg',
},
'params': {
# Just to test metadata extraction
'skip_download': True,
},
'expected_warnings': ['upload time'],
}, {
'url': 'http://bangumi.bilibili.com/anime/v/40068',
'md5': '08d539a0884f3deb7b698fb13ba69696',
'info_dict': {
'id': '40068',
'ext': 'mp4',
'duration': 1402.357,
'title': '混沌武士 : 第7集 四面楚歌 A Risky Racket',
'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
'thumbnail': 're:^http?://.+\.jpg',
},
}]
}
_APP_KEY = '6f90a59ac58a4123'
_BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
@@ -124,7 +67,7 @@ class BiliBiliIE(InfoExtractor):
'url': durl['url'],
'filesize': int_or_none(durl['size']),
}]
for backup_url in durl['backup_url']:
for backup_url in durl.get('backup_url', []):
formats.append({
'url': backup_url,
# backup URLs have lower priorities

View File

@@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

View File

@@ -621,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'],
})
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return {
'id': video_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
'is_live': is_live,
}
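
Reviewer note: the live detection here relies on the API reporting a negative duration for live streams. The rule can be sketched as below; float_or_none is a simplified stand-in for the helper in youtube_dl.utils, and live_info is a hypothetical wrapper.

```python
def float_or_none(value, scale=1):
    """Simplified stand-in: convert to float and divide by scale, or return None."""
    if value is None:
        return None
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return None


def live_info(json_data):
    """Return (duration_in_seconds, is_live) following the rule in the hunk."""
    is_live = False
    # The API reports milliseconds, hence the scale of 1000.
    duration = float_or_none(json_data.get('duration'), 1000)
    if duration and duration < 0:
        is_live = True
    return duration, is_live
```

Note that the negative duration is still passed through to the result unchanged, as in the hunk.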

View File

@@ -1,6 +1,5 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -8,15 +7,15 @@ from ..utils import ExtractorError
class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': {
'id': 'studio-c-season-5-episode-5',
'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'display_id': 'studio-c-season-5-episode-5',
'ext': 'mp4',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486,
},
@@ -24,28 +23,71 @@ class BYUtvIE(InfoExtractor):
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
ep = self._parse_json(
episode_code, display_id, transform_source=lambda s:
re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
if ep['providerType'] != 'Ooyala':
raise ExtractorError('Unsupported provider %s' % ep['provider'])
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'display_id': display_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
class BYUtvEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'info_dict': {
'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'ext': 'mp4',
'title': 'Toledo vs. BYU (9/30/16)',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
if ep['providerType'] == 'Ooyala':
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
else:
raise ExtractorError('Unsupported provider %s' % ep['provider'])
ooyala_id = self._search_regex(
r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'ooyala id', group='id')
title = self._search_regex(
r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage,
'title').strip()
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ooyala_id,
'id': video_id,
'title': title,
}
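
Reviewer note: BYUtvIE now passes the key-quoting regex to _parse_json via transform_source instead of calling json.loads directly. What the transform does can be seen on a small made-up snippet; the sample episode text is hypothetical and only imitates the page's JavaScript object.

```python
import json
import re


def js_object_to_json(s):
    """Quote bare keys and convert single-quoted values, as in the hunk's regex."""
    return re.sub(r"(\n\s+)([a-zA-Z]+):\s+'(.*?)'", r'\1"\2": "\3"', s)


# Hypothetical sample in the shape the regex expects: one key per line.
EPISODE_CODE = """{
    title: 'Season 5 Episode 5',
    providerType: 'Ooyala',
    providerId: 'abc123'
}"""

episode = json.loads(js_object_to_json(EPISODE_CODE))
```

The regex only handles simple one-line string values; anything else in the object would still need to be valid JSON already.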

View File

@@ -112,7 +112,7 @@ class CamdemyIE(InfoExtractor):
class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)'
_TESTS = [{
# links with trailing slash
'url': 'http://www.camdemy.com/folder/450',

View File

@@ -71,7 +71,7 @@ class CanvasIE(InfoExtractor):
webpage)).strip()
video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
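
Reviewer note: the new data-video pattern uses (?:(?!\1).)+ so the captured value can never contain the opening quote character. The difference shows up on an empty attribute, where the old lazy .+? runs past the closing quote. The HTML snippets below are made up for illustration.

```python
import re

OLD_RE = r'data-video=(["\'])(?P<id>.+?)\1'
NEW_RE = r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1'


def search_id(pattern, html):
    """Return the captured id or None when the pattern does not match."""
    mobj = re.search(pattern, html)
    return mobj.group('id') if mobj else None


NORMAL = '<div data-video="md-ast-123" title="x">'
EMPTY = '<div data-video="" title="x">'
```

On EMPTY the old pattern captures the fragment between the second and third quote, while the new pattern reports no match.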

View File

@@ -33,4 +33,10 @@ class CartoonNetworkIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

View File

@@ -4,7 +4,9 @@ from .theplatform import ThePlatformFeedIE
from ..utils import (
int_or_none,
find_xpath_attr,
ExtractorError,
xpath_element,
xpath_text,
update_url_query,
)
@@ -47,27 +49,49 @@ class CBSIE(CBSBaseIE):
'only_matching': True,
}]
def _extract_video_info(self, guid):
path = 'dJ5BDC/media/guid/2198311517/' + guid
smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
for r in ('OnceURL&formats=M3U', 'HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
try:
tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
formats.extend(tp_formats)
except ExtractorError:
def _extract_video_info(self, content_id):
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
metadata = self._download_theplatform_metadata(path, guid)
info = self._parse_theplatform_metadata(metadata)
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': guid,
'id': content_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
'series': metadata.get('cbs$SeriesTitle'),
'season_number': int_or_none(metadata.get('cbs$SeasonNumber')),
'episode': metadata.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(metadata.get('cbs$EpisodeNumber')),
})
return info
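
Reviewer note: the rewritten _extract_video_info issues one SMIL request per distinct assetType and picks the formats parameter from the asset type. That selection can be sketched on its own as follows; smil_query, unique_asset_types and release_url are hypothetical helpers, and urlencode stands in for update_url_query.

```python
from urllib.parse import urlencode


def smil_query(asset_type):
    """Build the query for one asset type, following the branches in the hunk."""
    query = {'mbr': 'true', 'assetTypes': asset_type}
    if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
        query['formats'] = 'MPEG4,M3U'
    elif asset_type in ('RTMP', 'WIFI', '3G'):
        query['formats'] = 'MPEG4,FLV'
    return query


def unique_asset_types(asset_types):
    """Drop empty and repeated asset types while keeping first-seen order."""
    seen = []
    for asset_type in asset_types:
        if not asset_type or asset_type in seen:
            continue
        seen.append(asset_type)
    return seen


def release_url(tp_release_url, asset_type):
    """Append the query to a release URL that has no query string yet."""
    return tp_release_url + '?' + urlencode(smil_query(asset_type))
```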

View File

@@ -9,6 +9,7 @@ from ..utils import (
class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@@ -68,15 +69,16 @@ class CBSNewsIE(CBSIE):
class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'ext': 'mp4',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
@@ -84,25 +86,22 @@ class CBSNewsLiveVideoIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._download_json(
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
formats = self._extract_akamai_formats(video_info['url'], display_id)
self._sort_formats(formats)
return {
'id': video_id,
'id': display_id,
'display_id': display_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
'formats': formats,
}

View File

@@ -4,7 +4,7 @@ from .cbs import CBSBaseIE
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',

View File

@@ -17,7 +17,7 @@ from ..utils import (
class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': {

View File

@@ -65,7 +65,7 @@ class ChirbitIE(InfoExtractor):
class ChirbitProfileIE(InfoExtractor):
IE_NAME = 'chirbit:profile'
_VALID_URL = r'https?://(?:www\.)?chirbit.com/(?:rss/)?(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?chirbit\.com/(?:rss/)?(?P<id>[^/]+)'
_TEST = {
'url': 'http://chirbit.com/ScarletBeauty',
'info_dict': {

View File

@@ -1,9 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
@@ -30,16 +27,14 @@ class ClubicIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
player_page = self._download_webpage(player_url, video_id)
config_json = self._search_regex(
config = self._parse_json(self._search_regex(
r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
'configuration')
config = json.loads(config_json)
'configuration'), video_id)
video_info = config['videoInfo']
sources = config['sources']

View File

@@ -6,7 +6,7 @@ from ..utils import ExtractorError
class CMTIE(MTVIE):
IE_NAME = 'cmt.com'
_VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [{


@@ -87,6 +87,9 @@ class InfoExtractor(object):
Potential fields:
* url Mandatory. The URL of the video file
* manifest_url
The URL of the manifest file in case of
fragmented media (DASH, hls, hds)
* ext Will be calculated from URL if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
@@ -115,6 +118,11 @@ class InfoExtractor(object):
download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmpe",
"m3u8", "m3u8_native" or "http_dash_segments".
* fragments A list of fragments of the fragmented media,
with the following entries:
* "url" (mandatory) - fragment's URL
* "duration" (optional, int or float)
* "filesize" (optional, int)
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field, regardless of all other values.
@@ -674,33 +682,36 @@ class InfoExtractor(object):
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % netrc_machine)
raise netrc.NetrcParseError(
'No authenticators for %s' % netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
self._downloader.report_warning(
'parsing .netrc: %s' % error_to_compat_str(err))
return (username, password)
return username, password
def _get_login_info(self):
def _get_login_info(self, username_option='username', password_option='password', netrc_machine=None):
"""
Get the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value
First look for the manually specified credentials using username_option
and password_option as keys in the params dictionary. If no such credentials
are available, look in the netrc file using the netrc_machine or
_NETRC_MACHINE value.
If there's no info available, return (None, None)
"""
if self._downloader is None:
return (None, None)
username = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username') is not None:
username = downloader_params['username']
password = downloader_params['password']
if downloader_params.get(username_option) is not None:
username = downloader_params[username_option]
password = downloader_params[password_option]
else:
username, password = self._get_netrc_login_info()
username, password = self._get_netrc_login_info(netrc_machine)
return (username, password)
return username, password
def _get_tfa_info(self, note='two-factor verification code'):
"""
@@ -888,16 +899,16 @@ class InfoExtractor(object):
def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
hidden_inputs = {}
for input in re.findall(r'(?i)<input([^>]+)>', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
for input in re.findall(r'(?i)(<input[^>]+>)', html):
attrs = extract_attributes(input)
if not input:
continue
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
if not name:
if attrs.get('type') not in ('hidden', 'submit'):
continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
if not value:
continue
hidden_inputs[name.group('value')] = value.group('value')
name = attrs.get('name') or attrs.get('id')
value = attrs.get('value')
if name and value is not None:
hidden_inputs[name] = value
return hidden_inputs
def _form_hidden_inputs(self, form_id, html):
@@ -1139,6 +1150,7 @@ class InfoExtractor(object):
formats.append({
'format_id': format_id,
'url': manifest_url,
'manifest_url': manifest_url,
'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr,
'width': width,
@@ -1244,9 +1256,11 @@ class InfoExtractor(object):
# format_id intact.
if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
manifest_url = format_url(line.strip())
f = {
'format_id': '-'.join(format_id),
'url': format_url(line.strip()),
'url': manifest_url,
'manifest_url': manifest_url,
'tbr': tbr,
'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
@@ -1518,9 +1532,10 @@ class InfoExtractor(object):
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
"""
Parse formats from MPD manifest.
References:
@@ -1541,42 +1556,52 @@ class InfoExtractor(object):
def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy()
# As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
# common attributes and elements. We will only extract those relevant
# to us.
def extract_common(source):
segment_timeline = source.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
start_number = source.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
timescale = source.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = source.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
extract_common(segment_list)
extract_Initialization(segment_list)
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
initialization = segment_list.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
start_number = segment_template.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
else:
timescale = segment_template.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
extract_common(segment_template)
media_template = segment_template.get('media')
if media_template:
ms_info['media_template'] = media_template
@@ -1584,11 +1609,14 @@ class InfoExtractor(object):
if initialization:
ms_info['initialization_url'] = initialization
else:
initialization = segment_template.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
extract_Initialization(segment_template)
return ms_info
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
for period in mpd_doc.findall(_add_ns('Period')):
@@ -1631,6 +1659,7 @@ class InfoExtractor(object):
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
@@ -1645,9 +1674,7 @@ class InfoExtractor(object):
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
@@ -1656,46 +1683,79 @@ class InfoExtractor(object):
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template:
representation_ms_info['segment_urls'] = [
media_template % {
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
'url': media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'),
}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
representation_ms_info['segment_urls'] = []
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
def add_segment_url():
representation_ms_info['segment_urls'].append(
media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
}
)
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
'Number': segment_number,
}
representation_ms_info['fragments'].append({
'url': segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += s['d']
segment_time += segment_d
add_segment_url()
segment_time += s['d']
if 'segment_urls' in representation_ms_info:
segment_number += 1
segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
s_num = 0
for segment_url in representation_ms_info['segment_urls']:
s = representation_ms_info['s'][s_num]
for r in range(s.get('r', 0) + 1):
fragments.append({
'url': segment_url,
'duration': float_or_none(s['d'], representation_ms_info['timescale']),
})
representation_ms_info['fragments'] = fragments
# NB: MPD manifest may contain direct URLs to unfragmented media.
# No fragments key is present in this case.
if 'fragments' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
'fragments': [],
'protocol': 'http_dash_segments',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({'url': initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = combine_url(base_url, fragment['url'])
try:
existing_format = next(
fo for fo in formats
@@ -1768,7 +1828,7 @@ class InfoExtractor(object):
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
if not src:
continue
@@ -1776,22 +1836,70 @@ class InfoExtractor(object):
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
if media_info['formats'] or media_info['subtitles']:
entries.append(media_info)
return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
if 'hdcore=' not in f4m_url:
f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
f4m_formats = self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False)
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
http_base_url = 'http' + url_base
formats = []
if 'm3u8' not in skip_protocols:
formats.extend(self._extract_m3u8_formats(
http_base_url + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
if 'f4m' not in skip_protocols:
formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base):
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil',
video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
else:
for protocol in ('rtmp', 'rtsp'):
if protocol not in skip_protocols:
formats.append({
'url': protocol + url_base,
'format_id': protocol,
'protocol': protocol,
})
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
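
Review note on the `_parse_mpd_formats` hunks above: the new `fragments` list is built by walking the `S` entries of a `SegmentTimeline` and then resolving each fragment URL against the base URL. The following is a minimal standalone sketch of that expansion, not the extractor code itself. `expand_segment_timeline` and its parameters are hypothetical names introduced for illustration; `combine_url` mirrors the helper added in this diff.

```python
import re


def combine_url(base_url, target_url):
    # Absolute fragment URLs are kept as-is; relative ones are appended
    # to the base URL with exactly one '/' between them.
    if re.match(r'^https?://', target_url):
        return target_url
    return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)


def expand_segment_timeline(media_template, s_list, start_number=1,
                            timescale=1, bandwidth=None):
    """Hypothetical helper: expand S entries into fragment dicts.

    media_template is assumed to be already converted to %-style
    placeholders, e.g. 'seg-%(Time)d.m4s' or 'seg-%(Number)d.m4s'.
    """
    fragments = []
    segment_time = 0
    segment_number = start_number
    for s in s_list:
        # An explicit @t resets the running clock; otherwise continue
        # from the end of the previous segment.
        segment_time = s.get('t') or segment_time
        # @r counts additional repeats, so each entry yields 1 + r fragments.
        for _ in range(1 + s.get('r', 0)):
            fragments.append({
                'url': media_template % {
                    'Time': segment_time,
                    'Number': segment_number,
                    'Bandwidth': bandwidth,
                },
                # @d is expressed in timescale units; convert to seconds.
                'duration': s['d'] / float(timescale),
            })
            segment_time += s['d']
            segment_number += 1
    return fragments


timeline = [{'t': 0, 'd': 90000, 'r': 2}, {'d': 45000}]
fragments = expand_segment_timeline('seg-%(Time)d.m4s', timeline, timescale=90000)
for fragment in fragments:
    fragment['url'] = combine_url('http://example.com/dash', fragment['url'])
```

Keeping a per-fragment `duration` next to each `url` is what lets a downloader reason about individual fragments, which a flat `segment_urls` list could not express.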


@@ -1,13 +1,11 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+'
_VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = {
'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3',
@@ -16,20 +14,20 @@ class CriterionIE(InfoExtractor):
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(
r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage)
thumbnail = self._search_regex(
r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {


@@ -34,22 +34,58 @@ from ..aes import (
class CrunchyrollBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.crunchyroll.com/login'
_LOGIN_FORM = 'login_form'
_NETRC_MACHINE = 'crunchyroll'
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
self.report_login()
login_url = 'https://www.crunchyroll.com/?a=formhandler'
data = urlencode_postdata({
'formname': 'RpcApiUser_Login',
'name': username,
'password': password,
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
def is_logged(webpage):
return '<title>Redirecting' in webpage
# Already logged in
if is_logged(login_page):
return
login_form_str = self._search_regex(
r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
login_page, 'login form', group='form')
post_url = extract_attributes(login_form_str).get('action')
if not post_url:
post_url = self._LOGIN_URL
elif not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
login_form = self._form_hidden_inputs(self._LOGIN_FORM, login_page)
login_form.update({
'login_form[name]': username,
'login_form[password]': password,
})
login_request = sanitized_Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(login_request, None, False, 'Wrong login info')
response = self._download_webpage(
post_url, None, 'Logging in', 'Wrong login info',
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if is_logged(response):
return
error = self._html_search_regex(
'(?s)<ul[^>]+class=["\']messages["\'][^>]*>(.+?)</ul>',
response, 'error message', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self):
self._login()
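
Review note on the `_login` rewrite above: the POST target is now taken from the login form's `action` attribute, falling back to `_LOGIN_URL` when it is missing and resolving it against `_LOGIN_URL` when it is relative. A minimal sketch of just that resolution step follows. It assumes Python 3's `urllib.parse.urljoin` in place of the `compat_urlparse.urljoin` used in the diff, and `resolve_post_url` is a hypothetical name.

```python
from urllib.parse import urljoin

LOGIN_URL = 'https://www.crunchyroll.com/login'


def resolve_post_url(action, login_url=LOGIN_URL):
    # No action attribute: post back to the login page itself.
    if not action:
        return login_url
    # Relative action: resolve it against the login page URL.
    if not action.startswith('http'):
        return urljoin(login_url, action)
    # Already absolute: use as-is.
    return action
```

For example, `resolve_post_url('/login?next=%2F')` resolves against the site root, while `resolve_post_url(None)` returns `LOGIN_URL` unchanged.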


@@ -1,61 +1,54 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import unified_strdate
class DctpTvIE(InfoExtractor):
_VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
_VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$'
_TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '174dd4a8a6225cf5655952f969cfbe24',
'info_dict': {
'id': '1324',
'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv',
'title': 'Videoinstallation für eine Kaufhausfassade'
'ext': 'mp4',
'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'upload_date': '20110407',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
base_url = 'http://dctp-ivms2-restapi.s3.amazonaws.com/'
version_json = self._download_json(
base_url + 'version.json',
video_id, note='Determining file version')
version = version_json['version_name']
info_json = self._download_json(
'{0}{1}/restapi/slugs/{2}.json'.format(base_url, version, video_id),
video_id, note='Fetching object ID')
object_id = compat_str(info_json['object_id'])
meta_json = self._download_json(
'{0}{1}/restapi/media/{2}.json'.format(base_url, version, object_id),
video_id, note='Downloading metadata')
uuid = meta_json['uuid']
title = meta_json['title']
wide = meta_json['is_wide']
if wide:
ratio = '16x9'
else:
ratio = '4x3'
play_path = 'mp4:{0}_dctp_0500_{1}.m4v'.format(uuid, ratio)
webpage = self._download_webpage(url, video_id)
object_id = self._html_search_meta('DC.identifier', webpage)
servers_json = self._download_json(
'http://www.dctp.tv/streaming_servers/',
'http://www.dctp.tv/elastic_streaming_client/get_streaming_server/',
video_id, note='Downloading server list')
url = servers_json[0]['endpoint']
server = servers_json[0]['server']
m3u8_path = self._search_regex(
r'\'([^\'"]+/playlist\.m3u8)"', webpage, 'm3u8 path')
formats = self._extract_m3u8_formats(
'http://%s%s' % (server, m3u8_path), video_id, ext='mp4',
entry_protocol='m3u8_native')
title = self._og_search_title(webpage)
description = self._html_search_meta('DC.description', webpage)
upload_date = unified_strdate(
self._html_search_meta('DC.date.created', webpage))
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': object_id,
'title': title,
'format': 'rtmp',
'url': url,
'play_path': play_path,
'rtmp_real_time': True,
'ext': 'flv',
'display_id': video_id
'formats': formats,
'display_id': video_id,
'description': description,
'upload_date': upload_date,
'thumbnail': thumbnail,
}


@@ -13,7 +13,7 @@ from ..utils import (
class DemocracynowIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)'
_VALID_URL = r'https?://(?:www\.)?democracynow\.org/(?P<id>[^\?]*)'
IE_NAME = 'democracynow'
_TESTS = [{
'url': 'http://www.democracynow.org/shows/2015/7/3',

View File

@@ -14,7 +14,7 @@ class EinthusanIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
'md5': 'af244f4458cd667205e513d75da5b8b1',
'md5': 'd71379996ff5b7f217eca034c34e3461',
'info_dict': {
'id': '2447',
'ext': 'mp4',
@@ -25,13 +25,13 @@ class EinthusanIE(InfoExtractor):
},
{
'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
'md5': 'ef63c7a803e22315880ed182c10d1c5c',
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': {
'id': '1671',
'ext': 'mp4',
'title': 'Soodhu Kavvuum',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:05d8a0c0281a4240d86d76e14f2f4d51',
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
}
},
]
@@ -50,9 +50,11 @@ class EinthusanIE(InfoExtractor):
video_id = self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
video_url = self._download_webpage(
m3u8_url = self._download_webpage(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
% video_id, video_id)
% video_id, video_id, headers={'Referer': url})
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
description = self._html_search_meta('description', webpage)
thumbnail = self._html_search_regex(
@@ -64,7 +66,7 @@ class EinthusanIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'url': video_url,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
}


@@ -4,7 +4,7 @@ from .common import InfoExtractor
class EngadgetIE(InfoExtractor):
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?engadget\.com/video/(?P<id>[^/?#]+)'
_TESTS = [{
# video with 5min ID


@@ -8,7 +8,7 @@ from ..utils import (
class ExpoTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_VALID_URL = r'https?://(?:www\.)?expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_TEST = {
'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916',
'md5': 'fe1d728c3a813ff78f595bc8b7a707a8',


@@ -31,7 +31,6 @@ from .aenetworks import (
HistoryTopicIE,
)
from .afreecatv import AfreecaTVIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
@@ -93,6 +92,7 @@ from .bbc import (
)
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE
from .beatportpro import BeatportProIE
from .bet import BetIE
from .bigflix import BigflixIE
@@ -116,7 +116,10 @@ from .brightcove import (
BrightcoveNewIE,
)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .byutv import (
BYUtvIE,
BYUtvEventIE,
)
from .c56 import C56IE
from .camdemy import (
CamdemyIE,
@@ -195,7 +198,6 @@ from .crunchyroll import (
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import (
@@ -472,6 +474,10 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
MangomoloVideoIE,
MangomoloLiveIE,
)
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .meta import METAIE
@@ -512,6 +518,7 @@ from .movingimage import MovingImageIE
from .msn import MSNIE
from .mtv import (
MTVIE,
MTVVideoIE,
MTVServicesEmbeddedIE,
MTVDEIE,
)
@@ -534,6 +541,7 @@ from .nbc import (
CSNNEIE,
NBCIE,
NBCNewsIE,
NBCOlympicsIE,
NBCSportsIE,
NBCSportsVPlayerIE,
)
@@ -606,13 +614,14 @@ from .nowtv import (
)
from .noz import NozIE
from .npo import (
AndereTijdenIE,
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
WNLIE,
)
from .npr import NprIE
from .nrk import (
@@ -1064,6 +1073,7 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,


@@ -11,9 +11,13 @@ class Formula1IE(InfoExtractor):
'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'ext': 'flv',
'ext': 'mp4',
'title': 'Race highlights - Spain 2016',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',


@@ -1,14 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
@@ -30,14 +30,26 @@ class FOXIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url']
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), video_id)
fox_pdk_player = settings['fox_pdk_player']
release_url = fox_pdk_player['release_url']
query = {
'mbr': 'true',
'switch': 'http'
}
if fox_pdk_player.get('access') == 'locked':
ap_p = settings['foxAdobePassProvider']
rating = ap_p.get('videoRating')
if rating == 'n/a':
rating = None
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'id': video_id,
}


@@ -2,21 +2,21 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import month_by_name
class FranceInterIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/emissions/(?P<id>[^?#]+)'
_TEST = {
'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
'md5': '4764932e466e6f6c79c317d2e74f6884',
'url': 'https://www.franceinter.fr/emissions/affaires-sensibles/affaires-sensibles-07-septembre-2016',
'md5': '9e54d7bdb6fdc02a841007f8a975c094',
'info_dict': {
'id': '793962',
'id': 'affaires-sensibles/affaires-sensibles-07-septembre-2016',
'ext': 'mp3',
'title': 'LHistoire dans les jeux vidéo',
'description': 'md5:7e93ddb4451e7530022792240a3049c7',
'timestamp': 1387369800,
'upload_date': '20131218',
'title': 'Affaire Cahuzac : le contentieux du compte en Suisse',
'description': 'md5:401969c5d318c061f86bda1fa359292b',
'upload_date': '20160907',
},
}
@@ -25,23 +25,30 @@ class FranceInterIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
path = self._search_regex(
r'<a id="player".+?href="([^"]+)"', webpage, 'video url')
video_url = 'http://www.franceinter.fr/' + path
video_url = self._search_regex(
r'(?s)<div[^>]+class=["\']page-diffusion["\'][^>]*>.*?<button[^>]+data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video url', group='url')
title = self._html_search_regex(
r'<span class="title-diffusion">(.+?)</span>', webpage, 'title')
description = self._html_search_regex(
r'<span class="description">(.*?)</span>',
webpage, 'description', fatal=False)
timestamp = int_or_none(self._search_regex(
r'data-date="(\d+)"', webpage, 'upload date', fatal=False))
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
upload_date_str = self._search_regex(
r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
webpage, 'upload date', fatal=False)
if upload_date_str:
upload_date_list = upload_date_str.split()
upload_date_list.reverse()
upload_date_list[1] = '%02d' % (month_by_name(upload_date_list[1], lang='fr') or 0)
upload_date_list[2] = '%02d' % int(upload_date_list[2])
upload_date = ''.join(upload_date_list)
else:
upload_date = None
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'upload_date': upload_date,
'formats': [{
'url': video_url,
'vcodec': 'none',
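
Review note on the `upload_date` handling above: the extractor turns a French date string such as `07 septembre 2016` into the `YYYYMMDD` form by splitting, reversing, and mapping the month name to a number. The sketch below shows the same conversion in isolation. It does not call `month_by_name` from `..utils`; the `FRENCH_MONTHS` table and `french_date_to_upload_date` are hypothetical stand-ins, and unlike the diff (which falls back to month `00`) this version returns `None` for an unknown month.

```python
# Hypothetical stand-in for the French month table used by month_by_name.
FRENCH_MONTHS = [
    'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
    'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre',
]


def french_date_to_upload_date(date_str):
    """Convert 'D month YYYY' (French month name) to 'YYYYMMDD', or None."""
    parts = date_str.split()
    if len(parts) != 3:
        return None
    day, month_name, year = parts
    try:
        month = FRENCH_MONTHS.index(month_name.lower()) + 1
    except ValueError:
        # Unknown month name: give up rather than emit an invalid date.
        return None
    return '%s%02d%02d' % (year, month, int(day))
```

Returning `None` keeps an unparseable date out of the info dict entirely, which seems safer than a string with a `00` month, though that is a design choice rather than what the commit does.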


@@ -8,7 +8,7 @@ from .common import InfoExtractor
class FreespeechIE(InfoExtractor):
IE_NAME = 'freespeech.org'
_VALID_URL = r'https://www\.freespeech\.org/video/(?P<title>.+)'
_VALID_URL = r'https?://(?:www\.)?freespeech\.org/video/(?P<title>.+)'
_TEST = {
'add_ie': ['Youtube'],
'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',


@@ -9,7 +9,7 @@ from ..utils import (
class GameStarIE(InfoExtractor):
_VALID_URL = r'https?://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
'md5': '96974ecbb7fd8d0d20fca5a00810cea7',


@@ -1369,6 +1369,11 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Vimeo'],
},
{
# generic vimeo embed that requires original URL passed as Referer
'url': 'http://racing4everyone.eu/2016/07/30/formula-1-2016-round12-germany/',
'only_matching': True,
},
{
'url': 'https://support.arkena.com/display/PLAY/Ways+to+embed+your+video',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
@@ -1652,7 +1657,9 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0])
doc, video_id,
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
@@ -2249,6 +2256,35 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group('url'))), 'VODPlatform')
# Look for Mangomolo embeds
mobj = re.search(
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
(?:
video\?.*?\bid=(?P<video_id>\d+)|
index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
).+?)\1''', webpage)
if mobj is not None:
info = {
'_type': 'url_transparent',
'url': self._proto_relative_url(unescapeHTML(mobj.group('url'))),
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'uploader': video_uploader,
}
video_id = mobj.group('video_id')
if video_id:
info.update({
'ie_key': 'MangomoloVideo',
'id': video_id,
})
else:
info.update({
'ie_key': 'MangomoloLive',
'id': mobj.group('channel_id'),
})
return info
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
@@ -2296,12 +2332,23 @@ class GenericIE(InfoExtractor):
info_dict.update(json_ld)
return info_dict
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
for entry in entries:
entry.update({
'id': video_id,
'title': video_title,
})
self._sort_formats(entry['formats'])
return self.playlist_result(entries)
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
vpath = compat_urlparse.urlparse(vurl).path
vext = determine_ext(vpath)
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml')
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')
def filter_video(urls):
return list(filter(check_video, urls))
@@ -2351,9 +2398,6 @@ class GenericIE(InfoExtractor):
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
if not found:
# HTML5 video
found = re.findall(r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']', webpage)
if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search(


@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import random
import re
import math
from .common import InfoExtractor
@@ -14,6 +15,7 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
orderedSet,
str_or_none,
)
@@ -63,6 +65,9 @@ class GloboIE(InfoExtractor):
}, {
'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
'only_matching': True,
}, {
'url': 'globo:3607726',
'only_matching': True,
}]
class MD5(object):
@@ -396,7 +401,7 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})',
@@ -408,15 +413,20 @@ class GloboArticleIE(InfoExtractor):
_TESTS = [{
'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
'info_dict': {
'id': '3652183',
'ext': 'mp4',
'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
'duration': 110.711,
'uploader': 'Rede Globo',
'uploader_id': '196',
}
'id': 'novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes',
'title': 'Novidade na fiscalização de bagagem pela Receita provoca discussões',
'description': 'md5:c3c4b4d4c30c32fce460040b1ac46b12',
},
'playlist_count': 1,
}, {
'url': 'http://g1.globo.com/pr/parana/noticia/2016/09/mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato.html',
'info_dict': {
'id': 'mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato',
'title': "Lula era o 'comandante máximo' do esquema da Lava Jato, diz MPF",
'description': 'md5:8aa7cc8beda4dc71cc8553e00b77c54c',
},
'playlist_count': 6,
}, {
'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
'only_matching': True,
@@ -435,5 +445,12 @@ class GloboArticleIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
return self.url_result('globo:%s' % video_id, 'Globo')
video_ids = []
for video_regex in self._VIDEOID_REGEXES:
video_ids.extend(re.findall(video_regex, webpage))
entries = [
self.url_result('globo:%s' % video_id, GloboIE.ie_key())
for video_id in orderedSet(video_ids)]
title = self._og_search_title(webpage, fatal=False)
description = self._html_search_meta('description', webpage)
return self.playlist_result(entries, display_id, title, description)
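The new `_real_extract` collects every video id on the page and removes duplicates while keeping first-seen order. A self-contained sketch of that idea follows, assuming a local `ordered_unique` helper in place of `orderedSet` from `..utils`; the second regex and the sample markup are hypothetical.

```python
import re

VIDEO_ID_REGEXES = [
    r'\bdata-video-id=["\'](\d{7,})',    # appears in the diff
    r'\bdata-example-id=["\'](\d{7,})',  # hypothetical second pattern
]


def ordered_unique(items):
    """Drop duplicates while preserving the order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def find_video_ids(webpage):
    """Run every regex over the page and merge the hits."""
    video_ids = []
    for regex in VIDEO_ID_REGEXES:
        video_ids.extend(re.findall(regex, webpage))
    return ordered_unique(video_ids)


# Invented markup: the first id is referenced twice.
SAMPLE_PAGE = (
    '<div data-video-id="3652183"></div>'
    '<div data-video-id="5339624"></div>'
    '<div data-example-id="3652183"></div>')
```

Each id returned by `find_video_ids` would then become one `globo:<id>` playlist entry.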


@@ -8,6 +8,8 @@ from ..utils import (
int_or_none,
determine_ext,
parse_age_limit,
urlencode_postdata,
ExtractorError,
)
@@ -19,7 +21,7 @@ class GoIE(InfoExtractor):
'watchdisneyjunior': '008',
'watchdisneyxd': '009',
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/.*?vdka(?P<id>\w+)' % '|'.join(_BRANDS.keys())
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_BRANDS.keys())
_TESTS = [{
'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
'info_dict': {
@@ -38,9 +40,13 @@ class GoIE(InfoExtractor):
}]
def _real_extract(self, url):
sub_domain, video_id = re.match(self._VALID_URL, url).groups()
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id')
brand = self._BRANDS[sub_domain]
video_data = self._download_json(
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (self._BRANDS[sub_domain], video_id),
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
video_id)['video'][0]
title = video_data['title']
@@ -52,6 +58,21 @@ class GoIE(InfoExtractor):
format_id = asset.get('format')
ext = determine_ext(asset_url)
if ext == 'm3u8':
video_type = video_data.get('type')
if video_type == 'lf':
entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata({
'video_id': video_data['id'],
'video_type': video_type,
'brand': brand,
'device': '001',
}))
errors = entitlement.get('errors', {}).get('errors', [])
if errors:
error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey']
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False))
else:


@@ -10,7 +10,7 @@ from ..utils import unified_strdate
class GooglePlusIE(InfoExtractor):
IE_DESC = 'Google Plus'
_VALID_URL = r'https://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
_VALID_URL = r'https?://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
IE_NAME = 'plus.google'
_TEST = {
'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',


@@ -11,7 +11,7 @@ from ..utils import (
class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_VALID_URL = r'https?://(?:www\.)?goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '4b6db9a0a333142eb9f15913142b0ed1',


@@ -5,7 +5,7 @@ from .common import InfoExtractor
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://www\.hark\.com/clips/(?P<id>.+?)-.+'
_VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
_TEST = {
'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
'md5': '6783a58491b47b92c7c1af5a77d4cbee',


@@ -12,7 +12,7 @@ from ..utils import (
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'https?://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_VALID_URL = r'https?://(?:www\.)?hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_TEST = {
'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
'md5': '2c2cd2f76ef11a9b3b581e8b232f3d96',


@@ -94,7 +94,7 @@ class ImdbIE(InfoExtractor):
class ImdbListIE(InfoExtractor):
IE_NAME = 'imdb:list'
IE_DESC = 'Internet Movie Database lists'
_VALID_URL = r'https?://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_TEST = {
'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
'info_dict': {


@@ -29,6 +29,7 @@ class InstagramIE(InfoExtractor):
'uploader': 'Naomi Leonor Phan-Quang',
'like_count': int,
'comment_count': int,
'comments': list,
},
}, {
# missing description
@@ -44,6 +45,7 @@ class InstagramIE(InfoExtractor):
'uploader': 'Britney Spears',
'like_count': int,
'comment_count': int,
'comments': list,
},
'params': {
'skip_download': True,
@@ -82,7 +84,7 @@ class InstagramIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
(video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count) = [None] * 8
uploader_id, like_count, comment_count, height, width) = [None] * 10
shared_data = self._parse_json(
self._search_regex(
@@ -94,6 +96,8 @@ class InstagramIE(InfoExtractor):
shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
if media:
video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height'))
width = int_or_none(media.get('dimensions', {}).get('width'))
description = media.get('caption')
thumbnail = media.get('display_src')
timestamp = int_or_none(media.get('date'))
@@ -101,10 +105,24 @@ class InstagramIE(InfoExtractor):
uploader_id = media.get('owner', {}).get('username')
like_count = int_or_none(media.get('likes', {}).get('count'))
comment_count = int_or_none(media.get('comments', {}).get('count'))
comments = [{
'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'),
'id': comment.get('id'),
'text': comment.get('text'),
'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')]
if not video_url:
video_url = self._og_search_video_url(webpage, secure=False)
formats = [{
'url': video_url,
'width': width,
'height': height,
}]
if not uploader_id:
uploader_id = self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
@@ -121,7 +139,7 @@ class InstagramIE(InfoExtractor):
return {
'id': video_id,
'url': video_url,
'formats': formats,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'description': description,
@@ -131,6 +149,7 @@ class InstagramIE(InfoExtractor):
'uploader': uploader,
'like_count': like_count,
'comment_count': comment_count,
'comments': comments,
}
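The comment extraction added in this file can be exercised on a plain dictionary. The sketch below assumes a simplified `int_or_none` in place of the helper from `..utils`, and the sample `media` data is invented. It also guards with `or {}` so that a `None` value does not raise, which goes slightly beyond what the hunk does.

```python
def int_or_none(value):
    """Simplified stand-in for youtube-dl's int_or_none helper."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def extract_comments(media):
    """Build the 'comments' list from a media dict, skipping empty texts."""
    nodes = (media.get('comments') or {}).get('nodes') or []
    return [{
        'author': (comment.get('user') or {}).get('username'),
        'author_id': (comment.get('user') or {}).get('id'),
        'id': comment.get('id'),
        'text': comment.get('text'),
        'timestamp': int_or_none(comment.get('created_at')),
    } for comment in nodes if comment.get('text')]


# Invented sample shaped like the fields the hunk reads.
SAMPLE_MEDIA = {
    'comments': {
        'count': 2,
        'nodes': [
            {'id': '1', 'text': 'Nice!', 'created_at': '1475400000',
             'user': {'username': 'alice', 'id': '42'}},
            {'id': '2', 'text': '', 'created_at': 1475400100,
             'user': {'username': 'bob', 'id': '43'}},
        ],
    },
}
```

Comments with an empty `text` are dropped, so only the first node of the sample survives.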


@@ -9,6 +9,7 @@ from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
)
@@ -19,24 +20,32 @@ class JWPlatformBaseIE(InfoExtractor):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id)
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
@@ -55,6 +64,9 @@ class JWPlatformBaseIE(InfoExtractor):
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
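The two backward-compatibility branches added to `_parse_jwplayer_data` reduce three possible config shapes to one list of items. A minimal sketch of that normalisation, assuming a standalone function named `normalize_playlist` (hypothetical) and invented sample configs:

```python
def normalize_playlist(jwplayer_data):
    """Return a list of playlist items for any of the three config shapes."""
    if 'playlist' not in jwplayer_data:
        # Flattened config: the setup options are themselves the only item.
        return [jwplayer_data]
    playlist = jwplayer_data['playlist']
    if not isinstance(playlist, list):
        # Old style: a single item given directly instead of a list.
        return [playlist]
    return playlist


# Invented sample configs, one per shape.
FLATTENED = {'file': 'a.mp4', 'title': 'A'}
SINGLE_ITEM = {'playlist': {'file': 'a.mp4', 'title': 'A'}}
REGULAR = {'playlist': [{'file': 'a.mp4'}, {'file': 'b.mp4'}]}
```

The sketch returns a new list instead of rewriting `jwplayer_data` in place as the hunk does; the resulting items are the same.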


@@ -105,20 +105,20 @@ class KalturaIE(InfoExtractor):
kWidget\.(?:thumb)?[Ee]mbed\(
\{.*?
(?P<q1>['\"])wid(?P=q1)\s*:\s*
(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?
(?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),
(?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
""", webpage) or
re.search(
r'''(?xs)
(?P<q1>["\'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/(?:(?!(?P=q1)).)*(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
(?P=q1).*?
(?:
entry_?[Ii]d|
(?P<q2>["\'])entry_?[Ii]d(?P=q2)
)\s*:\s*
(?P<q3>["\'])(?P<id>.+?)(?P=q3)
(?P<q3>["\'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage))
if mobj:
embed_info = mobj.groupdict()
@@ -262,8 +262,16 @@ class KalturaIE(InfoExtractor):
# Continue if asset is not ready
if f.get('status') != 2:
continue
# Original format that's not available (e.g. kaltura:1926081:0_c03e1b5g)
# skip for now.
if f.get('fileExt') == 'chun':
continue
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
# -f mp4-56)
vcodec = 'none' if 'videoCodecId' not in f and f.get(
'frameRate') == 0 else f.get('videoCodecId')
formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f,
'ext': f.get('fileExt'),
@@ -271,7 +279,7 @@ class KalturaIE(InfoExtractor):
'fps': int_or_none(f.get('frameRate')),
'filesize_approx': int_or_none(f.get('size'), invscale=1024),
'container': f.get('containerFormat'),
'vcodec': f.get('videoCodecId'),
'vcodec': vcodec,
'height': int_or_none(f.get('height')),
'width': int_or_none(f.get('width')),
'url': video_url,
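The regex changes in the first hunk replace `[^'\"]+` with `(?:(?!(?P=q2)).)+`, which stops only at the quote character that opened the value. The sketch below reduces both forms to a single quoted value to show the difference; `OLD`, `NEW`, `quoted_value` and the sample strings are assumptions for illustration.

```python
import re

# Old form: the value may contain neither quote character.
OLD = r'''(?P<q>["'])(?P<value>[^"']+)(?P=q)'''
# New form: the value may contain anything except the opening quote.
NEW = r'''(?P<q>["'])(?P<value>(?:(?!(?P=q)).)+)(?P=q)'''


def quoted_value(pattern, text):
    """Return the first quoted value found in text, or None."""
    mobj = re.search(pattern, text)
    return mobj.group('value') if mobj else None


PLAIN = "'1926081'"          # no quote inside the value
MIXED = '"0_it\'s"'          # a single quote inside a double-quoted value
```

On `PLAIN` both patterns agree. On `MIXED` the old form finds no match at all, while the new form returns the full value.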


@@ -5,7 +5,7 @@ from .common import InfoExtractor
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'https?://www\.karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': {


@@ -21,6 +21,10 @@ class KetnetIE(InfoExtractor):
}, {
'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
'only_matching': True,
}, {
# mzsource, geo restricted to Belgium
'url': 'https://www.ketnet.be/kijken/nachtwacht/de-bermadoe',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -36,9 +40,25 @@ class KetnetIE(InfoExtractor):
title = config['title']
formats = self._extract_m3u8_formats(
config['source']['hls'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
formats = []
for source_key in ('', 'mz'):
source = config.get('%ssource' % source_key)
if not isinstance(source, dict):
continue
for format_id, format_url in source.items():
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id=format_id,
fatal=False))
elif format_id == 'hds':
formats.extend(self._extract_f4m_formats(
format_url, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
return {


@@ -6,7 +6,7 @@ from ..utils import smuggle_url
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_VALID_URL = r'https?://(?:www\.)?kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_TESTS = [{
'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant/description',
'md5': 'c81addca81327ffa66c642b5d8b08cab',


@@ -59,7 +59,7 @@ class KuwoBaseIE(InfoExtractor):
class KuwoIE(KuwoBaseIE):
IE_NAME = 'kuwo:song'
IE_DESC = '酷我音乐'
_VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/yinyue/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/yinyue/635632/',
'info_dict': {
@@ -82,7 +82,7 @@ class KuwoIE(KuwoBaseIE):
'upload_date': '20150518',
},
'params': {
'format': 'mp3-320'
'format': 'mp3-320',
},
}, {
'url': 'http://www.kuwo.cn/yinyue/3197154?catalog=yueku2016',
@@ -91,10 +91,10 @@ class KuwoIE(KuwoBaseIE):
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
webpage, urlh = self._download_webpage_handle(
url, song_id, note='Download song detail info',
errnote='Unable to get song detail info')
if '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
if song_id not in urlh.geturl() or '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
raise ExtractorError('this song has been offline because of copyright issues', expected=True)
song_name = self._html_search_regex(
@@ -139,7 +139,7 @@ class KuwoIE(KuwoBaseIE):
class KuwoAlbumIE(InfoExtractor):
IE_NAME = 'kuwo:album'
IE_DESC = '酷我音乐 - 专辑'
_VALID_URL = r'https?://www\.kuwo\.cn/album/(?P<id>\d+?)/'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/album/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/album/502294/',
'info_dict': {
@@ -181,7 +181,7 @@ class KuwoChartIE(InfoExtractor):
'info_dict': {
'id': '香港中文龙虎榜',
},
'playlist_mincount': 10,
'playlist_mincount': 7,
}
def _real_extract(self, url):
@@ -200,7 +200,7 @@ class KuwoChartIE(InfoExtractor):
class KuwoSingerIE(InfoExtractor):
IE_NAME = 'kuwo:singer'
IE_DESC = '酷我音乐 - 歌手'
_VALID_URL = r'https?://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mingxing/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
'info_dict': {
@@ -296,14 +296,14 @@ class KuwoCategoryIE(InfoExtractor):
class KuwoMvIE(KuwoBaseIE):
IE_NAME = 'kuwo:mv'
IE_DESC = '酷我音乐 - MV'
_VALID_URL = r'https?://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mv/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/mv/6480076/',
'info_dict': {
'id': '6480076',
'ext': 'mp4',
'title': 'My HouseMV',
'creator': 'PM02:00',
'creator': '2PM',
},
# In this video, music URLs (anti.s) are blocked outside China and
# USA, while the MV URL (mvurl) is available globally, so force the MV


@@ -29,7 +29,7 @@ from ..utils import (
class LeIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@@ -73,6 +73,12 @@ class LeIE(InfoExtractor):
}, {
'url': 'http://sports.le.com/video/25737697.html',
'only_matching': True,
}, {
'url': 'http://www.lesports.com/match/1023203003.html',
'only_matching': True,
}, {
'url': 'http://sports.le.com/match/1023203003.html',
'only_matching': True,
}]
# ror() and calc_time_key() are reverse-engineered from an embedded swf file in KLetvPlayer.swf


@@ -59,7 +59,7 @@ class LimelightBaseIE(InfoExtractor):
format_id = 'rtmp'
if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_url = 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:])
http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
urls.append(http_url)
http_fmt = fmt.copy()
http_fmt.update({


@@ -14,7 +14,7 @@ from ..utils import (
class LiTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
_VALID_URL = r'https?://(?:www\.)?litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
_URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'

View File

@@ -94,7 +94,7 @@ class LyndaBaseIE(InfoExtractor):
class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos'
_VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'


@@ -7,7 +7,7 @@ from ..utils import ExtractorError
class MacGameStoreIE(InfoExtractor):
IE_NAME = 'macgamestore'
IE_DESC = 'MacGameStore trailers'
_VALID_URL = r'https?://www\.macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
_TEST = {
'url': 'http://www.macgamestore.com/mediaviewer.php?trailer=2450',


@@ -0,0 +1,54 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
)
class MangomoloBaseIE(InfoExtractor):
def _get_real_id(self, page_id):
return page_id
def _real_extract(self, url):
page_id = self._get_real_id(self._match_id(url))
webpage = self._download_webpage(url, page_id)
hidden_inputs = self._hidden_inputs(webpage)
m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
format_url = self._html_search_regex(
[
r'file\s*:\s*"(https?://[^"]+?/playlist.m3u8)',
r'<a[^>]+href="(rtsp://[^"]+)"'
], webpage, 'format url')
formats = self._extract_wowza_formats(
format_url, page_id, m3u8_entry_protocol, ['smil'])
self._sort_formats(formats)
return {
'id': page_id,
'title': self._live_title(page_id) if self._IS_LIVE else page_id,
'uploader_id': hidden_inputs.get('userid'),
'duration': int_or_none(hidden_inputs.get('duration')),
'is_live': self._IS_LIVE,
'formats': formats,
}
class MangomoloVideoIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:video'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
_IS_LIVE = False
class MangomoloLiveIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:live'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
_IS_LIVE = True
def _get_real_id(self, page_id):
return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
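`MangomoloLiveIE._get_real_id` turns the `channelid` URL parameter into the real channel id by percent-decoding and then base64-decoding it. A minimal Python 3 sketch of the same two steps, assuming `urllib.parse.unquote` in place of `compat_urllib_parse_unquote`; the function name and the sample value are hypothetical.

```python
import base64
from urllib.parse import unquote


def decode_channel_id(page_id):
    """Percent-decode, then base64-decode, a channelid parameter."""
    return base64.b64decode(unquote(page_id).encode()).decode()


# 'MTY=' is the base64 encoding of '16'; in a URL the '=' may arrive as '%3D'.
ESCAPED = 'MTY%3D'
UNESCAPED = 'MTY='
```

Both `ESCAPED` and `UNESCAPED` decode to the same id, which is why the `_VALID_URL` pattern accepts `%2B`, `%2F` and `%3D` alongside the raw base64 alphabet.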


@@ -9,7 +9,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',


@@ -6,7 +6,7 @@ from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.)?mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TESTS = [{


@@ -8,7 +8,7 @@ from ..utils import (
class MinistryGridIE(InfoExtractor):
_VALID_URL = r'https?://www\.ministrygrid.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?ministrygrid\.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
_TEST = {
'url': 'http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers',
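Besides making `www.` optional, this hunk escapes the dot in `ministrygrid.com`. An unescaped dot matches any character, so the old pattern also accepted look-alike hosts. A short sketch using the two patterns from the hunk; the look-alike URL is invented for illustration.

```python
import re

# Both patterns are copied from the hunk above.
OLD_VALID_URL = r'https?://www\.ministrygrid.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
NEW_VALID_URL = r'https?://(?:www\.)?ministrygrid\.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'

# URL taken from the _TEST entry in the hunk.
real = ('http://www.ministrygrid.com/training-viewer/-/training/'
        't4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers')
# Hypothetical host where 'X' stands in for the dot before 'com'.
lookalike = 'http://www.ministrygridXcom/training-viewer/foo'

old_accepts_lookalike = re.match(OLD_VALID_URL, lookalike) is not None
new_accepts_lookalike = re.match(NEW_VALID_URL, lookalike) is not None
```

The real URL matches both patterns; only the old one accepts the look-alike.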


@@ -74,7 +74,7 @@ class MiTeleBaseIE(InfoExtractor):
class MiTeleIE(MiTeleBaseIE):
IE_DESC = 'mitele.es'
_VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',


@@ -9,7 +9,7 @@ from ..compat import (
class MotorsportIE(InfoExtractor):
IE_DESC = 'motorsport.com'
_VALID_URL = r'https?://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
_TEST = {
'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
'info_dict': {


@@ -7,7 +7,7 @@ from .common import InfoExtractor
class MoviezineIE(InfoExtractor):
_VALID_URL = r'https?://www\.moviezine\.se/video/(?P<id>[^?#]+)'
_VALID_URL = r'https?://(?:www\.)?moviezine\.se/video/(?P<id>[^?#]+)'
_TEST = {
'url': 'http://www.moviezine.se/video/205866',


@@ -270,6 +270,29 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
class MTVIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv'
_VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.mtv.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.mtv.com/video-clips/vl8qof/unlocking-the-truth-trailer',
'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
'info_dict': {
'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
'ext': 'mp4',
'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
'timestamp': 1468846800,
'upload_date': '20160718',
},
}, {
'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
'only_matching': True,
}]
class MTVVideoIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv:video'
_VALID_URL = r'''(?x)^https?://
(?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''


@@ -9,9 +9,9 @@ from ..utils import (
class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
_TEST = {
_TESTS = [{
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
# md5 is unstable
'info_dict': {
@@ -23,7 +23,10 @@ class MwaveIE(InfoExtractor):
'duration': 206,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -60,8 +63,8 @@ class MwaveIE(InfoExtractor):
class MwaveMeetGreetIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?meetgreet/view/(?P<id>\d+)'
_TESTS = [{
'url': 'http://mwave.interest.me/meetgreet/view/256',
'info_dict': {
'id': '173294',
@@ -72,7 +75,10 @@ class MwaveMeetGreetIE(InfoExtractor):
'duration': 3634,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/meetgreet/view/256',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)


@@ -11,7 +11,7 @@ from ..utils import (
class MySpassIE(InfoExtractor):
_VALID_URL = r'https?://www\.myspass\.de/.*'
_VALID_URL = r'https?://(?:www\.)?myspass\.de/.*'
_TEST = {
'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
'md5': '0b49f4844a068f8b33f4b7c88405862b',


@@ -13,7 +13,7 @@ from ..utils import (
class NBCIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
_VALID_URL = r'https?://(?:www\.)?nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
_TESTS = [
{
@@ -138,7 +138,7 @@ class NBCSportsVPlayerIE(InfoExtractor):
class NBCSportsIE(InfoExtractor):
# Does not include https because its certificate is invalid
_VALID_URL = r'https?://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
@@ -161,7 +161,7 @@ class NBCSportsIE(InfoExtractor):
class CSNNEIE(InfoExtractor):
_VALID_URL = r'https?://www\.csnne\.com/video/(?P<id>[0-9a-z-]+)'
_VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
@@ -335,3 +335,43 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
'ie_key': 'ThePlatformFeed',
}
class NBCOlympicsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcolympics\.com/video/(?P<id>[a-z-]+)'
_TEST = {
# Geo-restricted to US
'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'md5': '54fecf846d05429fbaa18af557ee523a',
'info_dict': {
'id': 'WjTBzDXx5AUq',
'display_id': 'justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'ext': 'mp4',
'title': 'Rose\'s son Leo was in tears after his dad won gold',
'description': 'Olympic gold medalist Justin Rose gets emotional talking to the impact his win in men\'s golf has already had on his children.',
'timestamp': 1471274964,
'upload_date': '20160815',
'uploader': 'NBCU-SPORTS',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
iframe_url = drupal_settings['vod']['iframe_url']
theplatform_url = iframe_url.replace(
'vplayer.nbcolympics.com', 'player.theplatform.com')
return {
'_type': 'url_transparent',
'url': theplatform_url,
'ie_key': ThePlatformIE.ie_key(),
'display_id': display_id,
}


@@ -23,7 +23,7 @@ class NDRBaseIE(InfoExtractor):
class NDRIE(NDRBaseIE):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
_VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
@@ -105,7 +105,7 @@ class NDRIE(NDRBaseIE):
class NJoyIE(NDRBaseIE):
IE_NAME = 'njoy'
IE_DESC = 'N-JOY'
_VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
_VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
@@ -238,7 +238,7 @@ class NDREmbedBaseIE(InfoExtractor):
class NDREmbedIE(NDREmbedBaseIE):
IE_NAME = 'ndr:embed'
_VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
_VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
_TESTS = [{
'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
@@ -332,7 +332,7 @@ class NDREmbedIE(NDREmbedBaseIE):
class NJoyEmbedIE(NDREmbedBaseIE):
IE_NAME = 'njoy:embed'
_VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
_VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
_TESTS = [{
# httpVideo
'url': 'http://www.n-joy.de/events/reeperbahnfestival/doku948-player_image-bc168e87-5263-4d6d-bd27-bb643005a6de_theme-n-joy.html',

View File

@@ -7,7 +7,7 @@ from ..utils import parse_iso8601
class NextMediaIE(InfoExtractor):
IE_DESC = '蘋果日報'
_VALID_URL = r'https?://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://hk\.apple\.nextmedia\.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
'md5': 'dff9fad7009311c421176d1ac90bfe4f',
@@ -68,7 +68,7 @@ class NextMediaIE(InfoExtractor):
class NextMediaActionNewsIE(NextMediaIE):
IE_DESC = '蘋果日報 - 動新聞'
_VALID_URL = r'https?://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
_VALID_URL = r'https?://hk\.dv\.nextmedia\.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
_TESTS = [{
'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
'md5': '05fce8ffeed7a5e00665d4b7cf0f9201',
@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
class AppleDailyIE(NextMediaIE):
IE_DESC = '臺灣蘋果日報'
_VALID_URL = r'https?://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
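Note on the hunks above: the only change is escaping the dots in the host part of `_VALID_URL`. A minimal sketch of why that matters, using the two `NextMediaIE` patterns from this diff; the look-alike host in `bad` is made up for illustration and is not taken from the source.

```python
import re

# Pattern before the change: each unescaped '.' matches any character.
LOOSE = r'https?://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
# Pattern after the change: '\.' matches only a literal dot.
STRICT = r'https?://hk\.apple\.nextmedia\.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'

# Test URL taken from the extractor's _TESTS.
good = 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199'
# Hypothetical look-alike host (assumption, for illustration only).
bad = 'http://hkXapple.nextmedia.com/realtime/news/20141108/53109199'

loose_good = re.match(LOOSE, good) is not None
loose_bad = re.match(LOOSE, bad) is not None      # the loose pattern accepts it
strict_good = re.match(STRICT, good) is not None
strict_bad = re.match(STRICT, bad) is not None    # the strict pattern rejects it
```

The same reasoning applies to the `y.qq.com` and `appledaily.com.tw` patterns changed elsewhere in this comparison.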

View File

@@ -165,7 +165,7 @@ class NFLIE(InfoExtractor):
group='config'))
# For articles, the id in the url is not the video id
video_id = self._search_regex(
r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>.+?)\1',
r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'video id', default=video_id, group='id')
config = self._download_json(config_url, video_id, 'Downloading player config')
url_template = NFLIE.prepend_host(

View File

@@ -1,14 +1,15 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import ExtractorError
class NhkVodIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>.+?)\.html'
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>[^/]+/[^/?#&]+)'
_TEST = {
# Videos available only for a limited period of time. Visit
# http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815.html',
'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
'info_dict': {
'id': 'A1bnNiNTE6nY3jLllS-BIISfcC_PpvF5',
'ext': 'flv',
@@ -19,25 +20,25 @@ class NhkVodIE(InfoExtractor):
},
'skip': 'Videos available only for a limited period of time',
}
_API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._download_json(self._API_URL, video_id)
embed_code = self._search_regex(
r'nw_vod_ooplayer\([^,]+,\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'ooyala embed code', group='id')
try:
episode = next(
e for e in data['data']['episodes']
if e.get('url') and video_id in e['url'])
except StopIteration:
raise ExtractorError('Unable to find episode')
title = self._search_regex(
r'<div[^>]+class=["\']episode-detail["\']>\s*<h\d+>([^<]+)',
webpage, 'title', default=None)
description = self._html_search_regex(
r'(?s)<p[^>]+class=["\']description["\'][^>]*>(.+?)</p>',
webpage, 'description', default=None)
series = self._search_regex(
r'<h2[^>]+class=["\']detail-top-player-title[^>]+><a[^>]+>([^<]+)',
webpage, 'series', default=None)
embed_code = episode['vod_id']
title = episode.get('sub_title_clean') or episode['sub_title']
description = episode.get('description_clean') or episode.get('description')
series = episode.get('title_clean') or episode.get('title')
return {
'_type': 'url_transparent',
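Note on the hunk above: the new code picks the episode from the API response with `next()` over a generator and converts `StopIteration` into an extractor error. A standalone sketch of that lookup; `EpisodeNotFound` and the sample `episodes` list are stand-ins invented for this example, not part of youtube-dl or the NHK API.

```python
class EpisodeNotFound(Exception):
    """Hypothetical stand-in for youtube-dl's ExtractorError."""


def find_episode(episodes, video_id):
    """Return the first episode whose 'url' contains video_id."""
    try:
        return next(
            e for e in episodes
            if e.get('url') and video_id in e['url'])
    except StopIteration:
        # next() raises StopIteration when the generator is exhausted.
        raise EpisodeNotFound('Unable to find episode')


# Made-up data shaped like data['data']['episodes'] in the diff.
episodes = [
    {'url': None, 'vod_id': 'skipped'},
    {'url': '/nhkworld/en/vod/tokyofashion/20160815', 'vod_id': 'abc'},
]

episode = find_episode(episodes, 'tokyofashion/20160815')
```

The `e.get('url') and ...` guard matters because entries without a URL would otherwise raise a `TypeError` on the `in` test.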

View File

@@ -252,7 +252,7 @@ class NiconicoIE(InfoExtractor):
class NiconicoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://www\.nicovideo\.jp/mylist/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/mylist/(?P<id>\d+)'
_TEST = {
'url': 'http://www.nicovideo.jp/mylist/27411728',

View File

@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
fix_xml_ampersands,
orderedSet,
parse_duration,
qualities,
strip_jsonp,
@@ -429,7 +430,7 @@ class SchoolTVIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-mid=(["\'])(?P<id>.+?)\1', webpage, 'video_id', group='id')
r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
return {
'_type': 'url_transparent',
'ie_key': 'NPO',
@@ -438,9 +439,29 @@ class SchoolTVIE(InfoExtractor):
}
class VPROIE(NPOIE):
class NPOPlaylistBaseIE(NPOIE):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
for video_id in orderedSet(re.findall(self._PLAYLIST_ENTRY_RE, webpage))
]
playlist_title = self._html_search_regex(
self._PLAYLIST_TITLE_RE, webpage, 'playlist title',
default=None) or self._og_search_title(webpage)
return self.playlist_result(entries, playlist_id, playlist_title)
class VPROIE(NPOPlaylistBaseIE):
IE_NAME = 'vpro'
_VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
_PLAYLIST_TITLE_RE = r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)'
_PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'
_TESTS = [
{
@@ -453,12 +474,13 @@ class VPROIE(NPOIE):
'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225',
},
'skip': 'Video gone',
},
{
'url': 'http://www.vpro.nl/programmas/2doc/2015/sergio-herman.html',
'info_dict': {
'id': 'sergio-herman',
'title': 'Sergio Herman: Fucking perfect',
'title': 'sergio herman: fucking perfect',
},
'playlist_count': 2,
},
@@ -467,54 +489,40 @@ class VPROIE(NPOIE):
'url': 'http://www.vpro.nl/programmas/2doc/2015/education-education.html',
'info_dict': {
'id': 'education-education',
'title': '2Doc',
'title': 'education education',
},
'playlist_count': 2,
}
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
for video_id in re.findall(r'data-media-id="([^"]+)"', webpage)
]
playlist_title = self._search_regex(
r'<title>\s*([^>]+?)\s*-\s*Teledoc\s*-\s*VPRO\s*</title>',
webpage, 'playlist title', default=None) or self._og_search_title(webpage)
return self.playlist_result(entries, playlist_id, playlist_title)
class WNLIE(InfoExtractor):
class WNLIE(NPOPlaylistBaseIE):
IE_NAME = 'wnl'
_VALID_URL = r'https?://(?:www\.)?omroepwnl\.nl/video/detail/(?P<id>[^/]+)__\d+'
_PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>'
_PLAYLIST_ENTRY_RE = r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>Deel \d+'
_TEST = {
_TESTS = [{
'url': 'http://www.omroepwnl.nl/video/detail/vandaag-de-dag-6-mei__060515',
'info_dict': {
'id': 'vandaag-de-dag-6-mei',
'title': 'Vandaag de Dag 6 mei',
},
'playlist_count': 4,
}
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
class AndereTijdenIE(NPOPlaylistBaseIE):
IE_NAME = 'anderetijden'
_VALID_URL = r'https?://(?:www\.)?anderetijden\.nl/programma/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class=["\'].*?\bpage-title\b.*?["\'][^>]*>(.+?)</h1>'
_PLAYLIST_ENTRY_RE = r'<figure[^>]+class=["\']episode-container episode-page["\'][^>]+data-prid=["\'](.+?)["\']'
entries = [
self.url_result('npo:%s' % video_id, 'NPO')
for video_id, part in re.findall(
r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>(Deel \d+)', webpage)
]
playlist_title = self._html_search_regex(
r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>',
webpage, 'playlist title')
return self.playlist_result(entries, playlist_id, playlist_title)
_TESTS = [{
'url': 'http://anderetijden.nl/programma/1/Andere-Tijden/aflevering/676/Duitse-soldaten-over-de-Slag-bij-Arnhem',
'info_dict': {
'id': 'Duitse-soldaten-over-de-Slag-bij-Arnhem',
'title': 'Duitse soldaten over de Slag bij Arnhem',
},
'playlist_count': 3,
}]
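Note on the hunks above: `NPOPlaylistBaseIE` now deduplicates the ids found by `_PLAYLIST_ENTRY_RE` while keeping page order, then turns each one into an `npo:` URL unless it is already an HTTP URL. A minimal sketch of that flow; `ordered_unique` is a hypothetical stand-in for the `orderedSet` helper imported in the diff, and the `webpage` string is made up.

```python
import re


def ordered_unique(items):
    """Drop duplicates while preserving first-seen order (stand-in for orderedSet)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def to_url(video_id):
    """Mirror the conditional in the diff: keep full URLs, prefix bare ids."""
    return video_id if video_id.startswith('http') else 'npo:%s' % video_id


# Entry pattern used by VPROIE in the diff.
ENTRY_RE = r'data-media-id="([^"]+)"'
# Made-up page fragment in which the same id appears twice.
webpage = ('<a data-media-id="A_1"></a>'
           '<a data-media-id="B_2"></a>'
           '<a data-media-id="A_1"></a>')

urls = [to_url(v) for v in ordered_unique(re.findall(ENTRY_RE, webpage))]
```

Without the deduplication step, a page that repeats a player element would yield duplicate playlist entries and an inflated `playlist_count`.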

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class OktoberfestTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.oktoberfest-tv\.de/[^/]+/[^/]+/video/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?oktoberfest-tv\.de/[^/]+/[^/]+/video/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.oktoberfest-tv.de/de/kameras/video/hb-zelt',

View File

@@ -47,7 +47,7 @@ class OoyalaBaseIE(InfoExtractor):
delivery_type = stream['delivery_type']
if delivery_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
s_url, embed_code, 'mp4', 'm3u8_native',
re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif delivery_type == 'hds' or ext == 'f4m':
formats.extend(self._extract_f4m_formats(
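Note on the hunk above: the HLS URL is rewritten before it is handed to `_extract_m3u8_formats`. The sketch below shows only what the substitution does to a string; the helper name and the example URLs are hypothetical, and the diff itself does not state why the `/all/` path is preferred.

```python
import re


def rewrite_hls_url(s_url):
    """Replace an '/ipad/' or '/iphone/' path segment with '/all/'."""
    return re.sub(r'/ip(?:ad|hone)/', '/all/', s_url)


# Hypothetical URLs, for illustration only.
ipad_url = rewrite_hls_url('http://example.com/player/ipad/abc.m3u8')
iphone_url = rewrite_hls_url('http://example.com/player/iphone/abc.m3u8')
other_url = rewrite_hls_url('http://example.com/player/web/abc.m3u8')
```

URLs without either segment pass through unchanged, so the substitution is safe to apply unconditionally.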

View File

@@ -13,7 +13,7 @@ from ..utils import (
class OpenloadIE(InfoExtractor):
_VALID_URL = r'https://openload.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_VALID_URL = r'https?://openload\.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -24,6 +24,22 @@ class OpenloadIE(InfoExtractor):
'title': 'skyrim_no-audio_1080.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'https://openload.co/embed/rjC09fkPLYs',
'info_dict': {
'id': 'rjC09fkPLYs',
'ext': 'mp4',
'title': 'movie.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
'params': {
'skip_download': True, # test subtitles only
},
}, {
'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
'only_matching': True,
@@ -51,7 +67,8 @@ class OpenloadIE(InfoExtractor):
# declared to be freely used in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/10408
enc_data = self._html_search_regex(
r'<span[^>]+id="hiddenurl"[^>]*>([^<]+)</span>', webpage, 'encrypted data')
r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
webpage, 'encrypted data')
video_url_chars = []
@@ -60,7 +77,7 @@ class OpenloadIE(InfoExtractor):
if j >= 33 and j <= 126:
j = ((j + 14) % 94) + 33
if idx == len(enc_data) - 1:
j += 3
j += 2
video_url_chars += compat_chr(j)
video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
@@ -70,11 +87,17 @@ class OpenloadIE(InfoExtractor):
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
return {
entries = self._parse_html5_media_entries(url, webpage, video_id)
subtitles = entries[0]['subtitles'] if entries else None
info_dict = {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url,
# Seems all videos have extensions in their titles
'ext': determine_ext(title),
'subtitles': subtitles,
}
return info_dict
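Note on the decoding hunk above: the visible lines apply a ROT47-style rotation to printable ASCII and then add an offset to the last character, which this comparison changes from 3 to 2. A standalone sketch, assuming the elided part of the loop is a plain `enumerate` over `enc_data` with `compat_ord` on each character (those lines are not shown in the hunk); `decode_hidden` is a name chosen for this example.

```python
def decode_hidden(enc_data, last_offset=2):
    """Decode a string the way the visible loop body does.

    Assumption: the loop header, which is not visible in the diff,
    iterates over enc_data with enumerate().
    """
    chars = []
    for idx, c in enumerate(enc_data):
        j = ord(c)
        if 33 <= j <= 126:
            # Rotate within the 94 printable ASCII characters (ROT47).
            j = ((j + 14) % 94) + 33
        if idx == len(enc_data) - 1:
            # Only the final character gets the extra offset.
            j += last_offset
        chars.append(chr(j))
    return ''.join(chars)


decoded = decode_hidden('ab')
```

For the input `'ab'`, `'a'` (97) rotates to `'2'` (50) and `'b'` (98) rotates to `'3'` (51) and is then shifted by 2 to `'5'` (53).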

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
@@ -41,6 +43,13 @@ class PeriscopeIE(PeriscopeBaseIE):
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?periscope\.tv/(?:(?!\1).)+)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
token = self._match_id(url)
@@ -78,7 +87,7 @@ class PeriscopeIE(PeriscopeBaseIE):
'ext': 'flv' if format_id == 'rtmp' else 'mp4',
}
if format_id != 'rtmp':
f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
f['protocol'] = 'm3u8_native' if state in ('ended', 'timed_out') else 'm3u8'
formats.append(f)
self._sort_formats(formats)
@@ -94,7 +103,7 @@ class PeriscopeIE(PeriscopeBaseIE):
class PeriscopeUserIE(PeriscopeBaseIE):
_VALID_URL = r'https?://www\.periscope\.tv/(?P<id>[^/]+)/?$'
_VALID_URL = r'https?://(?:www\.)?periscope\.tv/(?P<id>[^/]+)/?$'
IE_DESC = 'Periscope user videos'
IE_NAME = 'periscope:user'
@@ -123,7 +132,7 @@ class PeriscopeUserIE(PeriscopeBaseIE):
user = list(data_store['UserCache']['users'].values())[0]['user']
user_id = user['id']
session_id = data_store['SessionToken']['broadcastHistory']['token']['session_id']
session_id = data_store['SessionToken']['public']['broadcastHistory']['token']['session_id']
broadcasts = self._call_api(
'getUserBroadcastsPublic',

View File

@@ -14,7 +14,7 @@ from ..utils import (
class PlayvidIE(InfoExtractor):
_VALID_URL = r'https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
_VALID_URL = r'https?://(?:www\.)?playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
_TESTS = [{
'url': 'http://www.playvid.com/watch/RnmBNgtrrJu',
'md5': 'ffa2f6b2119af359f544388d8c01eb6c',

View File

@@ -7,7 +7,6 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
@@ -15,12 +14,12 @@ from ..utils import (
class PromptFileIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?promptfile\.com/l/(?P<id>[0-9A-Z\-]+)'
_TEST = {
'url': 'http://www.promptfile.com/l/D21B4746E9-F01462F0FF',
'md5': 'd1451b6302da7215485837aaea882c4c',
'url': 'http://www.promptfile.com/l/86D1CE8462-576CAAE416',
'md5': '5a7e285a26e0d66d9a263fae91bc92ce',
'info_dict': {
'id': 'D21B4746E9-F01462F0FF',
'id': '86D1CE8462-576CAAE416',
'ext': 'mp4',
'title': 'Birds.mp4',
'title': 'oceans.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
@@ -33,14 +32,23 @@ class PromptFileIE(InfoExtractor):
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
chash = self._search_regex(
r'val\("([^"]*)"\s*\+\s*\$\("#chash"\)', webpage, 'chash')
fields = self._hidden_inputs(webpage)
post = urlencode_postdata(fields)
req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
keys = list(fields.keys())
chash_key = keys[0] if len(keys) == 1 else next(
key for key in keys if key.startswith('cha'))
fields[chash_key] = chash + fields[chash_key]
url = self._html_search_regex(r'url:\s*\'([^\']+)\'', webpage, 'URL')
webpage = self._download_webpage(
url, video_id, 'Downloading video page',
data=urlencode_postdata(fields),
headers={'Content-type': 'application/x-www-form-urlencoded'})
video_url = self._search_regex(
(r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*Download File',
r'<a[^>]+href=(["\'])(?P<url>https?://(?:www\.)?promptfile\.com/file/(?:(?!\1).)+)\1'),
webpage, 'video url', group='url')
title = self._html_search_regex(
r'<span.+title="([^"]+)">', webpage, 'title')
thumbnail = self._html_search_regex(
@@ -49,7 +57,7 @@ class PromptFileIE(InfoExtractor):
formats = [{
'format_id': 'sd',
'url': url,
'url': video_url,
'ext': determine_ext(title),
}]
self._sort_formats(formats)

View File

@@ -122,7 +122,17 @@ class ProSiebenSat1BaseIE(InfoExtractor):
class ProSiebenSat1IE(ProSiebenSat1BaseIE):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
(?:
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
)\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de
)
/(?P<id>.+)
'''
_TESTS = [
{
@@ -290,6 +300,24 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'skip_download': True,
},
},
{
# geo restricted to Germany
'url': 'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
'only_matching': True,
},
{
# geo restricted to Germany
'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
'only_matching': True,
},
{
'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
'only_matching': True,
},
{
'url': 'http://www.advopedia.de/videos/lenssen-klaert-auf/lenssen-klaert-auf-folge-8-staffel-3-feiertage-und-freie-tage',
'only_matching': True,
},
]
_TOKEN = 'prosieben'
@@ -361,19 +389,28 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex(
self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
for regex in self._PLAYLIST_CLIP_REGEXES:
playlist_clips = re.findall(regex, webpage)
if playlist_clips:
title = self._html_search_regex(
self._TITLE_REGEXES, webpage, 'title')
description = self._html_search_regex(
self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
entries = [
self.url_result(
re.match('(.+?//.+?)/', url).group(1) + clip_path,
'ProSiebenSat1')
for clip_path in playlist_clips]
return self.playlist_result(entries, playlist_id, title, description)
playlist = self._parse_json(
self._search_regex(
r'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
webpage, 'playlist'),
playlist_id)
entries = []
for item in playlist:
clip_id = item.get('id') or item.get('upc')
if not clip_id:
continue
info = self._extract_video_info(url, clip_id)
info.update({
'id': clip_id,
'title': item.get('title') or item.get('teaser', {}).get('headline'),
'description': item.get('teaser', {}).get('description'),
'thumbnail': item.get('poster'),
'duration': float_or_none(item.get('duration')),
'series': item.get('tvShowTitle'),
'uploader': item.get('broadcastPublisher'),
})
entries.append(info)
return self.playlist_result(entries, playlist_id)
def _real_extract(self, url):
video_id = self._match_id(url)
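Note on the `_VALID_URL` hunk above: the pattern is rewritten in verbose mode (`(?x)`) so that the new hosts (`sat1gold`, `kabeleinsdoku`, `advopedia`) can be listed readably. A minimal sketch that checks the new pattern against the test URLs added in this diff; the non-matching URL is made up.

```python
import re

# Pattern copied from the diff; in (?x) mode the whitespace and newlines
# inside the pattern are ignored by the regex engine.
VALID_URL = r'''(?x)
                https?://
                    (?:www\.)?
                    (?:
                        (?:
                            prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
                        )\.(?:de|at|ch)|
                        ran\.de|fem\.com|advopedia\.de
                    )
                    /(?P<id>.+)
                '''

# URLs from the _TESTS entries added in this diff.
new_urls = [
    'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
    'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
    'http://www.advopedia.de/videos/lenssen-klaert-auf/lenssen-klaert-auf-folge-8-staffel-3-feiertage-und-freie-tage',
]

matches = [re.match(VALID_URL, u) for u in new_urls]
# Hypothetical unrelated host, for illustration only.
unrelated = re.match(VALID_URL, 'http://www.example.com/tv/some-show')
```

Since `advopedia` already appears in the first alternation followed by `\.(?:de|at|ch)`, the trailing `advopedia\.de` alternative looks redundant, though harmless.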

View File

@@ -18,7 +18,7 @@ from ..utils import (
class QQMusicIE(InfoExtractor):
IE_NAME = 'qqmusic'
IE_DESC = 'QQ音乐'
_VALID_URL = r'https?://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
_VALID_URL = r'https?://y\.qq\.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
'md5': '9ce1c1c8445f561506d2e3cfb0255705',
@@ -172,7 +172,7 @@ class QQPlaylistBaseIE(InfoExtractor):
class QQMusicSingerIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:singer'
IE_DESC = 'QQ音乐 - 歌手'
_VALID_URL = r'https?://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
_VALID_URL = r'https?://y\.qq\.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = {
'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
'info_dict': {
@@ -217,7 +217,7 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album'
IE_DESC = 'QQ音乐 - 专辑'
_VALID_URL = r'https?://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_VALID_URL = r'https?://y\.qq\.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',

View File

@@ -13,6 +13,7 @@ from ..utils import (
xpath_element,
ExtractorError,
determine_protocol,
unsmuggle_url,
)
@@ -35,28 +36,51 @@ class RadioCanadaIE(InfoExtractor):
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
app_code, video_id = re.match(self._VALID_URL, url).groups()
device_types = ['ipad', 'android']
metadata = self._download_xml(
'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
video_id, note='Downloading metadata XML', query={
'appCode': app_code,
'idMedia': video_id,
})
def get_meta(name):
el = find_xpath_attr(metadata, './/Meta', 'name', name)
return el.text if el is not None else None
if get_meta('protectionType'):
raise ExtractorError('This video is DRM protected.', expected=True)
device_types = ['ipad']
if app_code != 'toutv':
device_types.append('flash')
if not smuggled_data:
device_types.append('android')
formats = []
# TODO: extract f4m formats
# f4m formats can be extracted using flashhd device_type but they produce unplayable file
for device_type in device_types:
v_data = self._download_xml(
'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx',
video_id, note='Downloading %s XML' % device_type, query={
'appCode': app_code,
'idMedia': video_id,
'connectionType': 'broadband',
'multibitrate': 'true',
'deviceType': device_type,
validation_url = 'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx'
query = {
'appCode': app_code,
'idMedia': video_id,
'connectionType': 'broadband',
'multibitrate': 'true',
'deviceType': device_type,
}
if smuggled_data:
validation_url = 'https://services.radio-canada.ca/media/validation/v2/'
query.update(smuggled_data)
else:
query.update({
# paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
'paysJ391wsHjbOJwvCs26toz': 'CA',
'bypasslock': 'NZt5K62gRqfc',
}, fatal=False)
})
v_data = self._download_xml(validation_url, video_id, note='Downloading %s XML' % device_type, query=query, fatal=False)
v_url = xpath_text(v_data, 'url')
if not v_url:
continue
@@ -101,17 +125,6 @@ class RadioCanadaIE(InfoExtractor):
f4m_id='hds', fatal=False))
self._sort_formats(formats)
metadata = self._download_xml(
'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
video_id, note='Downloading metadata XML', query={
'appCode': app_code,
'idMedia': video_id,
})
def get_meta(name):
el = find_xpath_attr(metadata, './/Meta', 'name', name)
return el.text if el is not None else None
return {
'id': video_id,
'title': get_meta('Title'),

View File

@@ -5,7 +5,7 @@ from .internetvideoarchive import InternetVideoArchiveIE
class RottenTomatoesIE(InfoExtractor):
_VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_TEST = {
'url': 'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',

View File

@@ -7,7 +7,7 @@ from ..utils import unified_strdate, determine_ext
class RoxwelIE(InfoExtractor):
_VALID_URL = r'https?://www\.roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
_VALID_URL = r'https?://(?:www\.)?roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
_TEST = {
'url': 'http://www.roxwel.com/player/passionpittakeawalklive.html',

View File

@@ -64,7 +64,7 @@ def _decrypt_url(png):
class RTVEALaCartaIE(InfoExtractor):
IE_NAME = 'rtve.es:alacarta'
IE_DESC = 'RTVE a la carta'
_VALID_URL = r'https?://www\.rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
@@ -184,7 +184,7 @@ class RTVEInfantilIE(InfoExtractor):
class RTVELiveIE(InfoExtractor):
IE_NAME = 'rtve.es:live'
IE_DESC = 'RTVE.es live streams'
_VALID_URL = r'https?://www\.rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
_TESTS = [{
'url': 'http://www.rtve.es/directo/la-1/',
@@ -226,7 +226,7 @@ class RTVELiveIE(InfoExtractor):
class RTVETelevisionIE(InfoExtractor):
IE_NAME = 'rtve.es:television'
_VALID_URL = r'https?://www\.rtve\.es/television/[^/]+/[^/]+/(?P<id>\d+).shtml'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/television/[^/]+/[^/]+/(?P<id>\d+)\.shtml'
_TEST = {
'url': 'http://www.rtve.es/television/20160628/revolucion-del-movil/1364141.shtml',

Some files were not shown because too many files have changed in this diff