Mirror of https://github.com/ytdl-org/youtube-dl, synced 2025-09-29 21:08:37 +09:00.

Compare commits: 73 commits, 2016.07.03 ... 2016.07.09
| Author | SHA1 | Date |
|---|---|---|
| | ce43100a01 | |
| | 8cc9b4016d | |
| | 31eeab9f41 | |
| | 9558dcec9c | |
| | 6e6b70d65f | |
| | d417fd88d0 | |
| | 9e4f5dc1e9 | |
| | 1251565ee0 | |
| | 1f7258a367 | |
| | 0af985069b | |
| | 0de168f7ed | |
| | 95b31e266b | |
| | 6b3a3098b5 | |
| | 2de624fdd5 | |
| | 3fee7f636c | |
| | 89e2fff2b7 | |
| | cedc70b292 | |
| | 07d7689f2e | |
| | ae8cb5328d | |
| | 2e32ac0b9a | |
| | 672f01c370 | |
| | e2d616dd30 | |
| | 0ab7f4fe2b | |
| | 29c4a07776 | |
| | 826e911e41 | |
| | 30d22dae8e | |
| | ec3518725b | |
| | 5f87d845eb | |
| | 571808a7aa | |
| | dfe5fa49ae | |
| | 01a0c511eb | |
| | b3d30315ce | |
| | 882af14d7d | |
| | 47335a0efa | |
| | 34bc2d9dfd | |
| | 08c7af4afa | |
| | f7291a0b7c | |
| | c65aa4e9e1 | |
| | ad213a1d74 | |
| | 43f1e4e41e | |
| | 54b0e909d5 | |
| | f8752b86ac | |
| | 84c237fb8a | |
| | ab49d7a9fa | |
| | b4173f1551 | |
| | 2817b99cf2 | |
| | 001fffd004 | |
| | 0e94b4713d | |
| | a6d3b89feb | |
| | 6c26815d63 | |
| | 73c4ac2c95 | |
| | 84f214d840 | |
| | e3f88be7a9 | |
| | 31af3e35e0 | |
| | 94a5cff91d | |
| | 77082c7b9e | |
| | 252a1f75d2 | |
| | 5abf513cf8 | |
| | c6054e3201 | |
| | 4080530624 | |
| | c25f1a9b63 | |
| | dfaa86b75e | |
| | d9163ae3b6 | |
| | dafafe7cf1 | |
| | 81953d1ae5 | |
| | 3a212ed62e | |
| | 195f084542 | |
| | aa7a455b2e | |
| | 6a4e659c93 | |
| | 40f3666f6b | |
| | dd801bbe18 | |
| | 38cce791c7 | |
| | bf3ae6a543 | |
.github/ISSUE_TEMPLATE.md (vendored, 6 changed lines)
@@ -6,8 +6,8 @@

---

### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.03.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.03.1**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.09.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.09.1**

### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections

@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.07.03.1
[debug] youtube-dl version 2016.07.09.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
README.md

@@ -103,9 +103,9 @@ which means you can modify it, redistribute it or use it however you like.
(experimental)
-6, --force-ipv6 Make all connections via IPv6
(experimental)
--cn-verification-proxy URL Use this proxy to verify the IP address for
some Chinese sites. The default proxy
specified by --proxy (or none, if the
--geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default
proxy specified by --proxy (or none, if the
options is not present) is used for the
actual downloading. (experimental)
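The new `--geo-verification-proxy` option is also exposed programmatically as the `geo_verification_proxy` parameter of `YoutubeDL` (see the `YoutubeDL.py` and `test_http.py` hunks below). A minimal sketch, assuming a local proxy on port 3128 and a placeholder video URL:

```python
from youtube_dl import YoutubeDL

# The geo-verification proxy is only used for the IP-check request;
# the actual download still goes through 'proxy' (or no proxy at all).
ydl = YoutubeDL({
    'geo_verification_proxy': 'http://127.0.0.1:3128',  # assumed local proxy
})
ydl.download(['http://example.com/geo-restricted-video'])  # placeholder URL
```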
@@ -424,7 +424,7 @@ which means you can modify it, redistribute it or use it however you like.

# CONFIGURATION

You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.

For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
```
@@ -432,6 +432,7 @@ For example, with the following configuration file youtube-dl will always extrac
--no-mtime
--proxy 127.0.0.1:3128
-o ~/Movies/%(title)s.%(ext)s
# Lines starting with # are comments
```

Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
devscripts/show-downloads-statistics.py (new file, 41 lines)
@@ -0,0 +1,41 @@
#!/usr/bin/env python
from __future__ import unicode_literals

import json
import os
import re
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from youtube_dl.compat import (
    compat_print,
    compat_urllib_request,
)
from youtube_dl.utils import format_bytes


def format_size(bytes):
    return '%s (%d bytes)' % (format_bytes(bytes), bytes)


total_bytes = 0

releases = json.loads(compat_urllib_request.urlopen(
    'https://api.github.com/repos/rg3/youtube-dl/releases').read().decode('utf-8'))

for release in releases:
    compat_print(release['name'])
    for asset in release['assets']:
        asset_name = asset['name']
        total_bytes += asset['download_count'] * asset['size']
        if all(not re.match(p, asset_name) for p in (
                r'^youtube-dl$',
                r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
                r'^youtube-dl\.exe$')):
            continue
        compat_print(
            ' %s size: %s downloads: %d'
            % (asset_name, format_size(asset['size']), asset['download_count']))

compat_print('total downloads traffic: %s' % format_size(total_bytes))
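The script simply walks the GitHub releases API and prints per-asset download counts plus the total traffic, using `format_bytes` from `youtube_dl.utils` for the human-readable sizes. A small sketch of that helper (the exact output string is an assumption):

```python
from youtube_dl.utils import format_bytes

# format_bytes renders a byte count with binary suffixes,
# e.g. roughly '1.00MiB' for 1048576 bytes.
print(format_bytes(1048576))
```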
docs/supportedsites.md

@@ -224,6 +224,7 @@
- **Firstpost**
- **FiveTV**
- **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
@@ -312,6 +313,7 @@
- **jpopsuki.tv**
- **JWPlatform**
- **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
@@ -476,6 +478,8 @@
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
- **onet.tv**
- **onet.tv:channel**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
@@ -857,6 +861,7 @@
- **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
test/test_compat.py

@@ -88,6 +88,7 @@ class TestCompat(unittest.TestCase):
    def test_compat_shlex_split(self):
        self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
        self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
        self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])

    def test_compat_etree_fromstring(self):
        xml = '''
test/test_http.py

@@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
        self.proxy_thread.daemon = True
        self.proxy_thread.start()

        self.cn_proxy = compat_http_server.HTTPServer(
            ('localhost', 0), _build_proxy_handler('cn'))
        self.cn_port = http_server_port(self.cn_proxy)
        self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
        self.cn_proxy_thread.daemon = True
        self.cn_proxy_thread.start()
        self.geo_proxy = compat_http_server.HTTPServer(
            ('localhost', 0), _build_proxy_handler('geo'))
        self.geo_port = http_server_port(self.geo_proxy)
        self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
        self.geo_proxy_thread.daemon = True
        self.geo_proxy_thread.start()

    def test_proxy(self):
        cn_proxy = 'localhost:{0}'.format(self.cn_port)
        geo_proxy = 'localhost:{0}'.format(self.geo_port)
        ydl = YoutubeDL({
            'proxy': 'localhost:{0}'.format(self.port),
            'cn_verification_proxy': cn_proxy,
            'geo_verification_proxy': geo_proxy,
        })
        url = 'http://foo.com/bar'
        response = ydl.urlopen(url).read().decode('utf-8')
        self.assertEqual(response, 'normal: {0}'.format(url))

        req = compat_urllib_request.Request(url)
        req.add_header('Ytdl-request-proxy', cn_proxy)
        req.add_header('Ytdl-request-proxy', geo_proxy)
        response = ydl.urlopen(req).read().decode('utf-8')
        self.assertEqual(response, 'cn: {0}'.format(url))
        self.assertEqual(response, 'geo: {0}'.format(url))

    def test_proxy_with_idn(self):
        ydl = YoutubeDL({
test/test_utils.py

@@ -33,6 +33,7 @@ from youtube_dl.utils import (
    ExtractorError,
    find_xpath_attr,
    fix_xml_ampersands,
    get_element_by_class,
    InAdvancePagedList,
    intlist_to_bytes,
    is_html,
@@ -405,6 +406,12 @@ class TestUtil(unittest.TestCase):
        self.assertEqual(res_url, url)
        self.assertEqual(res_data, None)

        smug_url = smuggle_url(url, {'a': 'b'})
        smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
        res_url, res_data = unsmuggle_url(smug_smug_url)
        self.assertEqual(res_url, url)
        self.assertEqual(res_data, {'a': 'b', 'c': 'd'})

    def test_shell_quote(self):
        args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
        self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@@ -985,5 +992,13 @@ The first line
        self.assertEqual(urshift(3, 1), 1)
        self.assertEqual(urshift(-3, 1), 2147483646)

    def test_get_element_by_class(self):
        html = '''
            <span class="foo bar">nice</span>
        '''

        self.assertEqual(get_element_by_class('foo', html), 'nice')
        self.assertEqual(get_element_by_class('no-such-class', html), None)

if __name__ == '__main__':
    unittest.main()
youtube_dl/YoutubeDL.py

@@ -196,8 +196,8 @@ class YoutubeDL(object):
    prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
                     At the moment, this is only supported by YouTube.
    proxy: URL of the proxy server to use
    cn_verification_proxy: URL of the proxy to use for IP address verification
                     on Chinese sites. (Experimental)
    geo_verification_proxy: URL of the proxy to use for IP address verification
                     on geo-restricted sites. (Experimental)
    socket_timeout: Time to wait for unresponsive hosts, in seconds
    bidi_workaround: Work around buggy terminals without bidirectional text
                     support, using fridibi
@@ -304,6 +304,11 @@ class YoutubeDL(object):
        self.params.update(params)
        self.cache = Cache(self)

        if self.params.get('cn_verification_proxy') is not None:
            self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
            if self.params.get('geo_verification_proxy') is None:
                self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']

        if params.get('bidi_workaround', False):
            try:
                import pty
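For downstream users of the Python API, the deprecation shim above means the old option name keeps working; a minimal sketch (the proxy URL is a placeholder):

```python
from youtube_dl import YoutubeDL

# Passing the deprecated option triggers a warning and its value is copied
# into geo_verification_proxy when the new option is not set explicitly.
ydl = YoutubeDL({'cn_verification_proxy': 'http://127.0.0.1:3128'})
assert ydl.params.get('geo_verification_proxy') == 'http://127.0.0.1:3128'
```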
youtube_dl/__init__.py

@@ -382,6 +382,8 @@ def _real_main(argv=None):
        'external_downloader_args': external_downloader_args,
        'postprocessor_args': postprocessor_args,
        'cn_verification_proxy': opts.cn_verification_proxy,
        'geo_verification_proxy': opts.geo_verification_proxy,
    }

    with YoutubeDL(ydl_opts) as ydl:
youtube_dl/compat.py

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals

import binascii
@@ -2594,15 +2595,19 @@ except ImportError: # Python < 3.3
        return "'" + s.replace("'", "'\"'\"'") + "'"


if sys.version_info >= (2, 7, 3):
try:
    args = shlex.split('中文')
    assert (isinstance(args, list) and
            isinstance(args[0], compat_str) and
            args[0] == '中文')
    compat_shlex_split = shlex.split
else:
except (AssertionError, UnicodeEncodeError):
    # Working around shlex issue with unicode strings on some python 2
    # versions (see http://bugs.python.org/issue1548891)
    def compat_shlex_split(s, comments=False, posix=True):
        if isinstance(s, compat_str):
            s = s.encode('utf-8')
        return shlex.split(s, comments, posix)
        return list(map(lambda s: s.decode('utf-8'), shlex.split(s, comments, posix)))


def compat_ord(c):
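The effect of the reworked `compat_shlex_split` is exactly what the new `test_compat` assertion earlier in this diff checks: non-ASCII arguments survive splitting on both Python 2 and 3. A minimal sketch:

```python
from youtube_dl.compat import compat_shlex_split

# On some Python 2 versions plain shlex.split() mishandles unicode input
# (http://bugs.python.org/issue1548891); the compat wrapper splits bytes
# and decodes the tokens back to unicode.
assert compat_shlex_split('-val 中文') == ['-val', '中文']
```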
@@ -5,6 +5,8 @@ from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
mimetype2ext,
|
||||
determine_ext,
|
||||
)
|
||||
|
||||
|
||||
@@ -50,21 +52,25 @@ class AMPIE(InfoExtractor):
|
||||
if isinstance(media_content, dict):
|
||||
media_content = [media_content]
|
||||
for media_data in media_content:
|
||||
media = media_data['@attributes']
|
||||
media_type = media['type']
|
||||
if media_type in ('video/f4m', 'application/f4m+xml'):
|
||||
media = media_data.get('@attributes', {})
|
||||
media_url = media.get('url')
|
||||
if not media_url:
|
||||
continue
|
||||
ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
|
||||
if ext == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
|
||||
media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
|
||||
video_id, f4m_id='hds', fatal=False))
|
||||
elif media_type == 'application/x-mpegURL':
|
||||
elif ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
|
||||
media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
formats.append({
|
||||
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
|
||||
'url': media['url'],
|
||||
'tbr': int_or_none(media.get('bitrate')),
|
||||
'filesize': int_or_none(media.get('fileSize')),
|
||||
'ext': ext,
|
||||
})
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
@@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
|
||||
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
|
||||
_NETRC_MACHINE = 'animeondemand'
|
||||
_TESTS = [{
|
||||
# jap, OmU
|
||||
'url': 'https://www.anime-on-demand.de/anime/161',
|
||||
'info_dict': {
|
||||
'id': '161',
|
||||
@@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
|
||||
},
|
||||
'playlist_mincount': 4,
|
||||
}, {
|
||||
# Film wording is used instead of Episode
|
||||
# Film wording is used instead of Episode, ger/jap, Dub/OmU
|
||||
'url': 'https://www.anime-on-demand.de/anime/39',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# Episodes without titles
|
||||
# Episodes without titles, jap, OmU
|
||||
'url': 'https://www.anime-on-demand.de/anime/162',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# ger/jap, Dub/OmU, account required
|
||||
'url': 'https://www.anime-on-demand.de/anime/169',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# Full length film, non-series, ger/jap, Dub/OmU, account required
|
||||
'url': 'https://www.anime-on-demand.de/anime/185',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _login(self):
|
||||
@@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
|
||||
|
||||
entries = []
|
||||
|
||||
for num, episode_html in enumerate(re.findall(
|
||||
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
|
||||
episodebox_title = self._search_regex(
|
||||
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
|
||||
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
|
||||
episode_html, 'episodebox title', default=None, group='title')
|
||||
if not episodebox_title:
|
||||
continue
|
||||
|
||||
episode_number = int(self._search_regex(
|
||||
r'(?:Episode|Film)\s*(\d+)',
|
||||
episodebox_title, 'episode number', default=num))
|
||||
episode_title = self._search_regex(
|
||||
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
|
||||
episodebox_title, 'episode title', default=None)
|
||||
|
||||
video_id = 'episode-%d' % episode_number
|
||||
|
||||
common_info = {
|
||||
'id': video_id,
|
||||
'series': anime_title,
|
||||
'episode': episode_title,
|
||||
'episode_number': episode_number,
|
||||
}
|
||||
|
||||
def extract_info(html, video_id, num=None):
|
||||
title, description = [None] * 2
|
||||
formats = []
|
||||
|
||||
for input_ in re.findall(
|
||||
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
|
||||
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
|
||||
attributes = extract_attributes(input_)
|
||||
playlist_urls = []
|
||||
for playlist_key in ('data-playlist', 'data-otherplaylist'):
|
||||
@@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
|
||||
format_id_list.append(lang)
|
||||
if kind:
|
||||
format_id_list.append(kind)
|
||||
if not format_id_list:
|
||||
if not format_id_list and num is not None:
|
||||
format_id_list.append(compat_str(num))
|
||||
format_id = '-'.join(format_id_list)
|
||||
format_note = ', '.join(filter(None, (kind, lang_note)))
|
||||
@@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
|
||||
})
|
||||
formats.extend(file_formats)
|
||||
|
||||
if formats:
|
||||
self._sort_formats(formats)
|
||||
return {
|
||||
'title': title,
|
||||
'description': description,
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
def extract_entries(html, video_id, common_info, num):
|
||||
info = extract_info(html, video_id, num)
|
||||
|
||||
if info['formats']:
|
||||
self._sort_formats(info['formats'])
|
||||
f = common_info.copy()
|
||||
f.update({
|
||||
'title': title,
|
||||
'description': description,
|
||||
'formats': formats,
|
||||
})
|
||||
f.update(info)
|
||||
entries.append(f)
|
||||
|
||||
# Extract teaser only when full episode is not available
|
||||
if not formats:
|
||||
# Extract teaser/trailer only when full episode is not available
|
||||
if not info['formats']:
|
||||
m = re.search(
|
||||
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
|
||||
episode_html)
|
||||
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
|
||||
html)
|
||||
if m:
|
||||
f = common_info.copy()
|
||||
f.update({
|
||||
'id': '%s-teaser' % f['id'],
|
||||
'id': '%s-%s' % (f['id'], m.group('kind').lower()),
|
||||
'title': m.group('title'),
|
||||
'url': compat_urlparse.urljoin(url, m.group('href')),
|
||||
})
|
||||
entries.append(f)
|
||||
|
||||
def extract_episodes(html):
|
||||
for num, episode_html in enumerate(re.findall(
|
||||
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
|
||||
episodebox_title = self._search_regex(
|
||||
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
|
||||
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
|
||||
episode_html, 'episodebox title', default=None, group='title')
|
||||
if not episodebox_title:
|
||||
continue
|
||||
|
||||
episode_number = int(self._search_regex(
|
||||
r'(?:Episode|Film)\s*(\d+)',
|
||||
episodebox_title, 'episode number', default=num))
|
||||
episode_title = self._search_regex(
|
||||
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
|
||||
episodebox_title, 'episode title', default=None)
|
||||
|
||||
video_id = 'episode-%d' % episode_number
|
||||
|
||||
common_info = {
|
||||
'id': video_id,
|
||||
'series': anime_title,
|
||||
'episode': episode_title,
|
||||
'episode_number': episode_number,
|
||||
}
|
||||
|
||||
extract_entries(episode_html, video_id, common_info)
|
||||
|
||||
def extract_film(html, video_id):
|
||||
common_info = {
|
||||
'id': anime_id,
|
||||
'title': anime_title,
|
||||
'description': anime_description,
|
||||
}
|
||||
extract_entries(html, video_id, common_info)
|
||||
|
||||
extract_episodes(webpage)
|
||||
|
||||
if not entries:
|
||||
extract_film(webpage, anime_id)
|
||||
|
||||
return self.playlist_result(entries, anime_id, anime_title, anime_description)
|
||||
|
@@ -13,6 +13,7 @@ from ..utils import (
|
||||
parse_duration,
|
||||
unified_strdate,
|
||||
xpath_text,
|
||||
update_url_query,
|
||||
)
|
||||
from ..compat import compat_etree_fromstring
|
||||
|
||||
@@ -34,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'HTTP Error 404: Not Found',
|
||||
}, {
|
||||
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
|
||||
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
|
||||
@@ -44,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
|
||||
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
|
||||
'duration': 5252,
|
||||
},
|
||||
'skip': 'HTTP Error 404: Not Found',
|
||||
}, {
|
||||
# audio
|
||||
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
|
||||
@@ -55,6 +58,7 @@ class ARDMediathekIE(InfoExtractor):
|
||||
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
|
||||
'duration': 3240,
|
||||
},
|
||||
'skip': 'HTTP Error 404: Not Found',
|
||||
}, {
|
||||
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
|
||||
'only_matching': True,
|
||||
@@ -113,11 +117,14 @@ class ARDMediathekIE(InfoExtractor):
|
||||
continue
|
||||
if ext == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
|
||||
video_id, preference=-1, f4m_id='hds', fatal=False))
|
||||
update_url_query(stream_url, {
|
||||
'hdcore': '3.1.1',
|
||||
'plugin': 'aasp-3.1.1.69.124'
|
||||
}),
|
||||
video_id, f4m_id='hds', fatal=False))
|
||||
elif ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
|
||||
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
if server and server.startswith('rtmp'):
|
||||
f = {
|
||||
@@ -231,7 +238,8 @@ class ARDIE(InfoExtractor):
|
||||
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
|
||||
'upload_date': '20140804',
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
}
|
||||
},
|
||||
'skip': 'HTTP Error 404: Not Found',
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
|
@@ -90,6 +90,7 @@ class BrightcoveLegacyIE(InfoExtractor):
|
||||
'description': 'md5:363109c02998fee92ec02211bd8000df',
|
||||
'uploader': 'National Ballet of Canada',
|
||||
},
|
||||
'skip': 'Video gone',
|
||||
},
|
||||
{
|
||||
# test flv videos served by akamaihd.net
|
||||
@@ -108,7 +109,7 @@ class BrightcoveLegacyIE(InfoExtractor):
|
||||
},
|
||||
},
|
||||
{
|
||||
# playlist test
|
||||
# playlist with 'videoList'
|
||||
# from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
|
||||
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
|
||||
'info_dict': {
|
||||
@@ -117,6 +118,15 @@ class BrightcoveLegacyIE(InfoExtractor):
|
||||
},
|
||||
'playlist_mincount': 7,
|
||||
},
|
||||
{
|
||||
# playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
|
||||
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
|
||||
'info_dict': {
|
||||
'id': '1522758701001',
|
||||
'title': 'Lesson 08',
|
||||
},
|
||||
'playlist_mincount': 10,
|
||||
},
|
||||
]
|
||||
FLV_VCODECS = {
|
||||
1: 'SORENSON',
|
||||
@@ -298,13 +308,19 @@ class BrightcoveLegacyIE(InfoExtractor):
|
||||
info_url, player_key, 'Downloading playlist information')
|
||||
|
||||
json_data = json.loads(playlist_info)
|
||||
if 'videoList' not in json_data:
|
||||
if 'videoList' in json_data:
|
||||
playlist_info = json_data['videoList']
|
||||
playlist_dto = playlist_info['mediaCollectionDTO']
|
||||
elif 'playlistTabs' in json_data:
|
||||
playlist_info = json_data['playlistTabs']
|
||||
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
|
||||
else:
|
||||
raise ExtractorError('Empty playlist')
|
||||
playlist_info = json_data['videoList']
|
||||
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
|
||||
|
||||
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
|
||||
|
||||
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
|
||||
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
|
||||
playlist_title=playlist_dto['displayName'])
|
||||
|
||||
def _extract_video_info(self, video_info):
|
||||
video_id = compat_str(video_info['id'])
|
||||
@@ -585,6 +601,13 @@ class BrightcoveNewIE(InfoExtractor):
|
||||
'format_id': build_format_id('rtmp'),
|
||||
})
|
||||
formats.append(f)
|
||||
|
||||
errors = json_data.get('errors')
|
||||
if not formats and errors:
|
||||
error = errors[0]
|
||||
raise ExtractorError(
|
||||
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
subtitles = {}
|
||||
|
@@ -1,16 +1,10 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
float_or_none,
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
)
|
||||
from .onet import OnetBaseIE
|
||||
|
||||
|
||||
class ClipRsIE(InfoExtractor):
|
||||
class ClipRsIE(OnetBaseIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
|
||||
_TEST = {
|
||||
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
|
||||
@@ -27,64 +21,13 @@ class ClipRsIE(InfoExtractor):
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
display_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
video_id = self._search_regex(
|
||||
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
|
||||
mvp_id = self._search_mvp_id(webpage)
|
||||
|
||||
response = self._download_json(
|
||||
'http://qi.ckm.onetapi.pl/', video_id,
|
||||
query={
|
||||
'body[id]': video_id,
|
||||
'body[jsonrpc]': '2.0',
|
||||
'body[method]': 'get_asset_detail',
|
||||
'body[params][ID_Publikacji]': video_id,
|
||||
'body[params][Service]': 'www.onet.pl',
|
||||
'content-type': 'application/jsonp',
|
||||
'x-onet-app': 'player.front.onetapi.pl',
|
||||
})
|
||||
info_dict = self._extract_from_id(mvp_id, webpage)
|
||||
info_dict['display_id'] = display_id
|
||||
|
||||
error = response.get('error')
|
||||
if error:
|
||||
raise ExtractorError(
|
||||
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
|
||||
|
||||
video = response['result'].get('0')
|
||||
|
||||
formats = []
|
||||
for _, formats_dict in video['formats'].items():
|
||||
if not isinstance(formats_dict, dict):
|
||||
continue
|
||||
for format_id, format_list in formats_dict.items():
|
||||
if not isinstance(format_list, list):
|
||||
continue
|
||||
for f in format_list:
|
||||
if not f.get('url'):
|
||||
continue
|
||||
formats.append({
|
||||
'url': f['url'],
|
||||
'format_id': format_id,
|
||||
'height': int_or_none(f.get('vertical_resolution')),
|
||||
'width': int_or_none(f.get('horizontal_resolution')),
|
||||
'abr': float_or_none(f.get('audio_bitrate')),
|
||||
'vbr': float_or_none(f.get('video_bitrate')),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
meta = video.get('meta', {})
|
||||
|
||||
title = self._og_search_title(webpage, default=None) or meta['title']
|
||||
description = self._og_search_description(webpage, default=None) or meta.get('description')
|
||||
duration = meta.get('length') or meta.get('lenght')
|
||||
timestamp = parse_iso8601(meta.get('addDate'), ' ')
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': duration,
|
||||
'timestamp': timestamp,
|
||||
'formats': formats,
|
||||
}
|
||||
return info_dict
|
||||
|
@@ -44,6 +44,7 @@ from ..utils import (
|
||||
sanitized_Request,
|
||||
unescapeHTML,
|
||||
unified_strdate,
|
||||
unified_timestamp,
|
||||
url_basename,
|
||||
xpath_element,
|
||||
xpath_text,
|
||||
@@ -161,6 +162,7 @@ class InfoExtractor(object):
|
||||
* "height" (optional, int)
|
||||
* "resolution" (optional, string "{width}x{height"},
|
||||
deprecated)
|
||||
* "filesize" (optional, int)
|
||||
thumbnail: Full URL to a video thumbnail image.
|
||||
description: Full video description.
|
||||
uploader: Full name of the video uploader.
|
||||
@@ -803,15 +805,17 @@ class InfoExtractor(object):
|
||||
return self._html_search_meta('twitter:player', html,
|
||||
'twitter card player')
|
||||
|
||||
def _search_json_ld(self, html, video_id, **kwargs):
|
||||
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
|
||||
json_ld = self._search_regex(
|
||||
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
|
||||
html, 'JSON-LD', group='json_ld', **kwargs)
|
||||
if not json_ld:
|
||||
return {}
|
||||
return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
|
||||
return self._json_ld(
|
||||
json_ld, video_id, fatal=kwargs.get('fatal', True),
|
||||
expected_type=expected_type)
|
||||
|
||||
def _json_ld(self, json_ld, video_id, fatal=True):
|
||||
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
|
||||
if isinstance(json_ld, compat_str):
|
||||
json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
|
||||
if not json_ld:
|
||||
@@ -819,6 +823,8 @@ class InfoExtractor(object):
|
||||
info = {}
|
||||
if json_ld.get('@context') == 'http://schema.org':
|
||||
item_type = json_ld.get('@type')
|
||||
if expected_type is not None and expected_type != item_type:
|
||||
return info
|
||||
if item_type == 'TVEpisode':
|
||||
info.update({
|
||||
'episode': unescapeHTML(json_ld.get('name')),
|
||||
@@ -837,6 +843,19 @@ class InfoExtractor(object):
|
||||
'title': unescapeHTML(json_ld.get('headline')),
|
||||
'description': unescapeHTML(json_ld.get('articleBody')),
|
||||
})
|
||||
elif item_type == 'VideoObject':
|
||||
info.update({
|
||||
'url': json_ld.get('contentUrl'),
|
||||
'title': unescapeHTML(json_ld.get('name')),
|
||||
'description': unescapeHTML(json_ld.get('description')),
|
||||
'thumbnail': json_ld.get('thumbnailUrl'),
|
||||
'duration': parse_duration(json_ld.get('duration')),
|
||||
'timestamp': unified_timestamp(json_ld.get('uploadDate')),
|
||||
'filesize': float_or_none(json_ld.get('contentSize')),
|
||||
'tbr': int_or_none(json_ld.get('bitrate')),
|
||||
'width': int_or_none(json_ld.get('width')),
|
||||
'height': int_or_none(json_ld.get('height')),
|
||||
})
|
||||
return dict((k, v) for k, v in info.items() if v is not None)
|
||||
|
||||
@staticmethod
|
||||
@@ -1729,6 +1748,13 @@ class InfoExtractor(object):
|
||||
def _mark_watched(self, *args, **kwargs):
|
||||
raise NotImplementedError('This method must be implemented by subclasses')
|
||||
|
||||
def geo_verification_headers(self):
|
||||
headers = {}
|
||||
geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
|
||||
if geo_verification_proxy:
|
||||
headers['Ytdl-request-proxy'] = geo_verification_proxy
|
||||
return headers
|
||||
|
||||
|
||||
class SearchInfoExtractor(InfoExtractor):
|
||||
"""
|
||||
|
@@ -16,6 +16,7 @@ from ..utils import (
|
||||
sanitized_Request,
|
||||
str_to_int,
|
||||
unescapeHTML,
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
|
||||
@@ -111,6 +112,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
|
||||
}
|
||||
]
|
||||
|
||||
@staticmethod
|
||||
def _extract_urls(webpage):
|
||||
# Look for embedded Dailymotion player
|
||||
matches = re.findall(
|
||||
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
|
||||
return list(map(lambda m: unescapeHTML(m[1]), matches))
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
@@ -153,18 +161,19 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
|
||||
type_ = media.get('type')
|
||||
if type_ == 'application/vnd.lumberjack.manifest':
|
||||
continue
|
||||
ext = determine_ext(media_url)
|
||||
if type_ == 'application/x-mpegURL' or ext == 'm3u8':
|
||||
ext = mimetype2ext(type_) or determine_ext(media_url)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
media_url, video_id, 'mp4', preference=-1,
|
||||
m3u8_id='hls', fatal=False))
|
||||
elif type_ == 'application/f4m' or ext == 'f4m':
|
||||
elif ext == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
|
||||
else:
|
||||
f = {
|
||||
'url': media_url,
|
||||
'format_id': 'http-%s' % quality,
|
||||
'ext': ext,
|
||||
}
|
||||
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
|
||||
if m:
|
||||
|
@@ -66,22 +66,32 @@ class DaumIE(InfoExtractor):
|
||||
'view_count': int,
|
||||
'comment_count': int,
|
||||
},
|
||||
}, {
|
||||
# Requires dte_type=WEB (#9972)
|
||||
'url': 'http://tvpot.daum.net/v/s3794Uf1NZeZ1qMpGpeqeRU',
|
||||
'md5': 'a8917742069a4dd442516b86e7d66529',
|
||||
'info_dict': {
|
||||
'id': 's3794Uf1NZeZ1qMpGpeqeRU',
|
||||
'ext': 'mp4',
|
||||
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
|
||||
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
|
||||
'upload_date': '20160611',
|
||||
},
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = compat_urllib_parse_unquote(self._match_id(url))
|
||||
query = compat_urllib_parse_urlencode({'vid': video_id})
|
||||
movie_data = self._download_json(
|
||||
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
|
||||
video_id, 'Downloading video formats info')
|
||||
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
|
||||
video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
|
||||
|
||||
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
|
||||
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
|
||||
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
|
||||
|
||||
info = self._download_xml(
|
||||
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
|
||||
'Downloading video info')
|
||||
'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
|
||||
'Downloading video info', query={'vid': video_id})
|
||||
|
||||
formats = []
|
||||
for format_el in movie_data['output_list']['output_list']:
|
||||
|
@@ -139,9 +139,9 @@ from .chirbit import (
|
||||
ChirbitProfileIE,
|
||||
)
|
||||
from .cinchcast import CinchcastIE
|
||||
from .cliprs import ClipRsIE
|
||||
from .clipfish import ClipfishIE
|
||||
from .cliphunter import CliphunterIE
|
||||
from .cliprs import ClipRsIE
|
||||
from .clipsyndicate import ClipsyndicateIE
|
||||
from .closertotruth import CloserToTruthIE
|
||||
from .cloudy import CloudyIE
|
||||
@@ -256,6 +256,7 @@ from .fivemin import FiveMinIE
|
||||
from .fivetv import FiveTVIE
|
||||
from .fktv import FKTVIE
|
||||
from .flickr import FlickrIE
|
||||
from .flipagram import FlipagramIE
|
||||
from .folketinget import FolketingetIE
|
||||
from .footyroom import FootyRoomIE
|
||||
from .formula1 import Formula1IE
|
||||
@@ -368,6 +369,7 @@ from .jove import JoveIE
|
||||
from .jwplatform import JWPlatformIE
|
||||
from .jpopsukitv import JpopsukiIE
|
||||
from .kaltura import KalturaIE
|
||||
from .kamcord import KamcordIE
|
||||
from .kanalplay import KanalPlayIE
|
||||
from .kankan import KankanIE
|
||||
from .karaoketv import KaraoketvIE
|
||||
@@ -581,6 +583,10 @@ from .nytimes import (
|
||||
from .nuvid import NuvidIE
|
||||
from .odnoklassniki import OdnoklassnikiIE
|
||||
from .oktoberfesttv import OktoberfestTVIE
|
||||
from .onet import (
|
||||
OnetIE,
|
||||
OnetChannelIE,
|
||||
)
|
||||
from .onionstudios import OnionStudiosIE
|
||||
from .ooyala import (
|
||||
OoyalaIE,
|
||||
@@ -1066,6 +1072,7 @@ from .youtube import (
|
||||
YoutubeSearchDateIE,
|
||||
YoutubeSearchIE,
|
||||
YoutubeSearchURLIE,
|
||||
YoutubeSharedVideoIE,
|
||||
YoutubeShowIE,
|
||||
YoutubeSubscriptionsIE,
|
||||
YoutubeTruncatedIDIE,
|
||||
|
@@ -219,12 +219,23 @@ class FacebookIE(InfoExtractor):
|
||||
|
||||
BEFORE = '{swf.addParam(param[0], param[1]);});'
|
||||
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
|
||||
m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
|
||||
if m:
|
||||
swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
|
||||
PATTERN = re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER)
|
||||
|
||||
for m in re.findall(PATTERN, webpage):
|
||||
swf_params = m.replace('\\\\', '\\').replace('\\"', '"')
|
||||
data = dict(json.loads(swf_params))
|
||||
params_raw = compat_urllib_parse_unquote(data['params'])
|
||||
video_data = json.loads(params_raw)['video_data']
|
||||
video_data_candidate = json.loads(params_raw)['video_data']
|
||||
for _, f in video_data_candidate.items():
|
||||
if not f:
|
||||
continue
|
||||
if isinstance(f, dict):
|
||||
f = [f]
|
||||
if isinstance(f, list):
|
||||
continue
|
||||
if f[0].get('video_id') == video_id:
|
||||
video_data = video_data_candidate
|
||||
break
|
||||
|
||||
def video_data_list2dict(video_data):
|
||||
ret = {}
|
||||
|
youtube_dl/extractor/flipagram.py (new file, 115 lines)
@@ -0,0 +1,115 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
float_or_none,
|
||||
try_get,
|
||||
unified_timestamp,
|
||||
)
|
||||
|
||||
|
||||
class FlipagramIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'https://flipagram.com/f/nyvTSJMKId',
|
||||
'md5': '888dcf08b7ea671381f00fab74692755',
|
||||
'info_dict': {
|
||||
'id': 'nyvTSJMKId',
|
||||
'ext': 'mp4',
|
||||
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
|
||||
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
|
||||
'duration': 35.571,
|
||||
'timestamp': 1461244995,
|
||||
'upload_date': '20160421',
|
||||
'uploader': 'kitty juria',
|
||||
'uploader_id': 'sjuria101',
|
||||
'creator': 'kitty juria',
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'repost_count': int,
|
||||
'comment_count': int,
|
||||
'comments': list,
|
||||
'formats': 'mincount:2',
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
video_data = self._parse_json(
|
||||
self._search_regex(
|
||||
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
|
||||
video_id)
|
||||
|
||||
flipagram = video_data['flipagram']
|
||||
video = flipagram['video']
|
||||
|
||||
json_ld = self._search_json_ld(webpage, video_id, default=False)
|
||||
title = json_ld.get('title') or flipagram['captionText']
|
||||
description = json_ld.get('description') or flipagram.get('captionText')
|
||||
|
||||
formats = [{
|
||||
'url': video['url'],
|
||||
'width': int_or_none(video.get('width')),
|
||||
'height': int_or_none(video.get('height')),
|
||||
'filesize': int_or_none(video_data.get('size')),
|
||||
}]
|
||||
|
||||
preview_url = try_get(
|
||||
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
|
||||
if preview_url:
|
||||
formats.append({
|
||||
'url': preview_url,
|
||||
'ext': 'm4a',
|
||||
'vcodec': 'none',
|
||||
})
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
counts = flipagram.get('counts', {})
|
||||
user = flipagram.get('user', {})
|
||||
video_data = flipagram.get('video', {})
|
||||
|
||||
thumbnails = [{
|
||||
'url': self._proto_relative_url(cover['url']),
|
||||
'width': int_or_none(cover.get('width')),
|
||||
'height': int_or_none(cover.get('height')),
|
||||
'filesize': int_or_none(cover.get('size')),
|
||||
} for cover in flipagram.get('covers', []) if cover.get('url')]
|
||||
|
||||
# Note that this only retrieves comments that are initally loaded.
|
||||
# For videos with large amounts of comments, most won't be retrieved.
|
||||
comments = []
|
||||
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
|
||||
text = comment.get('comment')
|
||||
if not text or not isinstance(text, list):
|
||||
continue
|
||||
comments.append({
|
||||
'author': comment.get('user', {}).get('name'),
|
||||
'author_id': comment.get('user', {}).get('username'),
|
||||
'id': comment.get('id'),
|
||||
'text': text[0],
|
||||
'timestamp': unified_timestamp(comment.get('created')),
|
||||
})
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': float_or_none(flipagram.get('duration'), 1000),
|
||||
'thumbnails': thumbnails,
|
||||
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
|
||||
'uploader': user.get('name'),
|
||||
'uploader_id': user.get('username'),
|
||||
'creator': user.get('name'),
|
||||
'view_count': int_or_none(counts.get('plays')),
|
||||
'like_count': int_or_none(counts.get('likes')),
|
||||
'repost_count': int_or_none(counts.get('reflips')),
|
||||
'comment_count': int_or_none(counts.get('comments')),
|
||||
'comments': comments,
|
||||
'formats': formats,
|
||||
}
|
@@ -14,7 +14,10 @@ from ..utils import (
|
||||
parse_duration,
|
||||
determine_ext,
|
||||
)
|
||||
from .dailymotion import DailymotionCloudIE
|
||||
from .dailymotion import (
|
||||
DailymotionIE,
|
||||
DailymotionCloudIE,
|
||||
)
|
||||
|
||||
|
||||
class FranceTVBaseInfoExtractor(InfoExtractor):
|
||||
@@ -188,6 +191,21 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
# Dailymotion embed
|
||||
'url': 'http://www.francetvinfo.fr/politique/notre-dame-des-landes/video-sur-france-inter-cecile-duflot-denonce-le-regard-meprisant-de-patrick-cohen_1520091.html',
|
||||
'md5': 'ee7f1828f25a648addc90cb2687b1f12',
|
||||
'info_dict': {
|
||||
'id': 'x4iiko0',
|
||||
'ext': 'mp4',
|
||||
'title': 'NDDL, référendum, Brexit : Cécile Duflot répond à Patrick Cohen',
|
||||
'description': 'Au lendemain de la victoire du "oui" au référendum sur l\'aéroport de Notre-Dame-des-Landes, l\'ancienne ministre écologiste est l\'invitée de Patrick Cohen. Plus d\'info : https://www.franceinter.fr/emissions/le-7-9/le-7-9-27-juin-2016',
|
||||
'timestamp': 1467011958,
|
||||
'upload_date': '20160627',
|
||||
'uploader': 'France Inter',
|
||||
'uploader_id': 'x2q2ez',
|
||||
},
|
||||
'add_ie': ['Dailymotion'],
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@@ -197,7 +215,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
|
||||
|
||||
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
|
||||
if dmcloud_url:
|
||||
return self.url_result(dmcloud_url, 'DailymotionCloud')
|
||||
return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
|
||||
|
||||
dailymotion_urls = DailymotionIE._extract_urls(webpage)
|
||||
if dailymotion_urls:
|
||||
return self.playlist_result([
|
||||
self.url_result(dailymotion_url, DailymotionIE.ie_key())
|
||||
for dailymotion_url in dailymotion_urls])
|
||||
|
||||
video_id, catalogue = self._search_regex(
|
||||
(r'id-video=([^@]+@[^"]+)',
|
||||
|
@@ -49,7 +49,10 @@ from .pornhub import PornHubIE
|
||||
from .xhamster import XHamsterEmbedIE
|
||||
from .tnaflix import TNAFlixNetworkEmbedIE
|
||||
from .vimeo import VimeoIE
|
||||
from .dailymotion import DailymotionCloudIE
|
||||
from .dailymotion import (
|
||||
DailymotionIE,
|
||||
DailymotionCloudIE,
|
||||
)
|
||||
from .onionstudios import OnionStudiosIE
|
||||
from .viewlift import ViewLiftEmbedIE
|
||||
from .screenwavemedia import ScreenwaveMediaIE
|
||||
@@ -1295,6 +1298,38 @@ class GenericIE(InfoExtractor):
|
||||
'uploader': 'cylus cyrus',
|
||||
},
|
||||
},
|
||||
{
|
||||
# video stored on custom kaltura server
|
||||
'url': 'http://www.expansion.com/multimedia/videos.html?media=EQcM30NHIPv',
|
||||
'md5': '537617d06e64dfed891fa1593c4b30cc',
|
||||
'info_dict': {
|
||||
'id': '0_1iotm5bh',
|
||||
'ext': 'mp4',
|
||||
'title': 'Elecciones británicas: 5 lecciones para Rajoy',
|
||||
'description': 'md5:435a89d68b9760b92ce67ed227055f16',
|
||||
'uploader_id': 'videos.expansion@el-mundo.net',
|
||||
'upload_date': '20150429',
|
||||
'timestamp': 1430303472,
|
||||
},
|
||||
'add_ie': ['Kaltura'],
|
||||
},
|
||||
# {
|
||||
# # TODO: find another test
|
||||
# # http://schema.org/VideoObject
|
||||
# 'url': 'https://flipagram.com/f/nyvTSJMKId',
|
||||
# 'md5': '888dcf08b7ea671381f00fab74692755',
|
||||
# 'info_dict': {
|
||||
# 'id': 'nyvTSJMKId',
|
||||
# 'ext': 'mp4',
|
||||
# 'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
|
||||
# 'description': '#love for cats.',
|
||||
# 'timestamp': 1461244995,
|
||||
# 'upload_date': '20160421',
|
||||
# },
|
||||
# 'params': {
|
||||
# 'force_generic_extractor': True,
|
||||
# },
|
||||
# }
|
||||
]
|
||||
|
||||
def report_following_redirect(self, new_url):
|
||||
@@ -1658,12 +1693,9 @@ class GenericIE(InfoExtractor):
|
||||
if matches:
|
||||
return _playlist_from_matches(matches, lambda m: m[-1])
|
||||
|
||||
# Look for embedded Dailymotion player
|
||||
matches = re.findall(
|
||||
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
|
||||
matches = DailymotionIE._extract_urls(webpage)
|
||||
if matches:
|
||||
return _playlist_from_matches(
|
||||
matches, lambda m: unescapeHTML(m[1]))
|
||||
return _playlist_from_matches(matches)
|
||||
|
||||
# Look for embedded Dailymotion playlist player (#3822)
|
||||
m = re.search(
|
||||
@@ -2142,6 +2174,19 @@ class GenericIE(InfoExtractor):
|
||||
if embed_url:
|
||||
return self.url_result(embed_url)
|
||||
|
||||
# Looking for http://schema.org/VideoObject
|
||||
json_ld = self._search_json_ld(
|
||||
webpage, video_id, default=None, expected_type='VideoObject')
|
||||
if json_ld and json_ld.get('url'):
|
||||
info_dict.update({
|
||||
'title': video_title or info_dict['title'],
|
||||
'description': video_description,
|
||||
'thumbnail': video_thumbnail,
|
||||
'age_limit': age_limit
|
||||
})
|
||||
info_dict.update(json_ld)
|
||||
return info_dict
|
||||
|
||||
def check_video(vurl):
|
||||
if YoutubeIE.suitable(vurl):
|
||||
return True
|
||||
|
youtube_dl/extractor/iqiyi.py

@@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
    _TESTS = [{
        'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
        'md5': '5b0591f55961117155430b5d544fdb01',
        # MD5 checksum differs on my machine and Travis CI
        'info_dict': {
            'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
            'ext': 'mp4',
@@ -293,14 +293,10 @@ class IqiyiIE(InfoExtractor):
            't': tm,
        }

        headers = {}
        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
        if cn_verification_proxy:
            headers['Ytdl-request-proxy'] = cn_verification_proxy
        return self._download_json(
            'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
            video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
            query=params, headers=headers)
            query=params, headers=self.geo_verification_headers())

    def _extract_playlist(self, webpage):
        PAGE_SIZE = 50
@@ -6,7 +6,6 @@ import base64
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_urllib_parse_urlencode,
|
||||
compat_urlparse,
|
||||
compat_parse_qs,
|
||||
)
|
||||
@@ -15,6 +14,7 @@ from ..utils import (
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
unsmuggle_url,
|
||||
smuggle_url,
|
||||
)
|
||||
|
||||
|
||||
@@ -34,7 +34,8 @@ class KalturaIE(InfoExtractor):
|
||||
)(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
|
||||
)
|
||||
'''
|
||||
_API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?'
|
||||
_SERVICE_URL = 'http://cdnapi.kaltura.com'
|
||||
_SERVICE_BASE = '/api_v3/index.php'
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'kaltura:269692:1_1jc2y3e4',
|
||||
@@ -88,18 +89,26 @@ class KalturaIE(InfoExtractor):
|
||||
(?P<q3>["\'])(?P<id>.+?)(?P=q3)
|
||||
''', webpage))
|
||||
if mobj:
|
||||
return 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict()
|
||||
embed_info = mobj.groupdict()
|
||||
url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
|
||||
escaped_pid = re.escape(embed_info['partner_id'])
|
||||
service_url = re.search(
|
||||
r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
|
||||
webpage)
|
||||
if service_url:
|
||||
url = smuggle_url(url, {'service_url': service_url.group(1)})
|
||||
return url
|
||||
|
||||
def _kaltura_api_call(self, video_id, actions, *args, **kwargs):
|
||||
def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
|
||||
params = actions[0]
|
||||
if len(actions) > 1:
|
||||
for i, a in enumerate(actions[1:], start=1):
|
||||
for k, v in a.items():
|
||||
params['%d:%s' % (i, k)] = v
|
||||
|
||||
query = compat_urllib_parse_urlencode(params)
|
||||
url = self._API_BASE + query
|
||||
data = self._download_json(url, video_id, *args, **kwargs)
|
||||
data = self._download_json(
|
||||
(service_url or self._SERVICE_URL) + self._SERVICE_BASE,
|
||||
video_id, query=params, *args, **kwargs)
|
||||
|
||||
status = data if len(actions) == 1 else data[0]
|
||||
if status.get('objectType') == 'KalturaAPIException':
|
||||
@@ -108,7 +117,7 @@ class KalturaIE(InfoExtractor):
|
||||
|
||||
return data
|
||||
|
||||
def _get_kaltura_signature(self, video_id, partner_id):
|
||||
def _get_kaltura_signature(self, video_id, partner_id, service_url=None):
|
||||
actions = [{
|
||||
'apiVersion': '3.1',
|
||||
'expiry': 86400,
|
||||
@@ -118,10 +127,10 @@ class KalturaIE(InfoExtractor):
|
||||
'widgetId': '_%s' % partner_id,
|
||||
}]
|
||||
return self._kaltura_api_call(
|
||||
video_id, actions, note='Downloading Kaltura signature')['ks']
|
||||
video_id, actions, service_url, note='Downloading Kaltura signature')['ks']
|
||||
|
||||
def _get_video_info(self, video_id, partner_id):
|
||||
signature = self._get_kaltura_signature(video_id, partner_id)
|
||||
def _get_video_info(self, video_id, partner_id, service_url=None):
|
||||
signature = self._get_kaltura_signature(video_id, partner_id, service_url)
|
||||
actions = [
|
||||
{
|
||||
'action': 'null',
|
||||
@@ -144,7 +153,7 @@ class KalturaIE(InfoExtractor):
|
||||
},
|
||||
]
|
||||
return self._kaltura_api_call(
|
||||
video_id, actions, note='Downloading video info JSON')
|
||||
video_id, actions, service_url, note='Downloading video info JSON')
|
||||
|
||||
def _real_extract(self, url):
|
||||
url, smuggled_data = unsmuggle_url(url, {})
|
||||
@@ -153,7 +162,7 @@ class KalturaIE(InfoExtractor):
|
||||
partner_id, entry_id = mobj.group('partner_id', 'id')
|
||||
ks = None
|
||||
if partner_id and entry_id:
|
||||
info, flavor_assets = self._get_video_info(entry_id, partner_id)
|
||||
info, flavor_assets = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
|
||||
else:
|
||||
path, query = mobj.group('path', 'query')
|
||||
if not path and not query:
|
||||
@@ -201,12 +210,17 @@ class KalturaIE(InfoExtractor):
|
||||
unsigned_url += '?referrer=%s' % referrer
|
||||
return unsigned_url
|
||||
|
||||
data_url = info['dataUrl']
|
||||
if '/flvclipper/' in data_url:
|
||||
data_url = re.sub(r'/flvclipper/.*', '/serveFlavor', data_url)
|
||||
|
||||
formats = []
|
||||
for f in flavor_assets:
|
||||
# Continue if asset is not ready
|
||||
if f['status'] != 2:
|
||||
continue
|
||||
video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id']))
|
||||
video_url = sign_url(
|
||||
'%s/flavorId/%s' % (data_url, f['id']))
|
||||
formats.append({
|
||||
'format_id': '%(fileExt)s-%(bitrate)s' % f,
|
||||
'ext': f.get('fileExt'),
|
||||
@@ -219,9 +233,12 @@ class KalturaIE(InfoExtractor):
|
||||
'width': int_or_none(f.get('width')),
|
||||
'url': video_url,
|
||||
})
|
||||
m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp'))
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
if '/playManifest/' in data_url:
|
||||
m3u8_url = sign_url(data_url.replace(
|
||||
'format/url', 'format/applehttp'))
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
m3u8_url, entry_id, 'mp4', 'm3u8_native',
|
||||
m3u8_id='hls', fatal=False))
|
||||
|
||||
self._check_formats(formats, entry_id)
|
||||
self._sort_formats(formats)
|
||||
|
youtube_dl/extractor/kamcord.py (new file, 71 lines)
@@ -0,0 +1,71 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
qualities,
|
||||
)
|
||||
|
||||
|
||||
class KamcordIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
|
||||
'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
|
||||
'info_dict': {
|
||||
'id': 'hNYRduDgWb4',
|
||||
'ext': 'mp4',
|
||||
'title': 'Drinking Madness',
|
||||
'uploader': 'jacksfilms',
|
||||
'uploader_id': '3044562',
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'comment_count': int,
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
video = self._parse_json(
|
||||
self._search_regex(
|
||||
r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
|
||||
webpage, 'video'),
|
||||
video_id)['video']
|
||||
|
||||
title = video['title']
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
|
||||
self._sort_formats(formats)
|
||||
|
||||
uploader = video.get('user', {}).get('username')
|
||||
uploader_id = video.get('user', {}).get('id')
|
||||
|
||||
view_count = int_or_none(video.get('viewCount'))
|
||||
like_count = int_or_none(video.get('heartCount'))
|
||||
comment_count = int_or_none(video.get('messageCount'))
|
||||
|
||||
preference_key = qualities(('small', 'medium', 'large'))
|
||||
|
||||
thumbnails = [{
|
||||
'url': thumbnail_url,
|
||||
'id': thumbnail_id,
|
||||
'preference': preference_key(thumbnail_id),
|
||||
} for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
|
||||
if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'uploader': uploader,
|
||||
'uploader_id': uploader_id,
|
||||
'view_count': view_count,
|
||||
'like_count': like_count,
|
||||
'comment_count': comment_count,
|
||||
'thumbnails': thumbnails,
|
||||
'formats': formats,
|
||||
}
|
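Note: the preference_key above relies on the qualities helper from youtube_dl.utils, which ranks a fixed sequence of quality names by position; roughly (a sketch, not the exact implementation):

    def qualities(quality_ids):
        def q(qid):
            try:
                return quality_ids.index(qid)  # higher index == preferred
            except ValueError:
                return -1
        return q

    preference_key = qualities(('small', 'medium', 'large'))
    preference_key('large')    # 2
    preference_key('unknown')  # -1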
@@ -26,11 +26,6 @@ class KuwoBaseIE(InfoExtractor):
    def _get_formats(self, song_id, tolerate_ip_deny=False):
        formats = []
        for file_format in self._FORMATS:
            headers = {}
            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
            if cn_verification_proxy:
                headers['Ytdl-request-proxy'] = cn_verification_proxy

            query = {
                'format': file_format['ext'],
                'br': file_format.get('br', ''),
@@ -42,7 +37,7 @@ class KuwoBaseIE(InfoExtractor):
            song_url = self._download_webpage(
                'http://antiserver.kuwo.cn/anti.s',
                song_id, note='Download %s url info' % file_format['format'],
                query=query, headers=headers,
                query=query, headers=self.geo_verification_headers(),
            )

            if song_url == 'IPDeny' and not tolerate_ip_deny:
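Note: Kuwo is one of several extractors in this changeset (Le, MGTV, Rai, Sohu) that drop their hand-rolled cn_verification_proxy header handling in favour of self.geo_verification_headers() from the common InfoExtractor. Conceptually the helper centralises the same header logic; a rough sketch, assuming the base class reads a geo_verification_proxy option (the exact option name is an assumption here):

    def geo_verification_headers(self):
        headers = {}
        # proxy used only for the geo-restriction check requests
        geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
        if geo_verification_proxy:
            headers['Ytdl-request-proxy'] = geo_verification_proxy
        return headers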
@@ -3,8 +3,8 @@ from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    determine_ext,
    js_to_json,
    smuggle_url,
)


@@ -18,13 +18,16 @@ class LA7IE(InfoExtractor):
    _TESTS = [{
        # 'src' is a plain URL
        'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
        'md5': '6054674766e7988d3e02f2148ff92180',
        'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
        'info_dict': {
            'id': 'inccool8-02-10-2015-163722',
            'ext': 'mp4',
            'title': 'Inc.Cool8',
            'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico',
            'thumbnail': 're:^https?://.*',
            'uploader_id': 'kdla7pillole@iltrovatore.it',
            'timestamp': 1443814869,
            'upload_date': '20151002',
        },
    }, {
        # 'src' is a dictionary
@@ -49,26 +52,14 @@ class LA7IE(InfoExtractor):
            self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
            video_id, transform_source=js_to_json)

        source = player_data['src']
        source_urls = source.values() if isinstance(source, dict) else [source]

        formats = []
        for source_url in source_urls:
            ext = determine_ext(source_url)
            if ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(
                    source_url, video_id, ext='mp4',
                    entry_protocol='m3u8_native', m3u8_id='hls'))
            else:
                formats.append({
                    'url': source_url,
                })
        self._sort_formats(formats)

        return {
            '_type': 'url_transparent',
            'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
                'service_url': 'http://kdam.iltrovatore.it',
            }),
            'id': video_id,
            'title': player_data['title'],
            'description': self._og_search_description(webpage, default=None),
            'thumbnail': player_data.get('poster'),
            'formats': formats,
            'ie_key': 'Kaltura',
        }
@@ -20,10 +20,10 @@ from ..utils import (
|
||||
int_or_none,
|
||||
orderedSet,
|
||||
parse_iso8601,
|
||||
sanitized_Request,
|
||||
str_or_none,
|
||||
url_basename,
|
||||
urshift,
|
||||
update_url_query,
|
||||
)
|
||||
|
||||
|
||||
@@ -90,6 +90,10 @@ class LeIE(InfoExtractor):
|
||||
_loc3_ = self.ror(_loc3_, _loc2_ % 17)
|
||||
return _loc3_
|
||||
|
||||
# reversed from http://jstatic.letvcdn.com/sdk/player.js
|
||||
def get_mms_key(self, time):
|
||||
return self.ror(time, 8) ^ 185025305
|
||||
|
||||
# see M3U8Encryption class in KLetvPlayer.swf
|
||||
@staticmethod
|
||||
def decrypt_m3u8(encrypted_data):
|
||||
@@ -110,28 +114,7 @@ class LeIE(InfoExtractor):
|
||||
|
||||
return bytes(_loc7_)
|
||||
|
||||
def _real_extract(self, url):
|
||||
media_id = self._match_id(url)
|
||||
page = self._download_webpage(url, media_id)
|
||||
params = {
|
||||
'id': media_id,
|
||||
'platid': 1,
|
||||
'splatid': 101,
|
||||
'format': 1,
|
||||
'tkey': self.calc_time_key(int(time.time())),
|
||||
'domain': 'www.le.com'
|
||||
}
|
||||
play_json_req = sanitized_Request(
|
||||
'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
|
||||
)
|
||||
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
|
||||
if cn_verification_proxy:
|
||||
play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
|
||||
|
||||
play_json = self._download_json(
|
||||
play_json_req,
|
||||
media_id, 'Downloading playJson data')
|
||||
|
||||
def _check_errors(self, play_json):
|
||||
# Check for errors
|
||||
playstatus = play_json['playstatus']
|
||||
if playstatus['status'] == 0:
|
||||
@@ -142,43 +125,99 @@ class LeIE(InfoExtractor):
|
||||
msg = 'Generic error. flag = %d' % flag
|
||||
raise ExtractorError(msg, expected=True)
|
||||
|
||||
playurl = play_json['playurl']
|
||||
def _real_extract(self, url):
|
||||
media_id = self._match_id(url)
|
||||
page = self._download_webpage(url, media_id)
|
||||
|
||||
formats = ['350', '1000', '1300', '720p', '1080p']
|
||||
dispatch = playurl['dispatch']
|
||||
play_json_h5 = self._download_json(
|
||||
'http://api.le.com/mms/out/video/playJsonH5',
|
||||
media_id, 'Downloading html5 playJson data', query={
|
||||
'id': media_id,
|
||||
'platid': 3,
|
||||
'splatid': 304,
|
||||
'format': 1,
|
||||
'tkey': self.get_mms_key(int(time.time())),
|
||||
'domain': 'www.le.com',
|
||||
'tss': 'no',
|
||||
},
|
||||
headers=self.geo_verification_headers())
|
||||
self._check_errors(play_json_h5)
|
||||
|
||||
urls = []
|
||||
for format_id in formats:
|
||||
if format_id in dispatch:
|
||||
media_url = playurl['domain'][0] + dispatch[format_id][0]
|
||||
media_url += '&' + compat_urllib_parse_urlencode({
|
||||
'm3v': 1,
|
||||
play_json_flash = self._download_json(
|
||||
'http://api.le.com/mms/out/video/playJson',
|
||||
media_id, 'Downloading flash playJson data', query={
|
||||
'id': media_id,
|
||||
'platid': 1,
|
||||
'splatid': 101,
|
||||
'format': 1,
|
||||
'tkey': self.calc_time_key(int(time.time())),
|
||||
'domain': 'www.le.com',
|
||||
},
|
||||
headers=self.geo_verification_headers())
|
||||
self._check_errors(play_json_flash)
|
||||
|
||||
def get_h5_urls(media_url, format_id):
|
||||
location = self._download_json(
|
||||
media_url, media_id,
|
||||
'Download JSON metadata for format %s' % format_id, query={
|
||||
'format': 1,
|
||||
'expect': 3,
|
||||
'rateid': format_id,
|
||||
})
|
||||
'tss': 'no',
|
||||
})['location']
|
||||
|
||||
nodes_data = self._download_json(
|
||||
media_url, media_id,
|
||||
'Download JSON metadata for format %s' % format_id)
|
||||
return {
|
||||
'http': update_url_query(location, {'tss': 'no'}),
|
||||
'hls': update_url_query(location, {'tss': 'ios'}),
|
||||
}
|
||||
|
||||
req = self._request_webpage(
|
||||
nodes_data['nodelist'][0]['location'], media_id,
|
||||
note='Downloading m3u8 information for format %s' % format_id)
|
||||
def get_flash_urls(media_url, format_id):
|
||||
media_url += '&' + compat_urllib_parse_urlencode({
|
||||
'm3v': 1,
|
||||
'format': 1,
|
||||
'expect': 3,
|
||||
'rateid': format_id,
|
||||
})
|
||||
|
||||
m3u8_data = self.decrypt_m3u8(req.read())
|
||||
nodes_data = self._download_json(
|
||||
media_url, media_id,
|
||||
'Download JSON metadata for format %s' % format_id)
|
||||
|
||||
url_info_dict = {
|
||||
'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
|
||||
'ext': determine_ext(dispatch[format_id][1]),
|
||||
'format_id': format_id,
|
||||
'protocol': 'm3u8',
|
||||
}
|
||||
req = self._request_webpage(
|
||||
nodes_data['nodelist'][0]['location'], media_id,
|
||||
note='Downloading m3u8 information for format %s' % format_id)
|
||||
|
||||
if format_id[-1:] == 'p':
|
||||
url_info_dict['height'] = int_or_none(format_id[:-1])
|
||||
m3u8_data = self.decrypt_m3u8(req.read())
|
||||
|
||||
urls.append(url_info_dict)
|
||||
return {
|
||||
'hls': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
|
||||
}
|
||||
|
||||
extracted_formats = []
|
||||
formats = []
|
||||
for play_json, get_urls in ((play_json_h5, get_h5_urls), (play_json_flash, get_flash_urls)):
|
||||
playurl = play_json['playurl']
|
||||
play_domain = playurl['domain'][0]
|
||||
|
||||
for format_id, format_data in playurl.get('dispatch', []).items():
|
||||
if format_id in extracted_formats:
|
||||
continue
|
||||
extracted_formats.append(format_id)
|
||||
|
||||
media_url = play_domain + format_data[0]
|
||||
for protocol, format_url in get_urls(media_url, format_id).items():
|
||||
f = {
|
||||
'url': format_url,
|
||||
'ext': determine_ext(format_data[1]),
|
||||
'format_id': '%s-%s' % (protocol, format_id),
|
||||
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
|
||||
'quality': int_or_none(format_id),
|
||||
}
|
||||
|
||||
if format_id[-1:] == 'p':
|
||||
f['height'] = int_or_none(format_id[:-1])
|
||||
|
||||
formats.append(f)
|
||||
self._sort_formats(formats, ('height', 'quality', 'format_id'))
|
||||
|
||||
publish_time = parse_iso8601(self._html_search_regex(
|
||||
r'发布时间 ([^<>]+) ', page, 'publish time', default=None),
|
||||
@@ -187,7 +226,7 @@ class LeIE(InfoExtractor):
|
||||
|
||||
return {
|
||||
'id': media_id,
|
||||
'formats': urls,
|
||||
'formats': formats,
|
||||
'title': playurl['title'],
|
||||
'thumbnail': playurl['pic'],
|
||||
'description': description,
|
||||
|
@@ -100,7 +100,7 @@ class LyndaIE(LyndaBaseIE):

    _TESTS = [{
        'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
        'md5': 'ecfc6862da89489161fb9cd5f5a6fac1',
        'md5': '679734f6786145da3546585de9a356be',
        'info_dict': {
            'id': '114408',
            'ext': 'mp4',
|
@@ -11,13 +11,14 @@ from ..utils import (
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
sanitized_Request,
|
||||
urlencode_postdata,
|
||||
get_element_by_attribute,
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
|
||||
class MetacafeIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
|
||||
_VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/(?P<video_id>[^/]+)/(?P<display_id>[^/?#]+)'
|
||||
_DISCLAIMER = 'http://www.metacafe.com/family_filter/'
|
||||
_FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
|
||||
IE_NAME = 'metacafe'
|
||||
@@ -47,6 +48,7 @@ class MetacafeIE(InfoExtractor):
|
||||
'uploader': 'ign',
|
||||
'description': 'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
|
||||
},
|
||||
'skip': 'Page is temporarily unavailable.',
|
||||
},
|
||||
# AnyClip video
|
||||
{
|
||||
@@ -55,8 +57,8 @@ class MetacafeIE(InfoExtractor):
|
||||
'id': 'an-dVVXnuY7Jh77J',
|
||||
'ext': 'mp4',
|
||||
'title': 'The Andromeda Strain (1971): Stop the Bomb Part 3',
|
||||
'uploader': 'anyclip',
|
||||
'description': 'md5:38c711dd98f5bb87acf973d573442e67',
|
||||
'uploader': 'AnyClip',
|
||||
'description': 'md5:cbef0460d31e3807f6feb4e7a5952e5b',
|
||||
},
|
||||
},
|
||||
# age-restricted video
|
||||
@@ -110,28 +112,25 @@ class MetacafeIE(InfoExtractor):
|
||||
def report_disclaimer(self):
|
||||
self.to_screen('Retrieving disclaimer')
|
||||
|
||||
def _real_initialize(self):
|
||||
def _confirm_age(self):
|
||||
# Retrieve disclaimer
|
||||
self.report_disclaimer()
|
||||
self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
|
||||
|
||||
# Confirm age
|
||||
disclaimer_form = {
|
||||
'filters': '0',
|
||||
'submit': "Continue - I'm over 18",
|
||||
}
|
||||
request = sanitized_Request(self._FILTER_POST, urlencode_postdata(disclaimer_form))
|
||||
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
|
||||
self.report_age_confirmation()
|
||||
self._download_webpage(request, None, False, 'Unable to confirm age')
|
||||
self._download_webpage(
|
||||
self._FILTER_POST, None, False, 'Unable to confirm age',
|
||||
data=urlencode_postdata({
|
||||
'filters': '0',
|
||||
'submit': "Continue - I'm over 18",
|
||||
}), headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
})
|
||||
|
||||
def _real_extract(self, url):
|
||||
# Extract id and simplified title from URL
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
if mobj is None:
|
||||
raise ExtractorError('Invalid URL: %s' % url)
|
||||
|
||||
video_id = mobj.group(1)
|
||||
video_id, display_id = re.match(self._VALID_URL, url).groups()
|
||||
|
||||
# the video may come from an external site
|
||||
m_external = re.match('^(\w{2})-(.*)$', video_id)
|
||||
@@ -144,15 +143,24 @@ class MetacafeIE(InfoExtractor):
|
||||
if prefix == 'cb':
|
||||
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
|
||||
|
||||
# Retrieve video webpage to extract further information
|
||||
req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
|
||||
# self._confirm_age()
|
||||
|
||||
# AnyClip videos require the flashversion cookie so that we get the link
|
||||
# to the mp4 file
|
||||
mobj_an = re.match(r'^an-(.*?)$', video_id)
|
||||
if mobj_an:
|
||||
req.headers['Cookie'] = 'flashVersion=0;'
|
||||
webpage = self._download_webpage(req, video_id)
|
||||
headers = {}
|
||||
if video_id.startswith('an-'):
|
||||
headers['Cookie'] = 'flashVersion=0;'
|
||||
|
||||
# Retrieve video webpage to extract further information
|
||||
webpage = self._download_webpage(url, video_id, headers=headers)
|
||||
|
||||
error = get_element_by_attribute(
|
||||
'class', 'notfound-page-title', webpage)
|
||||
if error:
|
||||
raise ExtractorError(error, expected=True)
|
||||
|
||||
video_title = self._html_search_meta(
|
||||
['og:title', 'twitter:title'], webpage, 'title', default=None) or self._search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
|
||||
|
||||
# Extract URL, uploader and title from webpage
|
||||
self.report_extraction(video_id)
|
||||
@@ -216,20 +224,40 @@ class MetacafeIE(InfoExtractor):
|
||||
'player_url': player_url,
|
||||
'ext': play_path.partition(':')[0],
|
||||
})
|
||||
if video_url is None:
|
||||
flashvars = self._parse_json(self._search_regex(
|
||||
r'flashvars\s*=\s*({.*});', webpage, 'flashvars',
|
||||
default=None), video_id, fatal=False)
|
||||
if flashvars:
|
||||
video_url = []
|
||||
for source in flashvars.get('sources'):
|
||||
source_url = source.get('src')
|
||||
if not source_url:
|
||||
continue
|
||||
ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
|
||||
if ext == 'm3u8':
|
||||
video_url.extend(self._extract_m3u8_formats(
|
||||
source_url, video_id, 'mp4',
|
||||
'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
video_url.append({
|
||||
'url': source_url,
|
||||
'ext': ext,
|
||||
})
|
||||
|
||||
if video_url is None:
|
||||
raise ExtractorError('Unsupported video type')
|
||||
|
||||
video_title = self._html_search_regex(
|
||||
r'(?im)<title>(.*) - Video</title>', webpage, 'title')
|
||||
description = self._og_search_description(webpage)
|
||||
thumbnail = self._og_search_thumbnail(webpage)
|
||||
description = self._html_search_meta(
|
||||
['og:description', 'twitter:description', 'description'],
|
||||
webpage, 'title', fatal=False)
|
||||
thumbnail = self._html_search_meta(
|
||||
['og:image', 'twitter:image'], webpage, 'title', fatal=False)
|
||||
video_uploader = self._html_search_regex(
|
||||
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
|
||||
webpage, 'uploader nickname', fatal=False)
|
||||
duration = int_or_none(
|
||||
self._html_search_meta('video:duration', webpage))
|
||||
|
||||
self._html_search_meta('video:duration', webpage, default=None))
|
||||
age_limit = (
|
||||
18
|
||||
if re.search(r'(?:"contentRating":|"rating",)"restricted"', webpage)
|
||||
@@ -242,10 +270,11 @@ class MetacafeIE(InfoExtractor):
|
||||
'url': video_url,
|
||||
'ext': video_ext,
|
||||
}]
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'description': description,
|
||||
'uploader': video_uploader,
|
||||
'title': video_title,
|
||||
|
@@ -26,7 +26,8 @@ class MGTVIE(InfoExtractor):
        video_id = self._match_id(url)
        api_data = self._download_json(
            'http://v.api.mgtv.com/player/video', video_id,
            query={'video_id': video_id})['data']
            query={'video_id': video_id},
            headers=self.geo_verification_headers())['data']
        info = api_data['info']

        formats = []
|
@@ -12,12 +12,69 @@ from ..utils import (
|
||||
get_element_by_attribute,
|
||||
int_or_none,
|
||||
remove_start,
|
||||
extract_attributes,
|
||||
determine_ext,
|
||||
)
|
||||
|
||||
|
||||
class MiTeleIE(InfoExtractor):
|
||||
class MiTeleBaseIE(InfoExtractor):
|
||||
def _get_player_info(self, url, webpage):
|
||||
player_data = extract_attributes(self._search_regex(
|
||||
r'(?s)(<ms-video-player.+?</ms-video-player>)',
|
||||
webpage, 'ms video player'))
|
||||
video_id = player_data['data-media-id']
|
||||
config_url = compat_urlparse.urljoin(url, player_data['data-config'])
|
||||
config = self._download_json(
|
||||
config_url, video_id, 'Downloading config JSON')
|
||||
mmc_url = config['services']['mmc']
|
||||
|
||||
duration = None
|
||||
formats = []
|
||||
for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
|
||||
mmc = self._download_json(
|
||||
m_url, video_id, 'Downloading mmc JSON')
|
||||
if not duration:
|
||||
duration = int_or_none(mmc.get('duration'))
|
||||
for location in mmc['locations']:
|
||||
gat = self._proto_relative_url(location.get('gat'), 'http:')
|
||||
bas = location.get('bas')
|
||||
loc = location.get('loc')
|
||||
ogn = location.get('ogn')
|
||||
if None in (gat, bas, loc, ogn):
|
||||
continue
|
||||
token_data = {
|
||||
'bas': bas,
|
||||
'icd': loc,
|
||||
'ogn': ogn,
|
||||
'sta': '0',
|
||||
}
|
||||
media = self._download_json(
|
||||
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
|
||||
video_id, 'Downloading %s JSON' % location['loc'])
|
||||
file_ = media.get('file')
|
||||
if not file_:
|
||||
continue
|
||||
ext = determine_ext(file_)
|
||||
if ext == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
|
||||
video_id, f4m_id='hds', fatal=False))
|
||||
elif ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'formats': formats,
|
||||
'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
|
||||
'duration': duration,
|
||||
}
|
||||
|
||||
|
||||
class MiTeleIE(MiTeleBaseIE):
|
||||
IE_DESC = 'mitele.es'
|
||||
_VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
|
||||
_VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
|
||||
@@ -25,7 +82,7 @@ class MiTeleIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': '0NF1jJnxS1Wu3pHrmvFyw2',
|
||||
'display_id': 'programa-144',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Tor, la web invisible',
|
||||
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
|
||||
'series': 'Diario de',
|
||||
@@ -40,7 +97,7 @@ class MiTeleIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': 'eLZSwoEd1S3pVyUm8lc6F',
|
||||
'display_id': 'programa-226',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Cuarto Milenio - Temporada 6 - Programa 226',
|
||||
'description': 'md5:50daf9fadefa4e62d9fc866d0c015701',
|
||||
'series': 'Cuarto Milenio',
|
||||
@@ -59,40 +116,7 @@ class MiTeleIE(InfoExtractor):
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
config_url = self._search_regex(
|
||||
r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
|
||||
config_url = compat_urlparse.urljoin(url, config_url)
|
||||
|
||||
config = self._download_json(
|
||||
config_url, display_id, 'Downloading config JSON')
|
||||
|
||||
mmc = self._download_json(
|
||||
config['services']['mmc'], display_id, 'Downloading mmc JSON')
|
||||
|
||||
formats = []
|
||||
for location in mmc['locations']:
|
||||
gat = self._proto_relative_url(location.get('gat'), 'http:')
|
||||
bas = location.get('bas')
|
||||
loc = location.get('loc')
|
||||
ogn = location.get('ogn')
|
||||
if None in (gat, bas, loc, ogn):
|
||||
continue
|
||||
token_data = {
|
||||
'bas': bas,
|
||||
'icd': loc,
|
||||
'ogn': ogn,
|
||||
'sta': '0',
|
||||
}
|
||||
media = self._download_json(
|
||||
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
|
||||
display_id, 'Downloading %s JSON' % location['loc'])
|
||||
file_ = media.get('file')
|
||||
if not file_:
|
||||
continue
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
|
||||
display_id, f4m_id=loc))
|
||||
self._sort_formats(formats)
|
||||
info = self._get_player_info(url, webpage)
|
||||
|
||||
title = self._search_regex(
|
||||
r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>',
|
||||
@@ -112,21 +136,12 @@ class MiTeleIE(InfoExtractor):
|
||||
title = remove_start(self._search_regex(
|
||||
r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ')
|
||||
|
||||
video_id = self._search_regex(
|
||||
r'data-media-id\s*=\s*"([^"]+)"', webpage,
|
||||
'data media id', default=None) or display_id
|
||||
thumbnail = config.get('poster', {}).get('imageUrl')
|
||||
duration = int_or_none(mmc.get('duration'))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
info.update({
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': get_element_by_attribute('class', 'text', webpage),
|
||||
'series': series,
|
||||
'season': season,
|
||||
'episode': episode,
|
||||
'thumbnail': thumbnail,
|
||||
'duration': duration,
|
||||
'formats': formats,
|
||||
}
|
||||
})
|
||||
return info
|
||||
|
@@ -8,7 +8,7 @@ from ..utils import update_url_query

class NickIE(MTVServicesInfoExtractor):
    IE_NAME = 'nick.com'
    _VALID_URL = r'https?://(?:www\.)?nick\.com/videos/clip/(?P<id>[^/?#.]+)'
    _VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
    _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
    _TESTS = [{
        'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
@@ -52,6 +52,9 @@ class NickIE(MTVServicesInfoExtractor):
                }
            },
        ],
    }, {
        'url': 'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
        'only_matching': True,
    }]

    def _get_feed_query(self, uri):
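Note: the widened _VALID_URL above now accepts both nick.com clip pages and nickjr.com show pages; a quick sanity check against the two test URLs from this diff:

    import re

    _VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'

    for url in (
            'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
            'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/'):
        print(re.match(_VALID_URL, url).group('id'))
    # alvinnn-and-the-chipmunks-112-full-episode
    # pups-save-a-goldrush-s3-ep302-full-episode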
172 youtube_dl/extractor/onet.py Normal file
@@ -0,0 +1,172 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
float_or_none,
|
||||
get_element_by_class,
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
parse_iso8601,
|
||||
remove_start,
|
||||
strip_or_none,
|
||||
url_basename,
|
||||
)
|
||||
|
||||
|
||||
class OnetBaseIE(InfoExtractor):
|
||||
def _search_mvp_id(self, webpage):
|
||||
return self._search_regex(
|
||||
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
|
||||
|
||||
def _extract_from_id(self, video_id, webpage):
|
||||
response = self._download_json(
|
||||
'http://qi.ckm.onetapi.pl/', video_id,
|
||||
query={
|
||||
'body[id]': video_id,
|
||||
'body[jsonrpc]': '2.0',
|
||||
'body[method]': 'get_asset_detail',
|
||||
'body[params][ID_Publikacji]': video_id,
|
||||
'body[params][Service]': 'www.onet.pl',
|
||||
'content-type': 'application/jsonp',
|
||||
'x-onet-app': 'player.front.onetapi.pl',
|
||||
})
|
||||
|
||||
error = response.get('error')
|
||||
if error:
|
||||
raise ExtractorError(
|
||||
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
|
||||
|
||||
video = response['result'].get('0')
|
||||
|
||||
formats = []
|
||||
for _, formats_dict in video['formats'].items():
|
||||
if not isinstance(formats_dict, dict):
|
||||
continue
|
||||
for format_id, format_list in formats_dict.items():
|
||||
if not isinstance(format_list, list):
|
||||
continue
|
||||
for f in format_list:
|
||||
video_url = f.get('url')
|
||||
if not video_url:
|
||||
continue
|
||||
ext = determine_ext(video_url)
|
||||
if format_id == 'ism':
|
||||
# TODO: Support Microsoft Smooth Streaming
|
||||
continue
|
||||
elif ext == 'mpd':
|
||||
# TODO: Current DASH formats are broken - $Time$ pattern in
|
||||
# <SegmentTemplate> not implemented yet
|
||||
# formats.extend(self._extract_mpd_formats(
|
||||
# video_url, video_id, mpd_id='dash', fatal=False))
|
||||
continue
|
||||
else:
|
||||
formats.append({
|
||||
'url': video_url,
|
||||
'format_id': format_id,
|
||||
'height': int_or_none(f.get('vertical_resolution')),
|
||||
'width': int_or_none(f.get('horizontal_resolution')),
|
||||
'abr': float_or_none(f.get('audio_bitrate')),
|
||||
'vbr': float_or_none(f.get('video_bitrate')),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
meta = video.get('meta', {})
|
||||
|
||||
title = self._og_search_title(webpage, default=None) or meta['title']
|
||||
description = self._og_search_description(webpage, default=None) or meta.get('description')
|
||||
duration = meta.get('length') or meta.get('lenght')
|
||||
timestamp = parse_iso8601(meta.get('addDate'), ' ')
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': duration,
|
||||
'timestamp': timestamp,
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
|
||||
class OnetIE(OnetBaseIE):
|
||||
_VALID_URL = 'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
|
||||
IE_NAME = 'onet.tv'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://onet.tv/k/openerfestival/open-er-festival-2016-najdziwniejsze-wymagania-gwiazd/qbpyqc',
|
||||
'md5': 'e3ffbf47590032ac3f27249204173d50',
|
||||
'info_dict': {
|
||||
'id': 'qbpyqc',
|
||||
'display_id': 'open-er-festival-2016-najdziwniejsze-wymagania-gwiazd',
|
||||
'ext': 'mp4',
|
||||
'title': 'Open\'er Festival 2016: najdziwniejsze wymagania gwiazd',
|
||||
'description': 'Trzy samochody, których nigdy nie użyto, prywatne spa, hotel dekorowany czarnym suknem czy nielegalne używki. Organizatorzy koncertów i festiwali muszą stawać przed nie lada wyzwaniem zapraszając gwia...',
|
||||
'upload_date': '20160705',
|
||||
'timestamp': 1467721580,
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
display_id, video_id = mobj.group('display_id', 'id')
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
mvp_id = self._search_mvp_id(webpage)
|
||||
|
||||
info_dict = self._extract_from_id(mvp_id, webpage)
|
||||
info_dict.update({
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
})
|
||||
|
||||
return info_dict
|
||||
|
||||
|
||||
class OnetChannelIE(OnetBaseIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/(?P<id>[a-z]+)(?:[?#]|$)'
|
||||
IE_NAME = 'onet.tv:channel'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://onet.tv/k/openerfestival',
|
||||
'info_dict': {
|
||||
'id': 'openerfestival',
|
||||
'title': 'Open\'er Festival Live',
|
||||
'description': 'Dziękujemy, że oglądaliście transmisje. Zobaczcie nasze relacje i wywiady z artystami.',
|
||||
},
|
||||
'playlist_mincount': 46,
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
channel_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, channel_id)
|
||||
|
||||
current_clip_info = self._parse_json(self._search_regex(
|
||||
r'var\s+currentClip\s*=\s*({[^}]+})', webpage, 'video info'), channel_id,
|
||||
transform_source=lambda s: js_to_json(re.sub(r'\'\s*\+\s*\'', '', s)))
|
||||
video_id = remove_start(current_clip_info['ckmId'], 'mvp:')
|
||||
video_name = url_basename(current_clip_info['url'])
|
||||
|
||||
if self._downloader.params.get('noplaylist'):
|
||||
self.to_screen(
|
||||
'Downloading just video %s because of --no-playlist' % video_name)
|
||||
return self._extract_from_id(video_id, webpage)
|
||||
|
||||
self.to_screen(
|
||||
'Downloading channel %s - add --no-playlist to just download video %s' % (
|
||||
channel_id, video_name))
|
||||
matches = re.findall(
|
||||
r'<a[^>]+href=[\'"](https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/[0-9a-z-]+/[0-9a-z]+)',
|
||||
webpage)
|
||||
entries = [
|
||||
self.url_result(video_link, OnetIE.ie_key())
|
||||
for video_link in matches]
|
||||
|
||||
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
|
||||
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
|
||||
return self.playlist_result(entries, channel_id, channel_title, channel_description)
|
@@ -7,6 +7,8 @@ from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
int_or_none,
|
||||
float_or_none,
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
|
||||
@@ -15,15 +17,14 @@ class OnionStudiosIE(InfoExtractor):
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937',
|
||||
'md5': 'd4851405d31adfadf71cd7a487b765bb',
|
||||
'md5': 'e49f947c105b8a78a675a0ee1bddedfe',
|
||||
'info_dict': {
|
||||
'id': '2937',
|
||||
'ext': 'mp4',
|
||||
'title': 'Hannibal charges forward, stops for a cocktail',
|
||||
'description': 'md5:e786add7f280b7f0fe237b64cc73df76',
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
'uploader': 'The A.V. Club',
|
||||
'uploader_id': 'TheAVClub',
|
||||
'uploader_id': 'the-av-club',
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true',
|
||||
@@ -40,50 +41,38 @@ class OnionStudiosIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(
|
||||
'http://www.onionstudios.com/embed?id=%s' % video_id, video_id)
|
||||
video_data = self._download_json(
|
||||
'http://www.onionstudios.com/video/%s.json' % video_id, video_id)
|
||||
|
||||
title = video_data['title']
|
||||
|
||||
formats = []
|
||||
for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
|
||||
ext = determine_ext(src)
|
||||
for source in video_data.get('sources', []):
|
||||
source_url = source.get('url')
|
||||
if not source_url:
|
||||
continue
|
||||
ext = mimetype2ext(source.get('content_type')) or determine_ext(source_url)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
height = int_or_none(self._search_regex(
|
||||
r'/(\d+)\.%s' % ext, src, 'height', default=None))
|
||||
tbr = int_or_none(source.get('bitrate'))
|
||||
formats.append({
|
||||
'format_id': ext + ('-%sp' % height if height else ''),
|
||||
'url': src,
|
||||
'height': height,
|
||||
'format_id': ext + ('-%d' % tbr if tbr else ''),
|
||||
'url': source_url,
|
||||
'width': int_or_none(source.get('width')),
|
||||
'tbr': tbr,
|
||||
'ext': ext,
|
||||
'preference': 1,
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = self._search_regex(
|
||||
r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
|
||||
webpage, 'title', group='title')
|
||||
description = self._search_regex(
|
||||
r'share_description\s*=\s*(["\'])(?P<description>[^\'"]+?)\1',
|
||||
webpage, 'description', default=None, group='description')
|
||||
thumbnail = self._search_regex(
|
||||
r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',
|
||||
webpage, 'thumbnail', default=False, group='thumbnail')
|
||||
|
||||
uploader_id = self._search_regex(
|
||||
r'twitter_handle\s*=\s*(["\'])(?P<uploader_id>[^\1]+?)\1',
|
||||
webpage, 'uploader id', fatal=False, group='uploader_id')
|
||||
uploader = self._search_regex(
|
||||
r'window\.channelName\s*=\s*(["\'])Embedded:(?P<uploader>[^\1]+?)\1',
|
||||
webpage, 'uploader', default=False, group='uploader')
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
'uploader': uploader,
|
||||
'uploader_id': uploader_id,
|
||||
'thumbnail': video_data.get('poster_url'),
|
||||
'uploader': video_data.get('channel_name'),
|
||||
'uploader_id': video_data.get('channel_slug'),
|
||||
'duration': float_or_none(video_data.get('duration', 1000)),
|
||||
'tags': video_data.get('tags'),
|
||||
'formats': formats,
|
||||
}
|
||||
|
@@ -33,6 +33,7 @@ class PolskieRadioIE(InfoExtractor):
                'timestamp': 1456594200,
                'upload_date': '20160227',
                'duration': 2364,
                'thumbnail': 're:^https?://static\.prsa\.pl/images/.*\.jpg$'
            },
        }],
    }, {
@@ -68,6 +69,8 @@ class PolskieRadioIE(InfoExtractor):
            r'(?s)<span[^>]+id="datetime2"[^>]*>(.+?)</span>',
            webpage, 'timestamp', fatal=False))

        thumbnail_url = self._og_search_thumbnail(webpage)

        entries = []

        media_urls = set()
@@ -87,6 +90,7 @@ class PolskieRadioIE(InfoExtractor):
                'duration': int_or_none(media.get('length')),
                'vcodec': 'none' if media.get('provider') == 'audio' else None,
                'timestamp': timestamp,
                'thumbnail': thumbnail_url
            })

        title = self._og_search_title(webpage).strip()
@@ -82,6 +82,10 @@ class PornHubIE(InfoExtractor):
        # removed by uploader
        'url': 'http://www.pornhub.com/view_video.php?viewkey=ph572716d15a111',
        'only_matching': True,
    }, {
        # private video
        'url': 'http://www.pornhub.com/view_video.php?viewkey=ph56fd731fce6b7',
        'only_matching': True,
    }, {
        'url': 'https://www.thumbzilla.com/video/ph56c6114abd99a/horny-girlfriend-sex',
        'only_matching': True,
@@ -107,7 +111,7 @@ class PornHubIE(InfoExtractor):
        webpage = self._download_webpage(req, video_id)

        error_msg = self._html_search_regex(
            r'(?s)<div[^>]+class=(["\']).*?\bremoved\b.*?\1[^>]*>(?P<error>.+?)</div>',
            r'(?s)<div[^>]+class=(["\']).*?\b(?:removed|userMessageSection)\b.*?\1[^>]*>(?P<error>.+?)</div>',
            webpage, 'error message', default=None, group='error')
        if error_msg:
            error_msg = re.sub(r'\s+', ' ', error_msg)
@@ -5,7 +5,7 @@ import re
|
||||
|
||||
from hashlib import sha1
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urllib_parse_urlencode
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
determine_ext,
|
||||
@@ -71,6 +71,7 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
# rtmp download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'This video is unavailable',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.sixx.de/stars-style/video/sexy-laufen-in-ugg-boots-clip',
|
||||
@@ -86,6 +87,7 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
# rtmp download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'This video is unavailable',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.sat1.de/film/der-ruecktritt/video/im-interview-kai-wiesinger-clip',
|
||||
@@ -101,6 +103,7 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
# rtmp download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'This video is unavailable',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.kabeleins.de/tv/rosins-restaurants/videos/jagd-auf-fertigkost-im-elsthal-teil-2-ganze-folge',
|
||||
@@ -116,6 +119,7 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
# rtmp download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'This video is unavailable',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.ran.de/fussball/bundesliga/video/schalke-toennies-moechte-raul-zurueck-ganze-folge',
|
||||
@@ -131,6 +135,7 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
# rtmp download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'This video is unavailable',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
|
||||
@@ -227,70 +232,42 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
]
|
||||
|
||||
def _extract_clip(self, url, webpage):
|
||||
clip_id = self._html_search_regex(self._CLIPID_REGEXES, webpage, 'clip id')
|
||||
clip_id = self._html_search_regex(
|
||||
self._CLIPID_REGEXES, webpage, 'clip id')
|
||||
|
||||
access_token = 'prosieben'
|
||||
client_name = 'kolibri-2.0.19-splec4'
|
||||
client_location = url
|
||||
|
||||
videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse_urlencode({
|
||||
'access_token': access_token,
|
||||
'client_location': client_location,
|
||||
'client_name': client_name,
|
||||
'ids': clip_id,
|
||||
})
|
||||
|
||||
video = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON')[0]
|
||||
video = self._download_json(
|
||||
'http://vas.sim-technik.de/vas/live/v2/videos',
|
||||
clip_id, 'Downloading videos JSON', query={
|
||||
'access_token': access_token,
|
||||
'client_location': client_location,
|
||||
'client_name': client_name,
|
||||
'ids': clip_id,
|
||||
})[0]
|
||||
|
||||
if video.get('is_protected') is True:
|
||||
raise ExtractorError('This video is DRM protected.', expected=True)
|
||||
|
||||
duration = float_or_none(video.get('duration'))
|
||||
source_ids = [source['id'] for source in video['sources']]
|
||||
source_ids_str = ','.join(map(str, source_ids))
|
||||
source_ids = [compat_str(source['id']) for source in video['sources']]
|
||||
|
||||
g = '01!8d8F_)r9]4s[qeuXfP%'
|
||||
client_id = g[:2] + sha1(''.join([clip_id, g, access_token, client_location, g, client_name]).encode('utf-8')).hexdigest()
|
||||
|
||||
client_id = g[:2] + sha1(''.join([clip_id, g, access_token, client_location, g, client_name])
|
||||
.encode('utf-8')).hexdigest()
|
||||
|
||||
sources_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources?%s' % (clip_id, compat_urllib_parse_urlencode({
|
||||
'access_token': access_token,
|
||||
'client_id': client_id,
|
||||
'client_location': client_location,
|
||||
'client_name': client_name,
|
||||
}))
|
||||
|
||||
sources = self._download_json(sources_api_url, clip_id, 'Downloading sources JSON')
|
||||
sources = self._download_json(
|
||||
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
|
||||
clip_id, 'Downloading sources JSON', query={
|
||||
'access_token': access_token,
|
||||
'client_id': client_id,
|
||||
'client_location': client_location,
|
||||
'client_name': client_name,
|
||||
})
|
||||
server_id = sources['server_id']
|
||||
|
||||
client_id = g[:2] + sha1(''.join([g, clip_id, access_token, server_id,
|
||||
client_location, source_ids_str, g, client_name])
|
||||
.encode('utf-8')).hexdigest()
|
||||
|
||||
url_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url?%s' % (clip_id, compat_urllib_parse_urlencode({
|
||||
'access_token': access_token,
|
||||
'client_id': client_id,
|
||||
'client_location': client_location,
|
||||
'client_name': client_name,
|
||||
'server_id': server_id,
|
||||
'source_ids': source_ids_str,
|
||||
}))
|
||||
|
||||
urls = self._download_json(url_api_url, clip_id, 'Downloading urls JSON')
|
||||
|
||||
title = self._html_search_regex(self._TITLE_REGEXES, webpage, 'title')
|
||||
description = self._html_search_regex(self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
|
||||
thumbnail = self._og_search_thumbnail(webpage)
|
||||
|
||||
upload_date = unified_strdate(self._html_search_regex(
|
||||
self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
|
||||
|
||||
formats = []
|
||||
|
||||
urls_sources = urls['sources']
|
||||
if isinstance(urls_sources, dict):
|
||||
urls_sources = urls_sources.values()
|
||||
|
||||
def fix_bitrate(bitrate):
|
||||
bitrate = int_or_none(bitrate)
|
||||
@@ -298,37 +275,73 @@ class ProSiebenSat1IE(InfoExtractor):
|
||||
return None
|
||||
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
|
||||
|
||||
for source in urls_sources:
|
||||
protocol = source['protocol']
|
||||
source_url = source['url']
|
||||
if protocol == 'rtmp' or protocol == 'rtmpe':
|
||||
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
|
||||
if not mobj:
|
||||
formats = []
|
||||
for source_id in source_ids:
|
||||
client_id = g[:2] + sha1(''.join([g, clip_id, access_token, server_id, client_location, source_id, g, client_name]).encode('utf-8')).hexdigest()
|
||||
urls = self._download_json(
|
||||
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
|
||||
clip_id, 'Downloading urls JSON', fatal=False, query={
|
||||
'access_token': access_token,
|
||||
'client_id': client_id,
|
||||
'client_location': client_location,
|
||||
'client_name': client_name,
|
||||
'server_id': server_id,
|
||||
'source_ids': source_id,
|
||||
})
|
||||
if not urls:
|
||||
continue
|
||||
if urls.get('status_code') != 0:
|
||||
raise ExtractorError('This video is unavailable', expected=True)
|
||||
urls_sources = urls['sources']
|
||||
if isinstance(urls_sources, dict):
|
||||
urls_sources = urls_sources.values()
|
||||
for source in urls_sources:
|
||||
source_url = source.get('url')
|
||||
if not source_url:
|
||||
continue
|
||||
path = mobj.group('path')
|
||||
mp4colon_index = path.rfind('mp4:')
|
||||
app = path[:mp4colon_index]
|
||||
play_path = path[mp4colon_index:]
|
||||
formats.append({
|
||||
'url': '%s/%s' % (mobj.group('url'), app),
|
||||
'app': app,
|
||||
'play_path': play_path,
|
||||
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
|
||||
'page_url': 'http://www.prosieben.de',
|
||||
'vbr': fix_bitrate(source['bitrate']),
|
||||
'ext': 'mp4',
|
||||
'format_id': '%s_%s' % (source['cdn'], source['bitrate']),
|
||||
})
|
||||
elif 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(source_url, clip_id))
|
||||
else:
|
||||
formats.append({
|
||||
'url': source_url,
|
||||
'vbr': fix_bitrate(source['bitrate']),
|
||||
})
|
||||
|
||||
protocol = source.get('protocol')
|
||||
mimetype = source.get('mimetype')
|
||||
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
source_url, clip_id, f4m_id='hds', fatal=False))
|
||||
elif mimetype == 'application/x-mpegURL':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
source_url, clip_id, 'mp4', 'm3u8_native',
|
||||
m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
tbr = fix_bitrate(source['bitrate'])
|
||||
if protocol in ('rtmp', 'rtmpe'):
|
||||
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
|
||||
if not mobj:
|
||||
continue
|
||||
path = mobj.group('path')
|
||||
mp4colon_index = path.rfind('mp4:')
|
||||
app = path[:mp4colon_index]
|
||||
play_path = path[mp4colon_index:]
|
||||
formats.append({
|
||||
'url': '%s/%s' % (mobj.group('url'), app),
|
||||
'app': app,
|
||||
'play_path': play_path,
|
||||
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
|
||||
'page_url': 'http://www.prosieben.de',
|
||||
'tbr': tbr,
|
||||
'ext': 'flv',
|
||||
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
|
||||
})
|
||||
else:
|
||||
formats.append({
|
||||
'url': source_url,
|
||||
'tbr': tbr,
|
||||
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
description = self._html_search_regex(
|
||||
self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
|
||||
thumbnail = self._og_search_thumbnail(webpage)
|
||||
upload_date = unified_strdate(self._html_search_regex(
|
||||
self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
|
||||
|
||||
return {
|
||||
'id': clip_id,
|
||||
'title': title,
|
||||
|
@@ -12,6 +12,7 @@ from ..utils import (
|
||||
unified_strdate,
|
||||
xpath_element,
|
||||
ExtractorError,
|
||||
determine_protocol,
|
||||
)
|
||||
|
||||
|
||||
@@ -22,13 +23,13 @@ class RadioCanadaIE(InfoExtractor):
|
||||
'url': 'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272',
|
||||
'info_dict': {
|
||||
'id': '7184272',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Le parcours du tireur capté sur vidéo',
|
||||
'description': 'Images des caméras de surveillance fournies par la GRC montrant le parcours du tireur d\'Ottawa',
|
||||
'upload_date': '20141023',
|
||||
},
|
||||
'params': {
|
||||
# rtmp download
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
@@ -36,11 +37,14 @@ class RadioCanadaIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
app_code, video_id = re.match(self._VALID_URL, url).groups()
|
||||
|
||||
device_types = ['ipad', 'android']
|
||||
if app_code != 'toutv':
|
||||
device_types.append('flash')
|
||||
|
||||
formats = []
|
||||
# TODO: extract m3u8 and f4m formats
|
||||
# m3u8 formats can be extracted using ipad device_type return 403 error code when ffmpeg try to download segements
|
||||
# TODO: extract f4m formats
|
||||
# f4m formats can be extracted using flashhd device_type but they produce unplayable file
|
||||
for device_type in ('flash',):
|
||||
for device_type in device_types:
|
||||
v_data = self._download_xml(
|
||||
'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx',
|
||||
video_id, note='Downloading %s XML' % device_type, query={
|
||||
@@ -52,7 +56,7 @@ class RadioCanadaIE(InfoExtractor):
|
||||
# paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
|
||||
'paysJ391wsHjbOJwvCs26toz': 'CA',
|
||||
'bypasslock': 'NZt5K62gRqfc',
|
||||
})
|
||||
}, fatal=False)
|
||||
v_url = xpath_text(v_data, 'url')
|
||||
if not v_url:
|
||||
continue
|
||||
@@ -64,7 +68,8 @@ class RadioCanadaIE(InfoExtractor):
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
v_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
|
||||
elif ext == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(v_url, video_id, f4m_id='hds', fatal=False))
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
v_url, video_id, f4m_id='hds', fatal=False))
|
||||
else:
|
||||
ext = determine_ext(v_url)
|
||||
bitrates = xpath_element(v_data, 'bitrates')
|
||||
@@ -72,15 +77,28 @@ class RadioCanadaIE(InfoExtractor):
|
||||
tbr = int_or_none(url_e.get('bitrate'))
|
||||
if not tbr:
|
||||
continue
|
||||
f_url = re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url)
|
||||
protocol = determine_protocol({'url': f_url})
|
||||
formats.append({
|
||||
'format_id': 'rtmp-%d' % tbr,
|
||||
'url': re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url),
|
||||
'ext': 'flv',
|
||||
'protocol': 'rtmp',
|
||||
'format_id': '%s-%d' % (protocol, tbr),
|
||||
'url': f_url,
|
||||
'ext': 'flv' if protocol == 'rtmp' else ext,
|
||||
'protocol': protocol,
|
||||
'width': int_or_none(url_e.get('width')),
|
||||
'height': int_or_none(url_e.get('height')),
|
||||
'tbr': tbr,
|
||||
})
|
||||
if protocol == 'rtsp':
|
||||
base_url = self._search_regex(
|
||||
r'rtsp://([^?]+)', f_url, 'base url', default=None)
|
||||
if base_url:
|
||||
base_url = 'http://' + base_url
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
base_url + '/playlist.m3u8', video_id, 'mp4',
|
||||
'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
base_url + '/manifest.f4m', video_id,
|
||||
f4m_id='hds', fatal=False))
|
||||
self._sort_formats(formats)
|
||||
|
||||
metadata = self._download_xml(
|
||||
@@ -115,13 +133,13 @@ class RadioCanadaAudioVideoIE(InfoExtractor):
|
||||
'url': 'http://ici.radio-canada.ca/audio-video/media-7527184/barack-obama-au-vietnam',
|
||||
'info_dict': {
|
||||
'id': '7527184',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Barack Obama au Vietnam',
|
||||
'description': 'Les États-Unis lèvent l\'embargo sur la vente d\'armes qui datait de la guerre du Vietnam',
|
||||
'upload_date': '20160523',
|
||||
},
|
||||
'params': {
|
||||
# rtmp download
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
@@ -20,17 +20,12 @@ class RaiBaseIE(InfoExtractor):
        formats = []

        for platform in ('mon', 'flash', 'native'):
            headers = {}
            # TODO: rename --cn-verification-proxy
            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
            if cn_verification_proxy:
                headers['Ytdl-request-proxy'] = cn_verification_proxy

            relinker = self._download_xml(
                relinker_url, video_id,
                note='Downloading XML metadata for platform %s' % platform,
                transform_source=fix_xml_ampersands,
                query={'output': 45, 'pl': platform}, headers=headers)
                query={'output': 45, 'pl': platform},
                headers=self.geo_verification_headers())

            media_url = find_xpath_attr(relinker, './url', 'type', 'content').text
            if media_url == 'http://download.rai.it/video_no_available.mp4':
@@ -9,7 +9,7 @@ class RTVNHIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?rtvnh\.nl/video/(?P<id>[0-9]+)'
    _TEST = {
        'url': 'http://www.rtvnh.nl/video/131946',
        'md5': '6e1d0ab079e2a00b6161442d3ceacfc1',
        'md5': 'cdbec9f44550763c8afc96050fa747dc',
        'info_dict': {
            'id': '131946',
            'ext': 'mp4',
@@ -29,15 +29,29 @@ class RTVNHIE(InfoExtractor):
            raise ExtractorError(
                '%s returned error code %d' % (self.IE_NAME, status), expected=True)

        formats = self._extract_smil_formats(
            'http://www.rtvnh.nl/video/smil?m=' + video_id, video_id, fatal=False)
        formats = []
        rtmp_formats = self._extract_smil_formats(
            'http://www.rtvnh.nl/video/smil?m=' + video_id, video_id)
        formats.extend(rtmp_formats)

        for item in meta['source']['fb']:
            if item.get('type') == 'hls':
                formats.extend(self._extract_m3u8_formats(
                    item['file'], video_id, ext='mp4', entry_protocol='m3u8_native'))
            elif item.get('type') == '':
                formats.append({'url': item['file']})
        for rtmp_format in rtmp_formats:
            rtmp_url = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
            rtsp_format = rtmp_format.copy()
            del rtsp_format['play_path']
            del rtsp_format['ext']
            rtsp_format.update({
                'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
                'url': rtmp_url.replace('rtmp://', 'rtsp://'),
                'protocol': 'rtsp',
            })
            formats.append(rtsp_format)
            http_base_url = rtmp_url.replace('rtmp://', 'http://')
            formats.extend(self._extract_m3u8_formats(
                http_base_url + '/playlist.m3u8', video_id, 'mp4',
                'm3u8_native', m3u8_id='hls', fatal=False))
            formats.extend(self._extract_f4m_formats(
                http_base_url + '/manifest.f4m',
                video_id, f4m_id='hds', fatal=False))
        self._sort_formats(formats)

        return {
@@ -1,18 +1,12 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import itertools
|
||||
import json
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
mimetype2ext,
|
||||
sanitized_Request,
|
||||
unified_strdate,
|
||||
)
|
||||
|
||||
|
||||
@@ -27,7 +21,8 @@ class SandiaIE(InfoExtractor):
|
||||
'ext': 'mp4',
|
||||
'title': 'Xyce Software Training - Section 1',
|
||||
'description': 're:(?s)SAND Number: SAND 2013-7800.{200,}',
|
||||
'upload_date': '20120904',
|
||||
'upload_date': '20120409',
|
||||
'timestamp': 1333983600,
|
||||
'duration': 7794,
|
||||
}
|
||||
}
|
||||
@@ -35,81 +30,36 @@ class SandiaIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
req = sanitized_Request(url)
|
||||
req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4')
|
||||
webpage = self._download_webpage(req, video_id)
|
||||
presentation_data = self._download_json(
|
||||
'http://digitalops.sandia.gov/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
|
||||
video_id, data=json.dumps({
|
||||
'getPlayerOptionsRequest': {
|
||||
'ResourceId': video_id,
|
||||
'QueryString': '',
|
||||
}
|
||||
}), headers={
|
||||
'Content-Type': 'application/json; charset=utf-8',
|
||||
})['d']['Presentation']
|
||||
|
||||
js_path = self._search_regex(
|
||||
r'<script type="text/javascript" src="(/Mediasite/FileServer/Presentation/[^"]+)"',
|
||||
webpage, 'JS code URL')
|
||||
js_url = compat_urlparse.urljoin(url, js_path)
|
||||
|
||||
js_code = self._download_webpage(
|
||||
js_url, video_id, note='Downloading player')
|
||||
|
||||
def extract_str(key, **args):
|
||||
return self._search_regex(
|
||||
r'Mediasite\.PlaybackManifest\.%s\s*=\s*(.+);\s*?\n' % re.escape(key),
|
||||
js_code, key, **args)
|
||||
|
||||
def extract_data(key, **args):
|
||||
data_json = extract_str(key, **args)
|
||||
if data_json is None:
|
||||
return data_json
|
||||
return self._parse_json(
|
||||
data_json, video_id, transform_source=js_to_json)
|
||||
title = presentation_data['Title']
|
||||
|
||||
formats = []
|
||||
for i in itertools.count():
|
||||
fd = extract_data('VideoUrls[%d]' % i, default=None)
|
||||
if fd is None:
|
||||
break
|
||||
formats.append({
|
||||
'format_id': '%s' % i,
|
||||
'format_note': fd['MimeType'].partition('/')[2],
|
||||
'ext': mimetype2ext(fd['MimeType']),
|
||||
'url': fd['Location'],
|
||||
'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
|
||||
})
|
||||
for stream in presentation_data.get('Streams', []):
|
||||
for fd in stream.get('VideoUrls', []):
|
||||
formats.append({
|
||||
'format_id': fd['MediaType'],
|
||||
'format_note': fd['MimeType'].partition('/')[2],
|
||||
'ext': mimetype2ext(fd['MimeType']),
|
||||
'url': fd['Location'],
|
||||
'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
slide_baseurl = compat_urlparse.urljoin(
|
||||
url, extract_data('SlideBaseUrl'))
|
||||
slide_template = slide_baseurl + re.sub(
|
||||
r'\{0:D?([0-9+])\}', r'%0\1d', extract_data('SlideImageFileNameTemplate'))
|
||||
slides = []
|
||||
last_slide_time = 0
|
||||
for i in itertools.count(1):
|
||||
sd = extract_str('Slides[%d]' % i, default=None)
|
||||
if sd is None:
|
||||
break
|
||||
timestamp = int_or_none(self._search_regex(
|
||||
r'^Mediasite\.PlaybackManifest\.CreateSlide\("[^"]*"\s*,\s*([0-9]+),',
|
||||
sd, 'slide %s timestamp' % i, fatal=False))
|
||||
slides.append({
|
||||
'url': slide_template % i,
|
||||
'duration': timestamp - last_slide_time,
|
||||
})
|
||||
last_slide_time = timestamp
|
||||
formats.append({
|
||||
'format_id': 'slides',
|
||||
'protocol': 'slideshow',
|
||||
'url': json.dumps(slides),
|
||||
'preference': -10000, # Downloader not yet written
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = extract_data('Title')
|
||||
description = extract_data('Description', fatal=False)
|
||||
duration = int_or_none(extract_data(
|
||||
'Duration', fatal=False), scale=1000)
|
||||
upload_date = unified_strdate(extract_data('AirDate', fatal=False))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'description': presentation_data.get('Description'),
|
||||
'formats': formats,
|
||||
'upload_date': upload_date,
|
||||
'duration': duration,
|
||||
'timestamp': int_or_none(presentation_data.get('UnixTime'), 1000),
|
||||
'duration': int_or_none(presentation_data.get('Duration'), 1000),
|
||||
}
|
||||
|
@@ -5,6 +5,8 @@ from .common import InfoExtractor
from ..utils import (
    qualities,
    int_or_none,
    mimetype2ext,
    determine_ext,
)


@@ -34,19 +36,21 @@ class SixPlayIE(InfoExtractor):
            source_type, source_url = source.get('type'), source.get('src')
            if not source_url or source_type == 'hls/primetime':
                continue
            if source_type == 'application/vnd.apple.mpegURL':
            ext = mimetype2ext(source_type) or determine_ext(source_url)
            if ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(
                    source_url, video_id, 'mp4', 'm3u8_native',
                    m3u8_id='hls', fatal=False))
                formats.extend(self._extract_f4m_formats(
                    source_url.replace('.m3u8', '.f4m'),
                    video_id, f4m_id='hds', fatal=False))
            elif source_type == 'video/mp4':
            elif ext == 'mp4':
                quality = source.get('quality')
                formats.append({
                    'url': source_url,
                    'format_id': quality,
                    'quality': quality_key(quality),
                    'ext': ext,
                })
        self._sort_formats(formats)
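The SixPlay hunks above move the format dispatch from raw MIME-type strings to a normalized extension computed with mimetype2ext()/determine_ext(). A self-contained sketch of that pattern, with a deliberately trimmed MIME table standing in for the full one in utils.py (the helper bodies and mapping here are illustrative assumptions, not the real implementations):

import posixpath

# Trimmed stand-in for youtube_dl.utils.mimetype2ext (illustrative subset only).
_MIME_TO_EXT = {'vnd.apple.mpegurl': 'm3u8', 'x-mpegurl': 'm3u8', 'dash+xml': 'mpd'}

def mimetype2ext(mt):
    if not mt:
        return None
    res = mt.rpartition('/')[2].lower()
    return _MIME_TO_EXT.get(res, res)

def determine_ext(url, default_ext=None):
    # Suffix of the last path component, e.g. 'master.m3u8' -> 'm3u8'.
    ext = posixpath.splitext(url.partition('?')[0])[1][1:]
    return ext.lower() or default_ext

def classify(source_type, source_url):
    # Normalize first, then branch on the extension, as the new code does.
    ext = mimetype2ext(source_type) or determine_ext(source_url)
    if ext == 'm3u8':
        return 'hls'
    if ext == 'mp4':
        return 'progressive mp4'
    return ext or 'unknown'

print(classify('application/vnd.apple.mpegURL', 'http://example.invalid/master.m3u8'))  # hls
print(classify(None, 'http://example.invalid/clip_720.mp4'))                            # progressive mp4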
@@ -9,6 +9,7 @@ from ..compat import (
)
from ..utils import (
    ExtractorError,
    get_element_by_id,
)


@@ -40,7 +41,7 @@ class SlideshareIE(InfoExtractor):
        bucket = info['jsplayer']['video_bucket']
        ext = info['jsplayer']['video_extension']
        video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
        description = self._html_search_regex(
        description = get_element_by_id('slideshow-description-paragraph', webpage) or self._html_search_regex(
            r'(?s)<p[^>]+itemprop="description"[^>]*>(.+?)</p>', webpage,
            'description', fatal=False)

@@ -51,5 +52,5 @@ class SlideshareIE(InfoExtractor):
            'ext': ext,
            'url': video_url,
            'thumbnail': info['slideshow']['pin_image_url'],
            'description': description,
            'description': description.strip() if description else None,
        }
@@ -8,10 +8,7 @@ from ..compat import (
    compat_str,
    compat_urllib_parse_urlencode,
)
from ..utils import (
    ExtractorError,
    sanitized_Request,
)
from ..utils import ExtractorError


class SohuIE(InfoExtractor):
@@ -96,15 +93,10 @@ class SohuIE(InfoExtractor):
            else:
                base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='

            req = sanitized_Request(base_data_url + vid_id)

            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
            if cn_verification_proxy:
                req.add_header('Ytdl-request-proxy', cn_verification_proxy)

            return self._download_json(
                req, video_id,
                'Downloading JSON data for %s' % vid_id)
                base_data_url + vid_id, video_id,
                'Downloading JSON data for %s' % vid_id,
                headers=self.geo_verification_headers())

        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')
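Several extractors in this range (Sohu here, Youku further down) drop the hand-rolled Ytdl-request-proxy header in favour of a shared geo_verification_headers() helper combined with the headers= argument of _download_json. A rough, simplified sketch of the idea; the class and method below are stand-ins that only illustrate the shape, the real logic lives in the common InfoExtractor and is driven by --geo-verification-proxy:

class GeoAwareExtractor(object):
    """Simplified stand-in; not the real youtube-dl class."""

    def __init__(self, geo_verification_proxy=None):
        self._geo_verification_proxy = geo_verification_proxy

    def geo_verification_headers(self):
        # Only inject the internal proxy header when a proxy was configured.
        headers = {}
        if self._geo_verification_proxy:
            headers['Ytdl-request-proxy'] = self._geo_verification_proxy
        return headers

    def download_json(self, url, headers=None):
        merged = dict(headers or {})
        merged.update(self.geo_verification_headers())
        # ... perform the HTTP request with `merged` and parse the JSON ...
        return {'url': url, 'headers': merged}


print(GeoAwareExtractor('socks5://127.0.0.1:1080').download_json(
    'http://hot.vrs.sohu.com/vrs_flash.action?vid=123'))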
@@ -4,8 +4,13 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urlparse
|
||||
from .spiegeltv import SpiegeltvIE
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
extract_attributes,
|
||||
unified_strdate,
|
||||
get_element_by_attribute,
|
||||
)
|
||||
|
||||
|
||||
class SpiegelIE(InfoExtractor):
|
||||
@@ -19,6 +24,7 @@ class SpiegelIE(InfoExtractor):
|
||||
'title': 'Vulkanausbruch in Ecuador: Der "Feuerschlund" ist wieder aktiv',
|
||||
'description': 'md5:8029d8310232196eb235d27575a8b9f4',
|
||||
'duration': 49,
|
||||
'upload_date': '20130311',
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.spiegel.de/video/schach-wm-videoanalyse-des-fuenften-spiels-video-1309159.html',
|
||||
@@ -29,6 +35,7 @@ class SpiegelIE(InfoExtractor):
|
||||
'title': 'Schach-WM in der Videoanalyse: Carlsen nutzt die Fehlgriffe des Titelverteidigers',
|
||||
'description': 'md5:c2322b65e58f385a820c10fa03b2d088',
|
||||
'duration': 983,
|
||||
'upload_date': '20131115',
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-embed.html',
|
||||
@@ -38,6 +45,7 @@ class SpiegelIE(InfoExtractor):
|
||||
'ext': 'mp4',
|
||||
'description': 'SPIEGEL ONLINE-Nutzer durften den deutschen Astronauten Alexander Gerst über sein Leben auf der ISS-Station befragen. Hier kommen seine Antworten auf die besten sechs Fragen.',
|
||||
'title': 'Fragen an Astronaut Alexander Gerst: "Bekommen Sie die Tageszeiten mit?"',
|
||||
'upload_date': '20140904',
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
|
||||
@@ -52,10 +60,10 @@ class SpiegelIE(InfoExtractor):
|
||||
if SpiegeltvIE.suitable(handle.geturl()):
|
||||
return self.url_result(handle.geturl(), 'Spiegeltv')
|
||||
|
||||
title = re.sub(r'\s+', ' ', self._html_search_regex(
|
||||
r'(?s)<(?:h1|div) class="module-title"[^>]*>(.*?)</(?:h1|div)>',
|
||||
webpage, 'title'))
|
||||
description = self._html_search_meta('description', webpage, 'description')
|
||||
video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
|
||||
|
||||
title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)
|
||||
description = video_data.get('data-video-teaser') or self._html_search_meta('description', webpage, 'description')
|
||||
|
||||
base_url = self._search_regex(
|
||||
[r'server\s*:\s*(["\'])(?P<url>.+?)\1', r'var\s+server\s*=\s*"(?P<url>[^"]+)\"'],
|
||||
@@ -87,8 +95,9 @@ class SpiegelIE(InfoExtractor):
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'description': description.strip() if description else None,
|
||||
'duration': duration,
|
||||
'upload_date': unified_strdate(video_data.get('data-video-date')),
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
@@ -104,6 +113,7 @@ class SpiegelArticleIE(InfoExtractor):
|
||||
'ext': 'mp4',
|
||||
'title': 'Faszination Badminton: Nennt es bloß nicht Federball',
|
||||
'description': 're:^Patrick Kämnitz gehört.{100,}',
|
||||
'upload_date': '20140825',
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.spiegel.de/wissenschaft/weltall/astronaut-alexander-gerst-antwortet-spiegel-online-lesern-a-989876.html',
|
||||
|
@@ -11,7 +11,7 @@ from ..utils import (
class SRMediathekIE(ARDMediathekIE):
    IE_NAME = 'sr:mediathek'
    IE_DESC = 'Saarländischer Rundfunk'
    _VALID_URL = r'https?://sr-mediathek\.sr-online\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
    _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'

    _TESTS = [{
        'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=28455',
@@ -35,7 +35,9 @@ class SRMediathekIE(ARDMediathekIE):
            # m3u8 download
            'skip_download': True,
        },
        'expected_warnings': ['Unable to download f4m manifest']
    }, {
        'url': 'http://sr-mediathek.de/index.php?seite=7&id=7480',
        'only_matching': True,
    }]

    def _real_extract(self, url):
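The relaxed _VALID_URL makes the ".sr-online" host part optional, so both the old and the new domain match. A quick standalone check of that regex:

import re

VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'

for url in (
    'http://sr-mediathek.sr-online.de/index.php?seite=7&id=28455',
    'http://sr-mediathek.de/index.php?seite=7&id=7480',
):
    m = re.match(VALID_URL, url)
    print(m.group('id') if m else 'no match')
# 28455
# 7480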
@@ -56,7 +56,7 @@ class StitcherIE(InfoExtractor):

        episode = self._parse_json(
            js_to_json(self._search_regex(
                r'(?s)var\s+stitcher\s*=\s*({.+?});\n', webpage, 'episode config')),
                r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n', webpage, 'episode config')),
            display_id)['config']['episode']

        title = unescapeHTML(episode['title'])
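The widened Stitcher regex accepts both the old "var stitcher = {...}" page variable and the newer "var stitcherConfig = {...}" form. A standalone check of that pattern (the page snippets are invented):

import re

pattern = r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n'

for snippet in (
    'var stitcher = {"config": {"episode": {}}};\n',
    'var stitcherConfig = {"config": {"episode": {}}};\n',
):
    print(bool(re.search(pattern, snippet)))
# True
# True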
@@ -1,50 +1,41 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import json
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_urllib_parse_unquote,
|
||||
compat_urllib_parse_urlencode,
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..utils import (
|
||||
get_element_by_attribute,
|
||||
parse_duration,
|
||||
strip_jsonp,
|
||||
)
|
||||
from .mitele import MiTeleBaseIE
|
||||
|
||||
|
||||
class TelecincoIE(InfoExtractor):
|
||||
class TelecincoIE(MiTeleBaseIE):
|
||||
IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
|
||||
_VALID_URL = r'https?://www\.(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'http://www.telecinco.es/robinfood/temporada-01/t01xp14/Bacalao-cocochas-pil-pil_0_1876350223.html',
|
||||
'md5': '5cbef3ad5ef17bf0d21570332d140729',
|
||||
'md5': '8d7b2d5f699ee2709d992a63d5cd1712',
|
||||
'info_dict': {
|
||||
'id': 'MDSVID20141015_0058',
|
||||
'id': 'JEA5ijCnF6p5W08A1rNKn7',
|
||||
'ext': 'mp4',
|
||||
'title': 'Con Martín Berasategui, hacer un bacalao al ...',
|
||||
'title': 'Bacalao con kokotxas al pil-pil',
|
||||
'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
|
||||
'duration': 662,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.cuatro.com/deportes/futbol/barcelona/Leo_Messi-Champions-Roma_2_2052780128.html',
|
||||
'md5': '0a5b9f3cc8b074f50a0578f823a12694',
|
||||
'md5': '284393e5387b3b947b77c613ef04749a',
|
||||
'info_dict': {
|
||||
'id': 'MDSVID20150916_0128',
|
||||
'id': 'jn24Od1zGLG4XUZcnUnZB6',
|
||||
'ext': 'mp4',
|
||||
'title': '¿Quién es este ex futbolista con el que hablan ...',
|
||||
'title': '¿Quién es este ex futbolista con el que hablan Leo Messi y Luis Suárez?',
|
||||
'description': 'md5:a62ecb5f1934fc787107d7b9a2262805',
|
||||
'duration': 79,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.mediaset.es/12meses/campanas/doylacara/conlatratanohaytrato/Ayudame-dar-cara-trata-trato_2_1986630220.html',
|
||||
'md5': 'ad1bfaaba922dd4a295724b05b68f86a',
|
||||
'md5': '749afab6ea5a136a8806855166ae46a2',
|
||||
'info_dict': {
|
||||
'id': 'MDSVID20150513_0220',
|
||||
'id': 'aywerkD2Sv1vGNqq9b85Q2',
|
||||
'ext': 'mp4',
|
||||
'title': '#DOYLACARA. Con la trata no hay trato',
|
||||
'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
|
||||
'duration': 50,
|
||||
},
|
||||
}, {
|
||||
@@ -56,40 +47,16 @@ class TelecincoIE(InfoExtractor):
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
episode = self._match_id(url)
|
||||
webpage = self._download_webpage(url, episode)
|
||||
embed_data_json = self._search_regex(
|
||||
r'(?s)MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
|
||||
).replace('\'', '"')
|
||||
embed_data = json.loads(embed_data_json)
|
||||
|
||||
domain = embed_data['mediaUrl']
|
||||
if not domain.startswith('http'):
|
||||
# only happens in telecinco.es videos
|
||||
domain = 'http://' + domain
|
||||
info_url = compat_urlparse.urljoin(
|
||||
domain,
|
||||
compat_urllib_parse_unquote(embed_data['flashvars']['host'])
|
||||
)
|
||||
info_el = self._download_xml(info_url, episode).find('./video/info')
|
||||
|
||||
video_link = info_el.find('videoUrl/link').text
|
||||
token_query = compat_urllib_parse_urlencode({'id': video_link})
|
||||
token_info = self._download_json(
|
||||
embed_data['flashvars']['ov_tk'] + '?' + token_query,
|
||||
episode,
|
||||
transform_source=strip_jsonp
|
||||
)
|
||||
formats = self._extract_m3u8_formats(
|
||||
token_info['tokenizedUrl'], episode, ext='mp4', entry_protocol='m3u8_native')
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': embed_data['videoId'],
|
||||
'display_id': episode,
|
||||
'title': info_el.find('title').text,
|
||||
'formats': formats,
|
||||
'description': get_element_by_attribute('class', 'text', webpage),
|
||||
'thumbnail': info_el.find('thumb').text,
|
||||
'duration': parse_duration(info_el.find('duration').text),
|
||||
}
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
title = self._html_search_meta(
|
||||
['og:title', 'twitter:title'], webpage, 'title')
|
||||
info = self._get_player_info(url, webpage)
|
||||
info.update({
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': self._html_search_meta(
|
||||
['og:description', 'twitter:description'],
|
||||
webpage, 'title', fatal=False),
|
||||
})
|
||||
return info
|
||||
|
@@ -92,12 +92,11 @@ class ThreeQSDNIE(InfoExtractor):
            if not item_url or item_url in urls:
                return
            urls.add(item_url)
            type_ = item.get('type')
            ext = determine_ext(item_url, default_ext=None)
            if type_ == 'application/dash+xml' or ext == 'mpd':
            ext = mimetype2ext(item.get('type')) or determine_ext(item_url, default_ext=None)
            if ext == 'mpd':
                formats.extend(self._extract_mpd_formats(
                    item_url, video_id, mpd_id='mpd', fatal=False))
            elif type_ in ('application/vnd.apple.mpegURL', 'application/x-mpegurl') or ext == 'm3u8':
            elif ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(
                    item_url, video_id, 'mp4',
                    entry_protocol='m3u8' if live else 'm3u8_native',
@@ -111,7 +110,7 @@ class ThreeQSDNIE(InfoExtractor):
            formats.append({
                'url': item_url,
                'format_id': item.get('quality'),
                'ext': 'mp4' if item_url.startswith('rtsp') else mimetype2ext(type_) or ext,
                'ext': 'mp4' if item_url.startswith('rtsp') else ext,
                'vcodec': 'none' if stream_type == 'audio' else None,
            })
@@ -1,74 +1,41 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
unified_strdate,
|
||||
)
|
||||
from ..utils import int_or_none
|
||||
|
||||
|
||||
class TouTvIE(InfoExtractor):
|
||||
IE_NAME = 'tou.tv'
|
||||
_VALID_URL = r'https?://www\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/(?P<episode>S[0-9]+E[0-9]+)))'
|
||||
_VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+/S[0-9]+E[0-9]+)'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://www.tou.tv/30-vies/S04E41',
|
||||
'url': 'http://ici.tou.tv/garfield-tout-court/S2015E17',
|
||||
'info_dict': {
|
||||
'id': '30-vies_S04E41',
|
||||
'id': '122017',
|
||||
'ext': 'mp4',
|
||||
'title': '30 vies Saison 4 / Épisode 41',
|
||||
'description': 'md5:da363002db82ccbe4dafeb9cab039b09',
|
||||
'age_limit': 8,
|
||||
'uploader': 'Groupe des Nouveaux Médias',
|
||||
'duration': 1296,
|
||||
'upload_date': '20131118',
|
||||
'thumbnail': 'http://static.tou.tv/medias/images/2013-11-18_19_00_00_30VIES_0341_01_L.jpeg',
|
||||
'title': 'Saison 2015 Épisode 17',
|
||||
'description': 'La photo de famille 2',
|
||||
'upload_date': '20100717',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True, # Requires rtmpdump
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'Only available in Canada'
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
mediaId = self._search_regex(
|
||||
r'"idMedia":\s*"([^"]+)"', webpage, 'media ID')
|
||||
|
||||
streams_url = 'http://release.theplatform.com/content.select?pid=' + mediaId
|
||||
streams_doc = self._download_xml(
|
||||
streams_url, video_id, note='Downloading stream list')
|
||||
|
||||
video_url = next(n.text
|
||||
for n in streams_doc.findall('.//choice/url')
|
||||
if '//ad.doubleclick' not in n.text)
|
||||
if video_url.endswith('/Unavailable.flv'):
|
||||
raise ExtractorError(
|
||||
'Access to this video is blocked from outside of Canada',
|
||||
expected=True)
|
||||
|
||||
duration_str = self._html_search_meta(
|
||||
'video:duration', webpage, 'duration')
|
||||
duration = int(duration_str) if duration_str else None
|
||||
upload_date_str = self._html_search_meta(
|
||||
'video:release_date', webpage, 'upload date')
|
||||
upload_date = unified_strdate(upload_date_str) if upload_date_str else None
|
||||
path = self._match_id(url)
|
||||
metadata = self._download_json('http://ici.tou.tv/presentation/%s' % path, path)
|
||||
video_id = metadata['IdMedia']
|
||||
details = metadata['Details']
|
||||
title = details['OriginalTitle']
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'url': 'radiocanada:%s:%s' % (metadata.get('AppCode', 'toutv'), video_id),
|
||||
'id': video_id,
|
||||
'title': self._og_search_title(webpage),
|
||||
'url': video_url,
|
||||
'description': self._og_search_description(webpage),
|
||||
'uploader': self._dc_search_uploader(webpage),
|
||||
'thumbnail': self._og_search_thumbnail(webpage),
|
||||
'age_limit': self._media_rating_search(webpage),
|
||||
'duration': duration,
|
||||
'upload_date': upload_date,
|
||||
'ext': 'mp4',
|
||||
'title': title,
|
||||
'thumbnail': details.get('ImageUrl'),
|
||||
'duration': int_or_none(details.get('LengthInSeconds')),
|
||||
}
|
||||
|
@@ -1,25 +1,62 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    determine_ext,
    mimetype2ext,
)


class TweakersIE(InfoExtractor):
    _VALID_URL = r'https?://tweakers\.net/video/(?P<id>\d+)'
    _TEST = {
        'url': 'https://tweakers.net/video/9926/new-nintendo-3ds-xl-op-alle-fronten-beter.html',
        'md5': '3147e4ddad366f97476a93863e4557c8',
        'md5': 'fe73e417c093a788e0160c4025f88b15',
        'info_dict': {
            'id': '9926',
            'ext': 'mp4',
            'title': 'New Nintendo 3DS XL - Op alle fronten beter',
            'description': 'md5:f97324cc71e86e11c853f0763820e3ba',
            'description': 'md5:3789b21fed9c0219e9bcaacd43fab280',
            'thumbnail': 're:^https?://.*\.jpe?g$',
            'duration': 386,
            'uploader_id': 's7JeEm',
        }
    }

    def _real_extract(self, url):
        playlist_id = self._match_id(url)
        entries = self._extract_xspf_playlist(
            'https://tweakers.net/video/s1playlist/%s/playlist.xspf' % playlist_id, playlist_id)
        return self.playlist_result(entries, playlist_id)
        video_id = self._match_id(url)
        video_data = self._download_json(
            'https://tweakers.net/video/s1playlist/%s/1920/1080/playlist.json' % video_id,
            video_id)['items'][0]

        title = video_data['title']

        formats = []
        for location in video_data.get('locations', {}).get('progressive', []):
            format_id = location.get('label')
            width = int_or_none(location.get('width'))
            height = int_or_none(location.get('height'))
            for source in location.get('sources', []):
                source_url = source.get('src')
                if not source_url:
                    continue
                ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
                formats.append({
                    'format_id': format_id,
                    'url': source_url,
                    'width': width,
                    'height': height,
                    'ext': ext,
                })
        self._sort_formats(formats)

        return {
            'id': video_id,
            'title': title,
            'description': video_data.get('description'),
            'thumbnail': video_data.get('poster'),
            'duration': int_or_none(video_data.get('duration')),
            'uploader_id': video_data.get('account'),
            'formats': formats,
        }
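The rewritten Tweakers extractor switches from the XSPF playlist to a JSON playlist and walks items[0].locations.progressive[*].sources[*]. A tiny standalone illustration of that traversal over a made-up payload (the field names follow the diff, the values are invented):

sample = {
    'items': [{
        'title': 'New Nintendo 3DS XL - Op alle fronten beter',
        'locations': {
            'progressive': [{
                'label': '720p',
                'width': 1280,
                'height': 720,
                'sources': [{'src': 'https://example.invalid/video_720.mp4', 'type': 'video/mp4'}],
            }],
        },
    }],
}

video_data = sample['items'][0]
formats = []
for location in video_data.get('locations', {}).get('progressive', []):
    for source in location.get('sources', []):
        src = source.get('src')
        if not src:
            continue
        formats.append({
            'format_id': location.get('label'),
            'url': src,
            'width': location.get('width'),
            'height': location.get('height'),
        })
print(formats)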
@@ -67,6 +67,20 @@ class XuiteIE(InfoExtractor):
            'categories': ['電玩動漫'],
        },
        'skip': 'Video removed',
    }, {
        # Video with encoded media id
        # from http://forgetfulbc.blogspot.com/2016/06/date.html
        'url': 'http://vlog.xuite.net/embed/cE1xbENoLTI3NDQ3MzM2LmZsdg==?ar=0&as=0',
        'info_dict': {
            'id': 'cE1xbENoLTI3NDQ3MzM2LmZsdg==',
            'ext': 'mp4',
            'title': '男女平權只是口號?專家解釋約會時男生是否該幫女生付錢 (中字)',
            'description': 'md5:f0abdcb69df300f522a5442ef3146f2a',
            'timestamp': 1466160960,
            'upload_date': '20160617',
            'uploader': 'B.C. & Lowy',
            'uploader_id': '232279340',
        },
    }, {
        'url': 'http://vlog.xuite.net/play/S1dDUjdyLTMyOTc3NjcuZmx2/%E5%AD%AB%E7%87%95%E5%A7%BF-%E7%9C%BC%E6%B7%9A%E6%88%90%E8%A9%A9',
        'only_matching': True,
@@ -80,10 +94,9 @@ class XuiteIE(InfoExtractor):
    def base64_encode_utf8(data):
        return base64.b64encode(data.encode('utf-8')).decode('utf-8')

    def _extract_flv_config(self, media_id):
        base64_media_id = self.base64_encode_utf8(media_id)
    def _extract_flv_config(self, encoded_media_id):
        flv_config = self._download_xml(
            'http://vlog.xuite.net/flash/player?media=%s' % base64_media_id,
            'http://vlog.xuite.net/flash/player?media=%s' % encoded_media_id,
            'flv config')
        prop_dict = {}
        for prop in flv_config.findall('./property'):
@@ -108,9 +121,14 @@ class XuiteIE(InfoExtractor):
                '%s returned error: %s' % (self.IE_NAME, error_msg),
                expected=True)

        video_id = self._html_search_regex(
            r'data-mediaid="(\d+)"', webpage, 'media id')
        flv_config = self._extract_flv_config(video_id)
        encoded_media_id = self._search_regex(
            r'attributes\.name\s*=\s*"([^"]+)"', webpage,
            'encoded media id', default=None)
        if encoded_media_id is None:
            video_id = self._html_search_regex(
                r'data-mediaid="(\d+)"', webpage, 'media id')
            encoded_media_id = self.base64_encode_utf8(video_id)
        flv_config = self._extract_flv_config(encoded_media_id)

        FORMATS = {
            'audio': 'mp3',
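The Xuite change makes the extractor prefer the already-encoded media id exposed by the page and only falls back to base64-encoding the numeric data-mediaid itself. A small round-trip sketch using the base64_encode_utf8 helper from the diff (the decoded value is an internal file name; only the round-trip is asserted here):

import base64

def base64_encode_utf8(data):
    return base64.b64encode(data.encode('utf-8')).decode('utf-8')

encoded = 'cE1xbENoLTI3NDQ3MzM2LmZsdg=='          # id taken from the new test case above
plain = base64.b64decode(encoded).decode('utf-8')  # internal '<key>-<number>.flv' style name
print(plain)
print(base64_encode_utf8(plain) == encoded)        # True: the helper reproduces the page's id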
@@ -19,6 +19,7 @@ from ..utils import (
    mimetype2ext,
)

from .brightcove import BrightcoveNewIE
from .nbc import NBCSportsVPlayerIE


@@ -227,7 +228,12 @@ class YahooIE(InfoExtractor):
        # Look for NBCSports iframes
        nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
        if nbc_sports_url:
            return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
            return self.url_result(nbc_sports_url, NBCSportsVPlayerIE.ie_key())

        # Look for Brightcove New Studio embeds
        bc_url = BrightcoveNewIE._extract_url(webpage)
        if bc_url:
            return self.url_result(bc_url, BrightcoveNewIE.ie_key())

        # Query result is often embedded in webpage as JSON. Sometimes explicit requests
        # to video API results in a failure with geo restriction reason therefore using
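The Yahoo hunk swaps the hard-coded extractor name for NBCSportsVPlayerIE.ie_key() and adds detection of Brightcove New Studio embeds. A tiny illustration of what ie_key() resolves to, assuming a youtube-dl checkout is importable:

from youtube_dl.extractor.nbc import NBCSportsVPlayerIE

# ie_key() is derived from the class name, so url_result() no longer carries a
# literal 'NBCSportsVPlayer' string that could drift from the class.
print(NBCSportsVPlayerIE.ie_key())  # expected: NBCSportsVPlayer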
@@ -16,7 +16,6 @@ from ..compat import (
from ..utils import (
    ExtractorError,
    get_element_by_attribute,
    sanitized_Request,
)


@@ -218,14 +217,10 @@ class YoukuIE(InfoExtractor):
        headers = {
            'Referer': req_url,
        }
        headers.update(self.geo_verification_headers())
        self._set_cookie('youku.com', 'xreferrer', 'http://www.youku.com')
        req = sanitized_Request(req_url, headers=headers)

        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
        if cn_verification_proxy:
            req.add_header('Ytdl-request-proxy', cn_verification_proxy)

        raw_data = self._download_json(req, video_id, note=note)
        raw_data = self._download_json(req_url, video_id, note=note, headers=headers)

        return raw_data['data']
@@ -1730,6 +1730,39 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
        }


class YoutubeSharedVideoIE(InfoExtractor):
    _VALID_URL = r'(?:https?:)?//(?:www\.)?youtube\.com/shared\?ci=(?P<id>[0-9A-Za-z_-]{11})'
    IE_NAME = 'youtube:shared'

    _TEST = {
        'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
        'info_dict': {
            'id': 'uPDB5I9wfp8',
            'ext': 'webm',
            'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
            'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
            'upload_date': '20160219',
            'uploader': 'Pocoyo - Português (BR)',
            'uploader_id': 'PocoyoBrazil',
        },
        'add_ie': ['Youtube'],
        'params': {
            # There are already too many Youtube downloads
            'skip_download': True,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(url, video_id)

        real_video_id = self._html_search_meta(
            'videoId', webpage, 'YouTube video id', fatal=True)

        return self.url_result(real_video_id, YoutubeIE.ie_key())


class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
    IE_DESC = 'YouTube.com playlists'
    _VALID_URL = r"""(?x)(?:
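The new youtube:shared extractor resolves a /shared?ci=... link by reading the videoId meta tag from the page and delegating to the regular YouTube extractor via url_result(). A rough standalone sketch of that two-step resolution; the HTML snippet is invented and the regex is a simplified stand-in for _html_search_meta:

import re

def resolve_shared(webpage):
    # Stand-in for _html_search_meta('videoId', webpage): grab the meta content.
    m = re.search(r'itemprop="videoId"[^>]+content="([^"]+)"', webpage)
    if not m:
        raise ValueError('YouTube video id not found')
    return 'https://www.youtube.com/watch?v=%s' % m.group(1)

sample_page = '<meta itemprop="videoId" content="uPDB5I9wfp8">'
print(resolve_shared(sample_page))  # https://www.youtube.com/watch?v=uPDB5I9wfp8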
@@ -1945,10 +1978,13 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
|
||||
return (False if YoutubePlaylistsIE.suitable(url) or YoutubeLiveIE.suitable(url)
|
||||
else super(YoutubeChannelIE, cls).suitable(url))
|
||||
|
||||
def _build_template_url(self, url, channel_id):
|
||||
return self._TEMPLATE_URL % channel_id
|
||||
|
||||
def _real_extract(self, url):
|
||||
channel_id = self._match_id(url)
|
||||
|
||||
url = self._TEMPLATE_URL % channel_id
|
||||
url = self._build_template_url(url, channel_id)
|
||||
|
||||
# Channel by page listing is restricted to 35 pages of 30 items, i.e. 1050 videos total (see #5778)
|
||||
# Workaround by extracting as a playlist if managed to obtain channel playlist URL
|
||||
@@ -1962,9 +1998,13 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
|
||||
channel_playlist_id = self._html_search_meta(
|
||||
'channelId', channel_page, 'channel id', default=None)
|
||||
if not channel_playlist_id:
|
||||
channel_playlist_id = self._search_regex(
|
||||
r'data-(?:channel-external-|yt)id="([^"]+)"',
|
||||
channel_page, 'channel id', default=None)
|
||||
channel_url = self._html_search_meta(
|
||||
('al:ios:url', 'twitter:app:url:iphone', 'twitter:app:url:ipad'),
|
||||
channel_page, 'channel url', default=None)
|
||||
if channel_url:
|
||||
channel_playlist_id = self._search_regex(
|
||||
r'vnd\.youtube://user/([0-9A-Za-z_-]+)',
|
||||
channel_url, 'channel id', default=None)
|
||||
if channel_playlist_id and channel_playlist_id.startswith('UC'):
|
||||
playlist_id = 'UU' + channel_playlist_id[2:]
|
||||
return self.url_result(
|
||||
@@ -1987,20 +2027,39 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
|
||||
for video_id, video_title in self.extract_videos_from_page(channel_page)]
|
||||
return self.playlist_result(entries, channel_id)
|
||||
|
||||
try:
|
||||
next(self._entries(channel_page, channel_id))
|
||||
except StopIteration:
|
||||
alert_message = self._html_search_regex(
|
||||
r'(?s)<div[^>]+class=(["\']).*?\byt-alert-message\b.*?\1[^>]*>(?P<alert>[^<]+)</div>',
|
||||
channel_page, 'alert', default=None, group='alert')
|
||||
if alert_message:
|
||||
raise ExtractorError('Youtube said: %s' % alert_message, expected=True)
|
||||
|
||||
return self.playlist_result(self._entries(channel_page, channel_id), channel_id)
|
||||
|
||||
|
||||
class YoutubeUserIE(YoutubeChannelIE):
|
||||
IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
|
||||
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:user/|c/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
|
||||
_TEMPLATE_URL = 'https://www.youtube.com/user/%s/videos'
|
||||
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
|
||||
_TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
|
||||
IE_NAME = 'youtube:user'
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'https://www.youtube.com/user/TheLinuxFoundation',
|
||||
'playlist_mincount': 320,
|
||||
'info_dict': {
|
||||
'title': 'TheLinuxFoundation',
|
||||
'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
|
||||
'title': 'Uploads from The Linux Foundation',
|
||||
}
|
||||
}, {
|
||||
# Only available via https://www.youtube.com/c/12minuteathlete/videos
|
||||
# but not https://www.youtube.com/user/12minuteathlete/videos
|
||||
'url': 'https://www.youtube.com/c/12minuteathlete/videos',
|
||||
'playlist_mincount': 249,
|
||||
'info_dict': {
|
||||
'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
|
||||
'title': 'Uploads from 12 Minute Athlete',
|
||||
}
|
||||
}, {
|
||||
'url': 'ytuser:phihag',
|
||||
@@ -2008,6 +2067,13 @@ class YoutubeUserIE(YoutubeChannelIE):
|
||||
}, {
|
||||
'url': 'https://www.youtube.com/c/gametrailers',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.youtube.com/gametrailers',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# This channel is not available.
|
||||
'url': 'https://www.youtube.com/user/kananishinoSMEJ/videos',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
@@ -2020,6 +2086,10 @@ class YoutubeUserIE(YoutubeChannelIE):
|
||||
else:
|
||||
return super(YoutubeUserIE, cls).suitable(url)
|
||||
|
||||
def _build_template_url(self, url, channel_id):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
return self._TEMPLATE_URL % (mobj.group('user') or 'user', mobj.group('id'))
|
||||
|
||||
|
||||
class YoutubeLiveIE(YoutubeBaseInfoExtractor):
|
||||
IE_DESC = 'YouTube.com live streams'
|
||||
|
@@ -26,7 +26,11 @@ def parseOpts(overrideArguments=None):
        except IOError:
            return default  # silently skip if file is not present
        try:
            res = compat_shlex_split(optionf.read(), comments=True)
            # FIXME: https://github.com/rg3/youtube-dl/commit/dfe5fa49aed02cf36ba9f743b11b0903554b5e56
            contents = optionf.read()
            if sys.version_info < (3,):
                contents = contents.decode(preferredencoding())
            res = compat_shlex_split(contents, comments=True)
        finally:
            optionf.close()
        return res
@@ -209,11 +213,16 @@ def parseOpts(overrideArguments=None):
        action='store_const', const='::', dest='source_address',
        help='Make all connections via IPv6 (experimental)',
    )
    network.add_option(
        '--geo-verification-proxy',
        dest='geo_verification_proxy', default=None, metavar='URL',
        help='Use this proxy to verify the IP address for some geo-restricted sites. '
        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
    )
    network.add_option(
        '--cn-verification-proxy',
        dest='cn_verification_proxy', default=None, metavar='URL',
        help='Use this proxy to verify the IP address for some Chinese sites. '
        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
        help=optparse.SUPPRESS_HELP,
    )

    selection = optparse.OptionGroup(parser, 'Video Selection')
@@ -807,11 +816,11 @@ def parseOpts(overrideArguments=None):
        system_conf = []
        user_conf = []
    else:
        system_conf = compat_conf(_readOptions('/etc/youtube-dl.conf'))
        system_conf = _readOptions('/etc/youtube-dl.conf')
        if '--ignore-config' in system_conf:
            user_conf = []
        else:
            user_conf = compat_conf(_readUserConf())
            user_conf = _readUserConf()
    argv = system_conf + user_conf + command_line_conf

    opts, args = parser.parse_args(argv)
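The first options.py hunk reads the config file as text and, on Python 2 only, decodes it with the locale's preferred encoding before handing it to the shlex splitter, so non-ASCII arguments in config files survive. A compressed sketch of that flow, using the stdlib shlex and locale modules as stand-ins for compat_shlex_split and preferredencoding:

import locale
import shlex
import sys

def read_options(filename, default=[]):
    try:
        optionf = open(filename)
    except IOError:
        return default  # silently skip if file is not present
    try:
        contents = optionf.read()
        # On Python 2, read() returns bytes here; decode them so that
        # non-ASCII arguments in the config file survive the shlex split.
        if sys.version_info < (3,):
            contents = contents.decode(locale.getpreferredencoding())
        return shlex.split(contents, comments=True)
    finally:
        optionf.close()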
@@ -310,9 +310,17 @@ def get_element_by_id(id, html):
    return get_element_by_attribute('id', id, html)


def get_element_by_attribute(attribute, value, html):
def get_element_by_class(class_name, html):
    return get_element_by_attribute(
        'class', r'[^\'"]*\b%s\b[^\'"]*' % re.escape(class_name),
        html, escape_value=False)


def get_element_by_attribute(attribute, value, html, escape_value=True):
    """Return the content of the tag with the specified attribute in the passed HTML document"""

    value = re.escape(value) if escape_value else value

    m = re.search(r'''(?xs)
        <([a-zA-Z0-9:._-]+)
         (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'))*?
@@ -321,7 +329,7 @@ def get_element_by_attribute(attribute, value, html):
        \s*>
        (?P<content>.*?)
        </\1>
    ''' % (re.escape(attribute), re.escape(value)), html)
    ''' % (re.escape(attribute), value), html)

    if not m:
        return None
@@ -1444,6 +1452,8 @@ def shell_quote(args):
def smuggle_url(url, data):
    """ Pass additional data in a URL for internal use. """

    url, idata = unsmuggle_url(url, {})
    data.update(idata)
    sdata = compat_urllib_parse_urlencode(
        {'__youtubedl_smuggle': json.dumps(data)})
    return url + '#' + sdata
@@ -2095,6 +2105,7 @@ def mimetype2ext(mt):
        return ext

    _, _, res = mt.rpartition('/')
    res = res.lower()

    return {
        '3gpp': '3gp',
@@ -2106,6 +2117,12 @@ def mimetype2ext(mt):
        'x-flv': 'flv',
        'x-mp4-fragmented': 'mp4',
        'x-ms-wmv': 'wmv',
        'mpegurl': 'm3u8',
        'x-mpegurl': 'm3u8',
        'vnd.apple.mpegurl': 'm3u8',
        'dash+xml': 'mpd',
        'f4m': 'f4m',
        'f4m+xml': 'f4m',
    }.get(res, res)
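The new get_element_by_class() helper builds a class-attribute pattern that matches the class name anywhere inside the attribute value, so multi-class elements are found too. A quick usage sketch, assuming a youtube-dl checkout that already contains this change (the HTML snippet is made up):

from youtube_dl.utils import get_element_by_class

html = '<div class="module-title main">Vulkanausbruch in Ecuador</div>'
print(get_element_by_class('module-title', html))
# expected: Vulkanausbruch in Ecuador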
@@ -1,3 +1,3 @@
from __future__ import unicode_literals

__version__ = '2016.07.03.1'
__version__ = '2016.07.09.1'