Compare commits

...

19 Commits

Author SHA1 Message Date
Sergey M․
3e7ec5330a release 2019.04.24 2019-04-24 10:05:54 +07:00
Sergey M․
98933c14e1 [ChangeLog] Actualize
[ci skip]
2019-04-24 10:05:08 +07:00
Sergey M․
56667d622c [youtube] Fix extraction (closes #20758, closes #20759, closes #20761, closes #20762, closes #20764, closes #20766, closes #20767, closes #20769, closes #20771, closes #20768, closes #20770) 2019-04-24 09:58:00 +07:00
Remita Amine
50d660479d [toutv] fix extraction and extract series info(closes #20757) 2019-04-24 00:28:00 +01:00
Remita Amine
1fa8893734 [vrv] add support for movie listings(closes #19229) 2019-04-22 23:50:37 +01:00
Remita Amine
15be3eb5e5 [youtube] raise ExtractorError when no data available(#20737) 2019-04-22 20:52:43 +01:00
Sergey M․
e09965d550 [soundcloud] Add support for new rendition and improve extraction (closes #20699) 2019-04-23 00:39:36 +07:00
Remita Amine
3fd86cfe13 [ooyala] add support for geo verification proxy 2019-04-22 10:04:56 +01:00
Remita Amine
fdc2183650 [nrl] Add new extractor(closes #15991) 2019-04-22 10:04:00 +01:00
Remita Amine
85b6335d55 [vimeo] extract live archive source format(#19144) 2019-04-21 21:05:58 +01:00
Remita Amine
c25720ef6a [vimeo] add support live streams and improve info extraction(closes #19144) 2019-04-21 17:20:52 +01:00
Remita Amine
c9b19d7a55 [ntvcojp] Add new extractor 2019-04-21 14:51:26 +01:00
Remita Amine
47cfa00516 [nhk] extract rtmpt format 2019-04-21 13:25:04 +01:00
Remita Amine
061d1cd948 [nhk] add support for audio URLs 2019-04-21 13:17:22 +01:00
Remita Amine
5de538787d [udemy] add another course id extraction pattern(closes #20491) 2019-04-19 20:44:59 +01:00
Sergey M․
9abeefd527 [openload] Add support for oload.services (closes #20691) 2019-04-18 23:56:20 +07:00
ealgase
f3914b06a0 [openload] Add support for openloed.co (closes #20691)
While the .co could be captured directly, I anticipate that there will be more TLD's for openloed in the future.
2019-04-18 01:51:32 +07:00
Remita Amine
81d989c21e [bravotv] fix extraction(closes #19213) 2019-04-18 01:50:30 +07:00
Sergey M․
cd6c75b05f release 2019.04.17 2019-04-18 01:50:25 +07:00
18 changed files with 461 additions and 146 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.04.07*. If it's not, read [this FAQ entry](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.04.07**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.04.24*. If it's not, read [this FAQ entry](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.04.24**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/ytdl-org/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/ytdl-org/youtube-dl#faq) and [BUGS](https://github.com/ytdl-org/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.04.07
[debug] youtube-dl version 2019.04.24
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,4 +1,26 @@
version <unreleased>
version 2019.04.24
Extractors
* [youtube] Fix extraction (#20758, #20759, #20761, #20762, #20764, #20766,
#20767, #20769, #20771, #20768, #20770)
* [toutv] Fix extraction and extract series info (#20757)
+ [vrv] Add support for movie listings (#19229)
+ [youtube] Print error when no data is available (#20737)
+ [soundcloud] Add support for new rendition and improve extraction (#20699)
+ [ooyala] Add support for geo verification proxy
+ [nrl] Add support for nrl.com (#15991)
+ [vimeo] Extract live archive source format (#19144)
+ [vimeo] Add support for live streams and improve info extraction (#19144)
+ [ntvcojp] Add support for cu.ntv.co.jp
+ [nhk] Extract RTMPT format
+ [nhk] Add support for audio URLs
+ [udemy] Add another course id extraction pattern (#20491)
+ [openload] Add support for oload.services (#20691)
+ [openload] Add support for openloed.co (#20691, #20693)
* [bravotv] Fix extraction (#19213)
version 2019.04.17
Extractors
* [openload] Randomize User-Agent (closes #20688)

View File

@@ -46,6 +46,7 @@
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeOnDemand**
- **Anvato**
- **aol.com**
- **APA**
- **Aparat**
- **AppleConnect**
@@ -200,6 +201,7 @@
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **CTVNews**
- **cu.ntv.co.jp**: Nippon Television Network
- **Culturebox**
- **CultureUnplugged**
- **curiositystream**
@@ -623,6 +625,7 @@
- **NRKTVEpisodes**
- **NRKTVSeason**
- **NRKTVSeries**
- **NRLTV**
- **ntv.ru**
- **Nuvid**
- **NYTimes**
@@ -632,7 +635,6 @@
- **OdaTV**
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
- **OnDemandKorea**
- **onet.pl**
- **onet.tv**
@@ -853,6 +855,7 @@
- **StreamCZ**
- **StreetVoice**
- **StretchInternet**
- **stv:player**
- **SunPorno**
- **SVT**
- **SVTPage**
@@ -1124,6 +1127,8 @@
- **XVideos**
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
@@ -12,16 +14,16 @@ from ..utils import (
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': '9086d0b7ef0ea2aabc4781d75f4e5863',
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'md5': 'e34684cfea2a96cd2ee1ef3a60909de9',
'info_dict': {
'id': 'zHyk1_HU_mPy',
'id': 'epL0pmK1kQlT',
'ext': 'mp4',
'title': 'LCK Ep 12: Fishy Finale',
'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.',
'title': 'The Top Chef Season 16 Winner Is...',
'description': 'Find out who takes the title of Top Chef!',
'uploader': 'NBCU-BRAV',
'upload_date': '20160302',
'timestamp': 1456945320,
'upload_date': '20190314',
'timestamp': 1552591860,
}
}, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
@@ -32,30 +34,38 @@ class BravoTVIE(AdobePassIE):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>({.+?})</script>', webpage, 'drupal settings'),
display_id)
info = {}
query = {
'mbr': 'true',
}
account_pid, release_pid = [None] * 2
tve = settings.get('sharedTVE')
tve = settings.get('ls_tve')
if tve:
query['manifest'] = 'm3u'
account_pid = 'HNK2IC'
release_pid = tve['release_pid']
mobj = re.search(r'<[^>]+id="pdk-player"[^>]+data-url=["\']?(?:https?:)?//player\.theplatform\.com/p/([^/]+)/(?:[^/]+/)*select/([^?#&"\']+)', webpage)
if mobj:
account_pid, tp_path = mobj.groups()
release_pid = tp_path.strip('/').split('/')[-1]
else:
account_pid = 'HNK2IC'
tp_path = release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('adobePass', {})
adobe_pass = settings.get('tve_adobe_auth', {})
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
else:
shared_playlist = settings['shared_playlist']
shared_playlist = settings['ls_playlist']
account_pid = shared_playlist['account_pid']
metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
release_pid = metadata['release_pid']
tp_path = release_pid = metadata.get('release_pid')
if not release_pid:
release_pid = metadata['guid']
tp_path = 'media/guid/2140479951/' + release_pid
info.update({
'title': metadata['title'],
'description': metadata.get('description'),
@@ -67,7 +77,7 @@ class BravoTVIE(AdobePassIE):
'_type': 'url_transparent',
'id': release_pid,
'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
'http://link.theplatform.com/s/%s/%s' % (account_pid, tp_path),
query), {'force_smil_url': True}),
'ie_key': 'ThePlatform',
})

View File

@@ -2019,6 +2019,8 @@ class InfoExtractor(object):
if res is False:
return []
mpd_doc, urlh = res
if mpd_doc is None:
return []
mpd_base_url = base_url(urlh.geturl())
return self._parse_mpd_formats(

View File

@@ -808,6 +808,8 @@ from .nrk import (
NRKTVSeasonIE,
NRKTVSeriesIE,
)
from .nrl import NRLTVIE
from .ntvcojp import NTVCoJpCUIE
from .ntvde import NTVDeIE
from .ntvru import NTVRuIE
from .nytimes import (

View File

@@ -1,54 +1,81 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class NhkVodIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/(?:vod|ondemand)/(?P<id>[^/]+/[^/?#&]+)'
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
# Videos available only for a limited period of time. Visit
# http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
'info_dict': {
'id': 'A1bnNiNTE6nY3jLllS-BIISfcC_PpvF5',
'ext': 'flv',
'title': 'TOKYO FASHION EXPRESS - The Kimono as Global Fashion',
'description': 'md5:db338ee6ce8204f415b754782f819824',
'series': 'TOKYO FASHION EXPRESS',
'episode': 'The Kimono as Global Fashion',
},
'skip': 'Videos available only for a limited period of time',
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}]
_API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sodesdlist/v7/episode/%s/%s/all%s.json'
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(self._API_URL, video_id)
try:
episode = next(
e for e in data['data']['episodes']
if e.get('url') and video_id in e['url'])
except StopIteration:
raise ExtractorError('Unable to find episode')
embed_code = episode['vod_id']
lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
if episode_id.isdigit():
episode_id = episode_id[:4] + '-' + episode_id[4:]
is_video = m_type == 'video'
episode = self._download_json(
self._API_URL_TEMPLATE % ('v' if is_video else 'r', episode_id, lang, '/all' if is_video else ''),
episode_id, query={'apikey': 'EJfK8jdS57GqlupFgAfAAwr573q01y6k'})['data']['episodes'][0]
title = episode.get('sub_title_clean') or episode['sub_title']
description = episode.get('description_clean') or episode.get('description')
series = episode.get('title_clean') or episode.get('title')
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % embed_code,
def get_clean_field(key):
return episode.get(key + '_clean') or episode.get(key)
series = get_clean_field('title')
thumbnails = []
for s, w, h in [('', 640, 360), ('_l', 1280, 720)]:
img_path = episode.get('image' + s)
if not img_path:
continue
thumbnails.append({
'id': '%dp' % h,
'height': h,
'width': w,
'url': 'https://www3.nhk.or.jp' + img_path,
})
info = {
'id': episode_id + '-' + lang,
'title': '%s - %s' % (series, title) if series and title else title,
'description': description,
'description': get_clean_field('description'),
'thumbnails': thumbnails,
'series': series,
'episode': title,
}
if is_video:
info.update({
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:' + episode['vod_id'],
})
else:
audio = episode['audio']
audio_path = audio['audio']
info['formats'] = self._extract_m3u8_formats(
'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
episode_id, 'm4a', m3u8_id='hls', fatal=False)
for proto in ('rtmpt', 'rtmp'):
info['formats'].append({
'ext': 'flv',
'format_id': proto,
'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
'vcodec': 'none',
})
for f in info['formats']:
f['language'] = lang
return info

View File

@@ -0,0 +1,30 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class NRLTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nrl\.com/tv(/[^/]+)*/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.nrl.com/tv/news/match-highlights-titans-v-knights-862805/',
'info_dict': {
'id': 'YyNnFuaDE6kPJqlDhG4CGQ_w89mKTau4',
'ext': 'mp4',
'title': 'Match Highlights: Titans v Knights',
},
'params': {
# m3u8 download
'skip_download': True,
'format': 'bestvideo',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
q_data = self._parse_json(self._search_regex(
r"(?s)q-data='({.+?})'", webpage, 'player data'), display_id)
ooyala_id = q_data['videoId']
return self.url_result(
'ooyala:' + ooyala_id, 'Ooyala', ooyala_id, q_data.get('title'))

View File

@@ -0,0 +1,49 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
js_to_json,
smuggle_url,
)
class NTVCoJpCUIE(InfoExtractor):
IE_NAME = 'cu.ntv.co.jp'
IE_DESC = 'Nippon Television Network'
_VALID_URL = r'https?://cu\.ntv\.co\.jp/(?!program)(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://cu.ntv.co.jp/televiva-chill-gohan_181031/',
'info_dict': {
'id': '5978891207001',
'ext': 'mp4',
'title': '桜エビと炒り卵がポイント! 「中華風 エビチリおにぎり」──『美虎』五十嵐美幸',
'upload_date': '20181213',
'description': 'md5:211b52f4fd60f3e0e72b68b0c6ba52a9',
'uploader_id': '3855502814001',
'timestamp': 1544669941,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_config = self._parse_json(self._search_regex(
r'(?s)PLAYER_CONFIG\s*=\s*({.+?})',
webpage, 'player config'), display_id, js_to_json)
video_id = player_config['videoId']
account_id = player_config.get('account') or '3855502814001'
return {
'_type': 'url_transparent',
'id': video_id,
'display_id': display_id,
'title': self._search_regex(r'<h1[^>]+class="title"[^>]*>([^<]+)', webpage, 'title').strip(),
'description': self._html_search_meta(['description', 'og:description'], webpage),
'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % (account_id, video_id), {'geo_countries': ['JP']}),
'ie_key': 'BrightcoveNew',
}

View File

@@ -36,7 +36,7 @@ class OoyalaBaseIE(InfoExtractor):
'domain': domain,
'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth',
'embedToken': embed_token,
}), video_id)
}), video_id, headers=self.geo_verification_headers())
cur_auth_data = auth_data['authorization_data'][embed_code]

View File

@@ -244,7 +244,7 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor):
_DOMAINS = r'(?:openload\.(?:co|io|link|pw)|oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|pw|live|space)|oladblock\.(?:services|xyz|me))'
_DOMAINS = r'(?:openload\.(?:co|io|link|pw)|oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|pw|live|space|services)|oladblock\.(?:services|xyz|me)|openloed\.co)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
@@ -351,6 +351,9 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://oload.space/f/IY4eZSst3u8/',
'only_matching': True,
}, {
'url': 'https://oload.services/embed/bs1NWj1dCag/',
'only_matching': True,
}, {
'url': 'https://oladblock.services/f/b8NWEgkqNLI/',
'only_matching': True,
@@ -360,6 +363,9 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://oladblock.me/f/b8NWEgkqNLI/',
'only_matching': True,
}, {
'url': 'https://openloed.co/f/b8NWEgkqNLI/',
'only_matching': True,
}]
_USER_AGENT_TPL = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{major}.0.{build}.{patch} Safari/537.36'

View File

@@ -15,7 +15,12 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
KNOWN_EXTENSIONS,
merge_dicts,
mimetype2ext,
str_or_none,
try_get,
unified_timestamp,
update_url_query,
@@ -57,7 +62,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'E.T. ExTerrestrial Music',
'timestamp': 1349920598,
'upload_date': '20121011',
'duration': 143,
'duration': 143.216,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@@ -100,7 +105,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'jaimeMF',
'timestamp': 1386604920,
'upload_date': '20131209',
'duration': 9,
'duration': 9.927,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@@ -120,7 +125,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'jaimeMF',
'timestamp': 1386604920,
'upload_date': '20131209',
'duration': 9,
'duration': 9.927,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@@ -140,7 +145,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'oddsamples',
'timestamp': 1389232924,
'upload_date': '20140109',
'duration': 17,
'duration': 17.346,
'license': 'cc-by-sa',
'view_count': int,
'like_count': int,
@@ -160,7 +165,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'Ori Uplift Music',
'timestamp': 1504206263,
'upload_date': '20170831',
'duration': 7449,
'duration': 7449.096,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@@ -180,7 +185,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'garyvee',
'timestamp': 1488152409,
'upload_date': '20170226',
'duration': 207,
'duration': 207.012,
'thumbnail': r're:https?://.*\.jpg',
'license': 'all-rights-reserved',
'view_count': int,
@@ -192,9 +197,31 @@ class SoundcloudIE(InfoExtractor):
'skip_download': True,
},
},
# not avaialble via api.soundcloud.com/i1/tracks/id/streams
{
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
'info_dict': {
'id': '583011102',
'ext': 'mp3',
'title': 'Mezzo Valzer',
'description': 'md5:4138d582f81866a530317bae316e8b61',
'uploader': 'Giovanni Sarani',
'timestamp': 1551394171,
'upload_date': '20190228',
'duration': 180.157,
'thumbnail': r're:https?://.*\.jpg',
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
'expected_warnings': ['Unable to download JSON metadata'],
}
]
_CLIENT_ID = 'NmW1FlPaiL94ueEu7oziOWjYEzZzQDcK'
_CLIENT_ID = 'FweeGBOOEOYJWLJN3oEyToGLKhmSz0I7'
@staticmethod
def _extract_urls(webpage):
@@ -202,10 +229,6 @@ class SoundcloudIE(InfoExtractor):
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:w\.)?soundcloud\.com/player.+?)\1',
webpage)]
def report_resolve(self, video_id):
"""Report information extraction."""
self.to_screen('%s: Resolving id' % video_id)
@classmethod
def _resolv_url(cls, url):
return 'https://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
@@ -224,6 +247,10 @@ class SoundcloudIE(InfoExtractor):
def extract_count(key):
return int_or_none(info.get('%s_count' % key))
like_count = extract_count('favoritings')
if like_count is None:
like_count = extract_count('likes')
result = {
'id': track_id,
'uploader': username,
@@ -231,15 +258,17 @@ class SoundcloudIE(InfoExtractor):
'title': title,
'description': info.get('description'),
'thumbnail': thumbnail,
'duration': int_or_none(info.get('duration'), 1000),
'duration': float_or_none(info.get('duration'), 1000),
'webpage_url': info.get('permalink_url'),
'license': info.get('license'),
'view_count': extract_count('playback'),
'like_count': extract_count('favoritings'),
'like_count': like_count,
'comment_count': extract_count('comment'),
'repost_count': extract_count('reposts'),
'genre': info.get('genre'),
}
format_urls = set()
formats = []
query = {'client_id': self._CLIENT_ID}
if secret_token is not None:
@@ -248,6 +277,7 @@ class SoundcloudIE(InfoExtractor):
# We can build a direct link to the song
format_url = update_url_query(
'https://api.soundcloud.com/tracks/%s/download' % track_id, query)
format_urls.add(format_url)
formats.append({
'format_id': 'download',
'ext': info.get('original_format', 'mp3'),
@@ -256,44 +286,91 @@ class SoundcloudIE(InfoExtractor):
'preference': 10,
})
# We have to retrieve the url
# Old API, does not work for some tracks (e.g.
# https://soundcloud.com/giovannisarani/mezzo-valzer)
format_dict = self._download_json(
'https://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
track_id, 'Downloading track url', query=query)
track_id, 'Downloading track url', query=query, fatal=False)
for key, stream_url in format_dict.items():
ext, abr = 'mp3', None
mobj = re.search(r'_([^_]+)_(\d+)_url', key)
if mobj:
ext, abr = mobj.groups()
abr = int(abr)
if key.startswith('http'):
stream_formats = [{
'format_id': key,
'ext': ext,
'url': stream_url,
}]
elif key.startswith('rtmp'):
# The url doesn't have an rtmp app, we have to extract the playpath
url, path = stream_url.split('mp3:', 1)
stream_formats = [{
'format_id': key,
'url': url,
'play_path': 'mp3:' + path,
'ext': 'flv',
}]
elif key.startswith('hls'):
stream_formats = self._extract_m3u8_formats(
stream_url, track_id, ext, entry_protocol='m3u8_native',
m3u8_id=key, fatal=False)
else:
if format_dict:
for key, stream_url in format_dict.items():
if stream_url in format_urls:
continue
format_urls.add(stream_url)
ext, abr = 'mp3', None
mobj = re.search(r'_([^_]+)_(\d+)_url', key)
if mobj:
ext, abr = mobj.groups()
abr = int(abr)
if key.startswith('http'):
stream_formats = [{
'format_id': key,
'ext': ext,
'url': stream_url,
}]
elif key.startswith('rtmp'):
# The url doesn't have an rtmp app, we have to extract the playpath
url, path = stream_url.split('mp3:', 1)
stream_formats = [{
'format_id': key,
'url': url,
'play_path': 'mp3:' + path,
'ext': 'flv',
}]
elif key.startswith('hls'):
stream_formats = self._extract_m3u8_formats(
stream_url, track_id, ext, entry_protocol='m3u8_native',
m3u8_id=key, fatal=False)
else:
continue
if abr:
for f in stream_formats:
f['abr'] = abr
formats.extend(stream_formats)
# New API
transcodings = try_get(
info, lambda x: x['media']['transcodings'], list) or []
for t in transcodings:
if not isinstance(t, dict):
continue
if abr:
for f in stream_formats:
f['abr'] = abr
formats.extend(stream_formats)
format_url = url_or_none(t.get('url'))
if not format_url:
continue
stream = self._download_json(
update_url_query(format_url, query), track_id, fatal=False)
if not isinstance(stream, dict):
continue
stream_url = url_or_none(stream.get('url'))
if not stream_url:
continue
if stream_url in format_urls:
continue
format_urls.add(stream_url)
protocol = try_get(t, lambda x: x['format']['protocol'], compat_str)
if protocol != 'hls' and '/hls' in format_url:
protocol = 'hls'
ext = None
preset = str_or_none(t.get('preset'))
if preset:
ext = preset.split('_')[0]
if ext not in KNOWN_EXTENSIONS:
mimetype = try_get(
t, lambda x: x['format']['mime_type'], compat_str)
ext = mimetype2ext(mimetype) or 'mp3'
format_id_list = []
if protocol:
format_id_list.append(protocol)
format_id_list.append(ext)
format_id = '_'.join(format_id_list)
formats.append({
'url': stream_url,
'format_id': format_id,
'ext': ext,
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
})
if not formats:
# We fallback to the stream_url in the original info, this
@@ -303,11 +380,11 @@ class SoundcloudIE(InfoExtractor):
'url': update_url_query(info['stream_url'], query),
'ext': 'mp3',
})
self._check_formats(formats, track_id)
for f in formats:
f['vcodec'] = 'none'
self._check_formats(formats, track_id)
self._sort_formats(formats)
result['formats'] = formats
@@ -319,6 +396,7 @@ class SoundcloudIE(InfoExtractor):
raise ExtractorError('Invalid URL: %s' % url)
track_id = mobj.group('track_id')
new_info = {}
if track_id is not None:
info_json_url = 'https://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
@@ -344,13 +422,31 @@ class SoundcloudIE(InfoExtractor):
if token:
resolve_title += '/%s' % token
self.report_resolve(full_title)
webpage = self._download_webpage(url, full_title, fatal=False)
if webpage:
entries = self._parse_json(
self._search_regex(
r'var\s+c\s*=\s*(\[.+?\])\s*,\s*o\s*=Date\b', webpage,
'data', default='[]'), full_title, fatal=False)
if entries:
for e in entries:
if not isinstance(e, dict):
continue
if e.get('id') != 67:
continue
data = try_get(e, lambda x: x['data'][0], dict)
if data:
new_info = data
break
info_json_url = self._resolv_url(
'https://soundcloud.com/%s' % resolve_title)
url = 'https://soundcloud.com/%s' % resolve_title
info_json_url = self._resolv_url(url)
info = self._download_json(info_json_url, full_title, 'Downloading info JSON')
# Contains some additional info missing from new_info
info = self._download_json(
info_json_url, full_title, 'Downloading info JSON')
return self._extract_info_dict(info, full_title, secret_token=token)
return self._extract_info_dict(
merge_dicts(info, new_info), full_title, secret_token=token)
class SoundcloudPlaylistBaseIE(SoundcloudIE):
@@ -396,8 +492,6 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
full_title += '/' + token
url += '/' + token
self.report_resolve(full_title)
resolv_url = self._resolv_url(url)
info = self._download_json(resolv_url, full_title)

View File

@@ -66,7 +66,12 @@ class TouTvIE(RadioCanadaIE):
def _real_extract(self, url):
path = self._match_id(url)
metadata = self._download_json('http://ici.tou.tv/presentation/%s' % path, path)
metadata = self._download_json(
'https://services.radio-canada.ca/toutv/presentation/%s' % path, path, query={
'client_key': self._CLIENT_KEY,
'device': 'web',
'version': 4,
})
# IsDrm does not necessarily mean the video is DRM protected (see
# https://github.com/ytdl-org/youtube-dl/issues/13994).
if metadata.get('IsDrm'):
@@ -77,6 +82,12 @@ class TouTvIE(RadioCanadaIE):
return merge_dicts({
'id': video_id,
'title': details.get('OriginalTitle'),
'description': details.get('Description'),
'thumbnail': details.get('ImageUrl'),
'duration': int_or_none(details.get('LengthInSeconds')),
'series': metadata.get('ProgramTitle'),
'season_number': int_or_none(metadata.get('SeasonNumber')),
'season': metadata.get('SeasonTitle'),
'episode_number': int_or_none(metadata.get('EpisodeNumber')),
'episode': metadata.get('EpisodeTitle'),
}, self._extract_info(metadata.get('AppCode', 'toutv'), video_id))

View File

@@ -76,7 +76,10 @@ class UdemyIE(InfoExtractor):
webpage, 'course', default='{}')),
video_id, fatal=False) or {}
course_id = course.get('id') or self._search_regex(
r'data-course-id=["\'](\d+)', webpage, 'course id')
[
r'data-course-id=["\'](\d+)',
r'&quot;courseId&quot;\s*:\s*(\d+)'
], webpage, 'course id')
return course_id, course.get('title')
def _enroll_course(self, base_url, webpage, course_id):

View File

@@ -109,23 +109,9 @@ class VimeoBaseInfoExtractor(InfoExtractor):
def _parse_config(self, config, video_id):
video_data = config['video']
# Extract title
video_title = video_data['title']
# Extract uploader, uploader_url and uploader_id
video_uploader = video_data.get('owner', {}).get('name')
video_uploader_url = video_data.get('owner', {}).get('url')
video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
# Extract video thumbnail
video_thumbnail = video_data.get('thumbnail')
if video_thumbnail is None:
video_thumbs = video_data.get('thumbs')
if video_thumbs and isinstance(video_thumbs, dict):
_, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
# Extract video duration
video_duration = int_or_none(video_data.get('duration'))
live_event = video_data.get('live_event') or {}
is_live = live_event.get('status') == 'started'
formats = []
config_files = video_data.get('files') or config['request'].get('files', {})
@@ -142,6 +128,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'tbr': int_or_none(f.get('bitrate')),
})
# TODO: fix handling of 308 status code returned for live archive manifest requests
for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
manifest_url = cdn_data.get('url')
@@ -151,7 +138,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
if files_type == 'hls':
formats.extend(self._extract_m3u8_formats(
manifest_url, video_id, 'mp4',
'm3u8_native', m3u8_id=format_id,
'm3u8' if is_live else 'm3u8_native', m3u8_id=format_id,
note='Downloading %s m3u8 information' % cdn_name,
fatal=False))
elif files_type == 'dash':
@@ -164,6 +151,10 @@ class VimeoBaseInfoExtractor(InfoExtractor):
else:
mpd_manifest_urls = [(format_id, manifest_url)]
for f_id, m_url in mpd_manifest_urls:
if 'json=1' in m_url:
real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url')
if real_m_url:
m_url = real_m_url
mpd_formats = self._extract_mpd_formats(
m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
'Downloading %s MPD information' % cdn_name,
@@ -175,6 +166,15 @@ class VimeoBaseInfoExtractor(InfoExtractor):
f['preference'] = -40
formats.extend(mpd_formats)
live_archive = live_event.get('archive') or {}
live_archive_source_url = live_archive.get('source_url')
if live_archive_source_url and live_archive.get('status') == 'done':
formats.append({
'format_id': 'live-archive-source',
'url': live_archive_source_url,
'preference': 1,
})
subtitles = {}
text_tracks = config['request'].get('text_tracks')
if text_tracks:
@@ -184,15 +184,33 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'url': 'https://vimeo.com' + tt['url'],
}]
thumbnails = []
if not is_live:
for key, thumb in video_data.get('thumbs', {}).items():
thumbnails.append({
'id': key,
'width': int_or_none(key),
'url': thumb,
})
thumbnail = video_data.get('thumbnail')
if thumbnail:
thumbnails.append({
'url': thumbnail,
})
owner = video_data.get('owner') or {}
video_uploader_url = owner.get('url')
return {
'title': video_title,
'uploader': video_uploader,
'uploader_id': video_uploader_id,
'title': self._live_title(video_title) if is_live else video_title,
'uploader': owner.get('name'),
'uploader_id': video_uploader_url.split('/')[-1] if video_uploader_url else None,
'uploader_url': video_uploader_url,
'thumbnail': video_thumbnail,
'duration': video_duration,
'thumbnails': thumbnails,
'duration': int_or_none(video_data.get('duration')),
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
}
def _extract_original_format(self, url, video_id):

View File

@@ -102,6 +102,15 @@ class VRVIE(VRVBaseIE):
# m3u8 download
'skip_download': True,
},
}, {
# movie listing
'url': 'https://vrv.co/watch/G6NQXZ1J6/Lily-CAT',
'info_dict': {
'id': 'G6NQXZ1J6',
'title': 'Lily C.A.T',
'description': 'md5:988b031e7809a6aeb60968be4af7db07',
},
'playlist_count': 2,
}]
_NETRC_MACHINE = 'vrv'
@@ -123,23 +132,23 @@ class VRVIE(VRVBaseIE):
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'):
return []
assert audio_lang or hardsub_lang
stream_id_list = []
if audio_lang:
stream_id_list.append('audio-%s' % audio_lang)
if hardsub_lang:
stream_id_list.append('hardsub-%s' % hardsub_lang)
stream_id = '-'.join(stream_id_list)
format_id = '%s-%s' % (stream_format, stream_id)
format_id = stream_format
if stream_id_list:
format_id += '-' + '-'.join(stream_id_list)
if stream_format == 'hls':
adaptive_formats = self._extract_m3u8_formats(
url, video_id, 'mp4', m3u8_id=format_id,
note='Downloading %s m3u8 information' % stream_id,
note='Downloading %s information' % format_id,
fatal=False)
elif stream_format == 'dash':
adaptive_formats = self._extract_mpd_formats(
url, video_id, mpd_id=format_id,
note='Downloading %s MPD information' % stream_id,
note='Downloading %s information' % format_id,
fatal=False)
if audio_lang:
for f in adaptive_formats:
@@ -155,6 +164,23 @@ class VRVIE(VRVBaseIE):
resource_path = object_data['__links__']['resource']['href']
video_data = self._call_cms(resource_path, video_id, 'video')
title = video_data['title']
description = video_data.get('description')
if video_data.get('__class__') == 'movie_listing':
items = self._call_cms(
video_data['__links__']['movie_listing/movies']['href'],
video_id, 'movie listing').get('items') or []
if len(items) != 1:
entries = []
for item in items:
item_id = item.get('id')
if not item_id:
continue
entries.append(self.url_result(
'https://vrv.co/watch/' + item_id,
self.ie_key(), item_id, item.get('title')))
return self.playlist_result(entries, video_id, title, description)
video_data = items[0]
streams_path = video_data['__links__'].get('streams', {}).get('href')
if not streams_path:
@@ -198,7 +224,7 @@ class VRVIE(VRVBaseIE):
'formats': formats,
'subtitles': subtitles,
'thumbnails': thumbnails,
'description': video_data.get('description'),
'description': description,
'duration': float_or_none(video_data.get('duration_ms'), 1000),
'uploader_id': video_data.get('channel_id'),
'series': video_data.get('series_title'),

View File

@@ -1652,7 +1652,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
view_count = extract_view_count(get_video_info)
if not video_info:
video_info = get_video_info
if 'token' in get_video_info:
get_token = get_video_info.get('token') or get_video_info.get('account_playback_token')
if get_token:
# Different get_video_info requests may report different results, e.g.
# some may report video unavailability, but some may serve it without
# any complaint (see https://github.com/ytdl-org/youtube-dl/issues/7362,
@@ -1662,7 +1663,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# due to YouTube measures against IP ranges of hosting providers.
# Working around by preferring the first succeeded video_info containing
# the token if no such video_info yet was found.
if 'token' not in video_info:
token = video_info.get('token') or video_info.get('account_playback_token')
if not token:
video_info = get_video_info
break
@@ -1671,7 +1673,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
video_webpage, 'unavailable message', default=None)
if 'token' not in video_info:
if not video_info:
unavailable_message = extract_unavailable_message()
if not unavailable_message:
unavailable_message = 'Unable to extract video data'
raise ExtractorError(
'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
token = video_info.get('token') or video_info.get('account_playback_token')
if not token:
if 'reason' in video_info:
if 'The uploader has not made this video available in your country.' in video_info['reason']:
regions_allowed = self._html_search_meta(

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2019.04.07'
__version__ = '2019.04.24'