[azmedien] Fix extraction (#28064)

release 2021.02.04
[ChangeLog] Actualize
2025-06-27 15:22:40 +09:00 · 2021-02-03 23:57:56 +00:00 · 2021-02-04 04:49:25 +07:00 · 2021-02-04 04:47:30 +07:00 · 2021-02-04 04:42:14 +07:00 · 2021-02-04 04:20:09 +07:00
11 changed files with 148 additions and 54 deletions
--- a/.github/ISSUE_TEMPLATE/1_broken_site.md
+++ b/.github/ISSUE_TEMPLATE/1_broken_site.md
@@ -18,7 +18,7 @@ title: ''

 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->

 - [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2021.01.24.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2021.01.24.1
+ [debug] youtube-dl version 2021.02.04
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/.github/ISSUE_TEMPLATE/2_site_support_request.md
+++ b/.github/ISSUE_TEMPLATE/2_site_support_request.md
@@ -19,7 +19,7 @@ labels: 'site-support-request'

 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->

 - [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2021.01.24.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones
--- a/.github/ISSUE_TEMPLATE/3_site_feature_request.md
+++ b/.github/ISSUE_TEMPLATE/3_site_feature_request.md
@@ -18,13 +18,13 @@ title: ''

 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->

 - [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2021.01.24.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones


--- a/.github/ISSUE_TEMPLATE/4_bug_report.md
+++ b/.github/ISSUE_TEMPLATE/4_bug_report.md
@@ -18,7 +18,7 @@ title: ''

 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->

 - [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2021.01.24.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2021.01.24.1
+ [debug] youtube-dl version 2021.02.04
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/.github/ISSUE_TEMPLATE/5_feature_request.md
+++ b/.github/ISSUE_TEMPLATE/5_feature_request.md
@@ -19,13 +19,13 @@ labels: 'request'

 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.24.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->

 - [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2021.01.24.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones


--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,45 @@
+version 2021.02.04
+
+Extractors
+* [pornhub] Implement lazy playlist extraction
+* [svtplay] Fix video id extraction (#28058)
+ [pornhub] Add support for authentication (#18797, #21416, #24294)
+* [pornhub:user] Improve paging
+ [pornhub:user] Add support for URLs unavailable via /videos page (#27853)
+ [bravotv] Add support for oxygen.com (#13357, #22500)
+ [youtube] Pass embed URL to get_video_info request
+* [ccma] Improve metadata extraction (#27994)
+    + Extract age limit, alt title, categories, series and episode number
+    * Fix timestamp multiple subtitles extraction
+* [egghead] Update API domain (#28038)
+- [vidzi] Remove extractor (#12629)
+* [vidio] Improve metadata extraction
+* [youtube] Improve subtitles extraction
+* [youtube] Fix chapter extraction fallback
+* [youtube] Rewrite extractor
+    * Improve format sorting
+    * Remove unused code
+    * Fix series metadata extraction
+    * Fix trailer video extraction
+    * Improve error reporting
+    + Extract video location
+ [vvvvid] Add support for youtube embeds (#27825)
+* [googledrive] Report download page errors (#28005)
+* [vlive] Fix error message decoding for python 2 (#28004)
+* [youtube] Improve DASH formats file size extraction
+* [cda] Improve birth validation detection (#14022, #27929)
+ [awaan] Extract uploader id (#27963)
+ [medialaan] Add support DPG Media MyChannels based websites (#14871, #15597,
+  #16106, #16489)
+* [abcnews] Fix extraction (#12394, #27920)
+* [AMP] Fix upload date and timestamp extraction (#27970)
+* [tv4] Relax URL regular expression (#27964)
+ [tv2] Add support for mtvuutiset.fi (#27744)
+* [adn] Improve login warning reporting
+* [zype] Fix uplynk id extraction (#27956)
+ [adn] Add support for authentication (#17091, #27841, #27937)
+
+
 version 2021.01.24.1

 Core
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -537,6 +537,7 @@
 - **mtv:video**
 - **mtvjapan**
 - **mtvservices:embedded**
+ - **MTVUutisetArticle**
 - **MuenchenTV**: münchen.tv
 - **mva**: Microsoft Virtual Academy videos
 - **mva:course**: Microsoft Virtual Academy courses
@@ -1058,7 +1059,6 @@
 - **vidme**
 - **vidme:user**
 - **vidme:user:likes**
- - **Vidzi**
 - **vier**: vier.be and vijf.be
 - **vier:videos**
 - **viewlift**
@@ -1103,6 +1103,7 @@
 - **vrv**
 - **vrv:series**
 - **VShare**
+ - **VTM**
 - **VTXTV**
 - **vube**: Vube.com
 - **VuClip**
--- a/youtube_dl/extractor/azmedien.py
+++ b/youtube_dl/extractor/azmedien.py
@@ -47,7 +47,7 @@ class AZMedienIE(InfoExtractor):
        'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
        'only_matching': True
    }]
-    _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
+    _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
    _PARTNER_ID = '1719221'

    def _real_extract(self, url):
--- a/youtube_dl/extractor/pornhub.py
+++ b/youtube_dl/extractor/pornhub.py
@@ -23,11 +23,14 @@ from ..utils import (
    remove_quotes,
    str_to_int,
    update_url_query,
+    urlencode_postdata,
    url_or_none,
 )


 class PornHubBaseIE(InfoExtractor):
+    _NETRC_MACHINE = 'pornhub'
+
    def _download_webpage_handle(self, *args, **kwargs):
        def dl(*args, **kwargs):
            return super(PornHubBaseIE, self)._download_webpage_handle(*args, **kwargs)
@@ -53,6 +56,66 @@ class PornHubBaseIE(InfoExtractor):

        return webpage, urlh

+    def _real_initialize(self):
+        self._logged_in = False
+
+    def _login(self, host):
+        if self._logged_in:
+            return
+
+        site = host.split('.')[0]
+
+        # Both sites pornhub and pornhubpremium have separate accounts
+        # so there should be an option to provide credentials for both.
+        # At the same time some videos are available under the same video id
+        # on both sites so that we have to identify them as the same video.
+        # For that purpose we have to keep both in the same extractor
+        # but under different netrc machines.
+        username, password = self._get_login_info(netrc_machine=site)
+        if username is None:
+            return
+
+        login_url = 'https://www.%s/%slogin' % (host, 'premium/' if 'premium' in host else '')
+        login_page = self._download_webpage(
+            login_url, None, 'Downloading %s login page' % site)
+
+        def is_logged(webpage):
+            return any(re.search(p, webpage) for p in (
+                r'class=["\']signOut',
+                r'>Sign\s+[Oo]ut\s*<'))
+
+        if is_logged(login_page):
+            self._logged_in = True
+            return
+
+        login_form = self._hidden_inputs(login_page)
+
+        login_form.update({
+            'username': username,
+            'password': password,
+        })
+
+        response = self._download_json(
+            'https://www.%s/front/authenticate' % host, None,
+            'Logging in to %s' % site,
+            data=urlencode_postdata(login_form),
+            headers={
+                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+                'Referer': login_url,
+                'X-Requested-With': 'XMLHttpRequest',
+            })
+
+        if response.get('success') == '1':
+            self._logged_in = True
+            return
+
+        message = response.get('message')
+        if message is not None:
+            raise ExtractorError(
+                'Unable to login: %s' % message, expected=True)
+
+        raise ExtractorError('Unable to log in')
+

 class PornHubIE(PornHubBaseIE):
    IE_DESC = 'PornHub and Thumbzilla'
@@ -164,12 +227,20 @@ class PornHubIE(PornHubBaseIE):
    }, {
        'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
        'only_matching': True,
+    }, {
+        # Some videos are available with the same id on both premium
+        # and non-premium sites (e.g. this and the following test)
+        'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5f75b0f4b18e3',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5f75b0f4b18e3',
+        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        return re.findall(
-            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.(?:com|net|org)/embed/[\da-z]+)',
+            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub(?:premium)?\.(?:com|net|org)/embed/[\da-z]+)',
            webpage)

    def _extract_count(self, pattern, webpage, name):
@@ -181,12 +252,7 @@ class PornHubIE(PornHubBaseIE):
        host = mobj.group('host') or 'pornhub.com'
        video_id = mobj.group('id')

-        if 'premium' in host:
-            if not self._downloader.params.get('cookiefile'):
-                raise ExtractorError(
-                    'PornHub Premium requires authentication.'
-                    ' You may want to use --cookies.',
-                    expected=True)
+        self._login(host)

        self._set_cookie(host, 'age_verified', '1')

@@ -427,26 +493,6 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
                container))
        ]

-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        host = mobj.group('host')
-        playlist_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, playlist_id)
-
-        entries = self._extract_entries(webpage, host)
-
-        playlist = self._parse_json(
-            self._search_regex(
-                r'(?:playlistObject|PLAYLIST_VIEW)\s*=\s*({.+?});', webpage,
-                'playlist', default='{}'),
-            playlist_id, fatal=False)
-        title = playlist.get('title') or self._search_regex(
-            r'>Videos\s+in\s+(.+?)\s+[Pp]laylist<', webpage, 'title', fatal=False)
-
-        return self.playlist_result(
-            entries, playlist_id, title, playlist.get('description'))
-

 class PornHubUserIE(PornHubPlaylistBaseIE):
    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
@@ -501,24 +547,19 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
                <button[^>]+\bid=["\']moreDataBtn
            ''', webpage) is not None

-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        host = mobj.group('host')
-        item_id = mobj.group('id')
-
+    def _entries(self, url, host, item_id):
        page = self._extract_page(url)

        VIDEOS = '/videos'

-        def download_page(base_url, num):
-            note = 'Downloading %spage %d' % ('' if VIDEOS in base_url else 'fallback ', num)
+        def download_page(base_url, num, fallback=False):
+            note = 'Downloading page %d%s' % (num, ' (switch to fallback)' if fallback else '')
            return self._download_webpage(
                base_url, item_id, note, query={'page': num})

        def is_404(e):
            return isinstance(e.cause, compat_HTTPError) and e.cause.code == 404

-        entries = []
        base_url = url
        has_page = page is not None
        first_page = page if has_page else 1
@@ -532,7 +573,7 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
                    # 1. https://github.com/ytdl-org/youtube-dl/issues/27853
                    if is_404(e) and page_num == first_page and VIDEOS in base_url:
                        base_url = base_url.replace(VIDEOS, '')
-                        webpage = download_page(base_url, page_num)
+                        webpage = download_page(base_url, page_num, fallback=True)
                    else:
                        raise
            except ExtractorError as e:
@@ -542,11 +583,19 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
            page_entries = self._extract_entries(webpage, host)
            if not page_entries:
                break
-            entries.extend(page_entries)
+            for e in page_entries:
+                yield e
            if not self._has_more(webpage):
                break

-        return self.playlist_result(orderedSet(entries), item_id)
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        host = mobj.group('host')
+        item_id = mobj.group('id')
+
+        self._login(host)
+
+        return self.playlist_result(self._entries(url, host, item_id), item_id)


 class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
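
Review note on the lazy playlist change above: moving the paging loop into a generator (`_entries`) means a page is only downloaded once a consumer asks for an entry from it. The following is a minimal standalone sketch of that idea, not the extractor code itself; `paged_entries`, `fake_fetch`, and the in-memory `PAGES` data are hypothetical names invented for illustration.

```python
import itertools


def paged_entries(fetch_page, first_page=1):
    """Yield entries page by page, fetching each page only when needed.

    fetch_page is a hypothetical callable: page number -> list of entries.
    """
    for page_num in itertools.count(first_page):
        page_entries = fetch_page(page_num)
        if not page_entries:
            break
        for entry in page_entries:
            yield entry


# Hypothetical in-memory "site": three pages with two entries each.
PAGES = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e', 'f']}
fetched = []  # records which pages were actually requested


def fake_fetch(page_num):
    fetched.append(page_num)
    return PAGES.get(page_num, [])


# Taking only the first three entries never touches page 3.
first_three = list(itertools.islice(paged_entries(fake_fetch), 3))
```

With an eager list (the removed `entries.extend(page_entries)` approach), every page would be fetched before any entry is returned. Note that the eager version also deduplicated with `orderedSet(entries)`, which the generator form in this diff no longer does.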
--- a/youtube_dl/extractor/svt.py
+++ b/youtube_dl/extractor/svt.py
@@ -255,8 +255,10 @@ class SVTPlayIE(SVTPlayBaseIE):
            svt_id = self._search_regex(
                (r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
                 r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
+                 r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)',
                 r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',
-                 r'["\']svtId["\']\s*:\s*["\']([\da-zA-Z-]+)'),
+                 r'["\']svtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
+                 r'["\']svtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)'),
                webpage, 'video id')

        info_dict = self._extract_by_video_id(svt_id, webpage)
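
Review note on the two added patterns above: the `\\?` before each quote makes a literal backslash optional, so the patterns also match JSON that is embedded in the page with escaped quotes (`\"videoSvtId\":\"...\"`). The sketch below compares the existing and the added `videoSvtId` patterns from this hunk; the sample strings and the id `jNwpV9P` are made up for illustration and are not taken from a real page.

```python
import re

# Patterns copied from the hunk above.
OLD = r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)'
NEW = r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)'

# Hypothetical page fragments: plain JSON and JSON with escaped quotes.
plain = '{"videoSvtId":"jNwpV9P"}'
escaped = r'{\"videoSvtId\":\"jNwpV9P\"}'


def find_id(pattern, text):
    """Return the captured id, or None when the pattern does not match."""
    mobj = re.search(pattern, text)
    return mobj.group(1) if mobj else None


old_plain = find_id(OLD, plain)      # matches unescaped JSON
old_escaped = find_id(OLD, escaped)  # fails: a backslash precedes the quote
new_plain = find_id(NEW, plain)      # still matches, backslash is optional
new_escaped = find_id(NEW, escaped)  # matches the escaped form as well
```

Since the new pattern also covers the unescaped case, the old one is redundant in principle; the diff keeps both, presumably so the existing, stricter match is tried first.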
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2021.01.24.1'
+__version__ = '2021.02.04'
Author	SHA1	Message	Date
Adrian Heine né Lang	fc88e8f0e3	[azmedien] Fix extraction (#28064)	2021-02-03 23:57:56 +00:00
Sergey M․	cfefb7d854	release 2021.02.04	2021-02-04 04:49:25 +07:00
Sergey M․	3c07d007ca	[ChangeLog] Actualize [ci skip]	2021-02-04 04:47:30 +07:00
Sergey M․	89c5a7d5aa	[pornhub] Implement lazy playlist extraction	2021-02-04 04:42:14 +07:00
Sergey M․	2adc0c51cd	[pornhub] Add placeholder netrc machine	2021-02-04 04:20:09 +07:00
Sergey M․	1f0910bc27	[svtplay] Fix video id extraction (closes #28058)	2021-02-04 04:17:45 +07:00
Sergey M․	e22ff4e356	[pornhub] Add support for authentication (closes #18797, closes #21416, closes #24294)	2021-02-04 04:09:11 +07:00