Compare commits


113 Commits

Author SHA1 Message Date
Sergey M․
c3399cac19 [travis] Drop Travis CI (closes #7193, closes #12366) 2021-01-05 02:44:27 +07:00
Sergey M․
9237aaa77f [workflows/ci.yml] Add support for jython 2021-01-05 02:38:44 +07:00
Kevin O'Connor
766fcdd0fa [downloader/hls] Disable decryption in tests (#27660)
Tests truncate the download to 10241 bytes, which is not divisible by 16
and cannot be decrypted. Tests don't really care about the decrypted
content, just that the data they retrieved is the expected data.
Therefore, it's fine to just return the encrypted data to tests.

See: #27621 and #27620
2021-01-05 01:24:01 +07:00
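A minimal sketch of the constraint described above (assuming the PyCrypto-compatible Crypto.Cipher.AES module the HLS downloader imports): 10241 is not a multiple of the 16-byte AES block size, so CBC decryption of the truncated fragment must fail.

from Crypto.Cipher import AES

# 10241 % 16 == 1, so a fragment truncated to 10241 bytes can never be
# CBC-decrypted; ciphertext length must be a multiple of the block size.
cipher = AES.new(b'0' * 16, AES.MODE_CBC, b'1' * 16)
try:
    cipher.decrypt(b'x' * 10241)
except ValueError as e:
    print(e)  # the library rejects the misaligned input length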
Yurii H
f6ea29e24b [iheart] Update test description value (#27037)
the description has no HTML tags now.
2021-01-04 17:16:27 +01:00
Remita Amine
8a3797a4ab [nrk] fix extraction for videos without a legalAge rating 2021-01-04 12:16:54 +01:00
Remita Amine
745db8899d [iheart] clean HTML tags from episode description 2021-01-04 09:59:20 +01:00
Remita Amine
83db801cbf [iheart] remove print statement 2021-01-04 09:42:27 +01:00
Remita Amine
964a8eb754 [googleplus] Remove Extractor(closes #4955)(closes #7400) 2021-01-04 01:14:26 +01:00
Remita Amine
ac61f2e058 [applepodcasts] Add new extractor(#25918) 2021-01-04 01:14:26 +01:00
Remita Amine
8487e8b98a [googlepodcasts] Add new extractor 2021-01-04 01:14:26 +01:00
Remita Amine
9c484c0019 [iheart] Add new extractor for iHeartRadio(#27037) 2021-01-04 01:14:26 +01:00
Remita Amine
0e96b4b5ce [acast] clean podcast URLs 2021-01-04 01:14:26 +01:00
Remita Amine
a563c97c5c [stitcher] clean podcast URLs 2021-01-04 01:14:25 +01:00
Remita Amine
e88c9ef62a [utils] add a function to clean podcast URLs 2021-01-04 01:14:25 +01:00
Sergey M․
0889eb33e0 [xfileshare] Add support for aparat.cam (closes #27651) 2021-01-04 00:17:51 +07:00
Sergey M․
0021a2b9a1 [nrktv] Add subtitles test 2021-01-03 22:25:17 +07:00
Remita Amine
19ec468635 [twitter] Add support for summary card(closes #25121) 2021-01-03 13:37:24 +01:00
Remita Amine
491ee7efe4 [twitter] try to use a Generic fallback for unknown twitter cards(closes #25982) 2021-01-03 13:29:00 +01:00
Remita Amine
8522bcd97c [stitcher] Add support for shows and show metadata extraction(closes #20510) 2021-01-03 12:12:06 +01:00
Remita Amine
ac71fd5919 [stv] improve episode id extraction(closes #23083) 2021-01-03 10:04:57 +01:00
Sergey M․
8e953dcbb1 release 2021.01.03 2021-01-03 09:12:47 +07:00
Sergey M․
f4afb9a6a8 [ChangeLog] Actualize
[ci skip]
2021-01-03 09:10:40 +07:00
Sergey M․
d5b8cf093c [nrk] Fix age limit extraction 2021-01-03 08:58:05 +07:00
Sergey M․
5c6e84c0ff [nrk] Improve series metadata extraction (closes #27473) 2021-01-03 08:31:26 +07:00
Sergey M․
1aaee908b9 [nrk] PEP 8 2021-01-03 07:57:12 +07:00
Sergey M․
b2d9fd9c9f [nrk] Improve episode and season number extraction 2021-01-03 07:53:05 +07:00
Sergey M․
bc2f83b95e [nrktv] Fix tests 2021-01-03 07:31:03 +07:00
Sergey M․
85de33b04e [nrk] Improve series metadata extraction 2021-01-03 07:30:45 +07:00
Sergey M․
7dfd966848 [nrk] Extract subtitles 2021-01-03 07:02:13 +07:00
Sergey M․
a25d03d7cb [nrk] Fix age limit extraction 2021-01-03 06:51:21 +07:00
Sergey M․
cabfd4b1f0 [nrk] Inline _extract_from_playback 2021-01-03 06:45:17 +07:00
Sergey M․
7b643d4cd0 [nrk] Improve video id extraction 2021-01-03 06:44:28 +07:00
Sergey M․
1f1d01d498 [nrk] Add more shortcut tests 2021-01-03 06:41:37 +07:00
Sergey M․
21a42e2588 [nrk] Improve extraction (closes #27634, closes #27635)
+ Add support for mp3 formats
* Generalize and delegate all item extractors to nrk, beware ie key breakages
+ Add support for podcasts
+ Generalize nrk shortcut form to support all kind of ids
2021-01-03 06:29:52 +07:00
Sergey M․
2df93a0c4a [nrktv] Switch to playback endpoint
mediaelement endpoint is no longer in use.
2021-01-03 05:01:45 +07:00
Remita Amine
75972e200d [vvvvid] fix season metadata extraction(#18130) 2021-01-01 20:21:59 +01:00
Remita Amine
d0d838638c [stitcher] fix extraction(closes #20811)(closes #27606) 2021-01-01 18:38:46 +01:00
Remita Amine
8c17afc471 [acast] fix extraction(closes #21444)(closes #27612)(closes #27613) 2021-01-01 17:05:42 +01:00
Remita Amine
40d66e07df [arcpublishing] add missing staticmethod decorator 2020-12-31 23:42:17 +01:00
Remita Amine
ab89a8678b [arcpublishing] Add new extractor
closes #2298
closes #9340
closes #17200
2020-12-31 23:31:07 +01:00
Remita Amine
4d7d056909 [sky] add support for Sports News articles and Brightcove videos (closes #13054) 2020-12-31 12:18:30 +01:00
Remita Amine
c35bc82606 [vvvvid] skip unplayable episodes and extract akamai formats(closes #27599) 2020-12-31 12:18:30 +01:00
Remita Amine
2f56caf083 [yandexvideo] fix extraction for Python 3.4 2020-12-31 12:18:30 +01:00
Sergey M․
4066945919 release 2020.12.31 2020-12-31 05:17:55 +07:00
Sergey M․
2a84694b1e [ChangeLog] Actualize
[ci skip]
2020-12-31 05:14:33 +07:00
Sergey M․
4046ffe1e1 [redditr] Fix review issues and extract source thumbnail (closes #27503) 2020-12-31 05:07:57 +07:00
ozburo
d1d0612160 [redditr] Extract all thumbnails 2020-12-31 05:07:51 +07:00
Remita Amine
7b0f04ed1f [vvvvid] improve info extraction 2020-12-30 18:16:47 +01:00
nixxo
2e21b06ea2 [vvvvid] add playlists support (#27574)
closes #18130
2020-12-30 18:12:17 +01:00
Remita Amine
a6f75e6e89 [yandexdisk] extract info from webpage
the public API does not return metadata when download limit is reached
2020-12-30 16:45:53 +01:00
Remita Amine
bd18824c2a [yandexdisk] fix extraction(closes #17861)(closes #27131) 2020-12-30 13:43:56 +01:00
Remita Amine
bdd044e67b [yandexvideo] use old api call as fallback 2020-12-30 13:30:11 +01:00
Remita Amine
f7e95fb2a0 [yandexvideo] fix extraction(closes #25000) 2020-12-30 09:30:30 +01:00
Remita Amine
9dd674e1d2 [utils] accept only supported protocols in url_or_none 2020-12-30 09:22:30 +01:00
Remita Amine
9c1e164e0c [YoutubeDL] Allow format filtering using audio language(#16209) 2020-12-29 19:29:08 +01:00
Remita Amine
c706fbe9fe [nbc] Remove CSNNE extractor 2020-12-29 17:21:05 +01:00
Remita Amine
ebdcf70b0d [nbc] fix NBCSport VPlayer URL extraction(closes #16640) 2020-12-29 17:15:13 +01:00
Remita Amine
5966095e65 [aenetworks] fix HistoryPlayerIE tests 2020-12-29 16:59:31 +01:00
Remita Amine
9ee984fc76 [aenetworks] add support for biography.com (closes #3863) 2020-12-29 16:13:36 +01:00
Remita Amine
53528e1d23 [uktvplay] match new video URLs(closes #17909) 2020-12-29 14:11:37 +01:00
Remita Amine
c931c4b8dd [sevenplay] detect API errors 2020-12-29 14:11:37 +01:00
Remita Amine
7acd042bbb [tenplay] fix format extraction(closes #26653) 2020-12-29 14:11:37 +01:00
Remita Amine
bcfe485e01 [brightcove] raise ExtractorError for DRM protected videos(closes #23467)(closes #27568) 2020-12-29 14:11:37 +01:00
Sergey M․
479cc6d5a1 release 2020.12.29 2020-12-29 02:52:31 +07:00
Sergey M․
38286ee729 [ChangeLog] Actualize
[ci skip]
2020-12-29 02:49:53 +07:00
Sergey M․
1a95953867 [youtube] Improve yt initial data extraction (closes #27524) 2020-12-29 02:29:34 +07:00
Sergey M․
71febd1c52 [youtube:tab] Improve URL matching (closes #27559) 2020-12-29 02:19:43 +07:00
Sergey M․
f1bc56c99b [youtube:tab] Restore retry on browse requests (closes #27313, closes #27564) 2020-12-29 02:11:48 +07:00
Remita Amine
64e419bd73 [aparat] Fix extraction
closes #22285
closes #22611
closes #23348
closes #24354
closes #24591
closes #24904
closes #25418
closes #26070
closes #26350
closes #26738
closes #27563
2020-12-28 18:19:30 +01:00
Remita Amine
782ea947b4 [brightcove] remove sonyliv specific code 2020-12-28 11:12:57 +01:00
Remita Amine
f27224d57b [piksel] improve format extraction 2020-12-28 10:50:29 +01:00
Remita Amine
c007188598 [zype] Add support for uplynk videos 2020-12-27 23:47:28 +01:00
Remita Amine
af93ecfd88 [toggle] add support for live.mewatch.sg (closes #27555) 2020-12-27 22:26:20 +01:00
JamKage
794771a164 [go] Added support for FXNetworks (#26826)
Co-authored-by: James Kirrage <james.kirrage@mortgagegym.com>

closes #13972
closes #22467
closes #23754
2020-12-27 17:36:21 +00:00
Sergey M․
6f2eaaf73d [teachable] Improve embed detection (closes #26923) 2020-12-27 22:57:50 +07:00
Remita Amine
4c7a4dbc4d [mitele] fix free video extraction(#24624)(closes #25827)(closes #26757) 2020-12-27 16:22:43 +01:00
Remita Amine
f86b299d0e [telecinco] fix extraction 2020-12-27 16:22:43 +01:00
Sergey M
e474996541 [youtube] Update invidious.snopyta.org (#22667)
Co-authored-by: sofutru <54445344+sofutru@users.noreply.github.com>
2020-12-27 21:15:09 +07:00
Remita Amine
aed617e311 [amcnetworks] improve auth only video detection(closes #27548) 2020-12-27 09:00:08 +01:00
Remita Amine
0fa67c1d68 [generic] Add support for VHX Embeds(#27546) 2020-12-27 09:00:07 +01:00
Sergey M․
365b3cc72d release 2020.12.26 2020-12-26 23:17:35 +07:00
Sergey M․
a272fe21a8 [ChangeLog] Actualize
[ci skip]
2020-12-26 23:13:26 +07:00
Sergey M․
cec1c2f211 [instagram] Fix test 2020-12-26 23:00:15 +07:00
Sergey M․
12053450dc [instagram] Fix comment count extraction 2020-12-26 23:00:15 +07:00
Sergey M․
46cffb0c47 [instagram] Add support for reel URLs (closes #26234, closes #26250) 2020-12-26 23:00:15 +07:00
Remita Amine
c32a059f52 [bbc] switch to media selector v6
closes #23232
closes #23933
closes #26303
closes #26432
closes #26821
closes #27538
2020-12-26 16:57:02 +01:00
Sergey M․
6911312e53 [instagram] Improve thumbnail extraction 2020-12-26 22:42:58 +07:00
Sergey M․
f22b5a6b96 [instagram] Improve extraction (closes #22880) 2020-12-26 22:37:41 +07:00
Andrew Udvare
58e55198c1 [instagram] Fix extraction when authenticated (closes #27422) 2020-12-26 22:31:55 +07:00
Sergey M․
d61ed9f2f1 [spankbang] Remove unused import 2020-12-26 22:14:31 +07:00
Sergey M․
8bc4c6350e [spankbang:playlist] Fix extraction (closes #24087) 2020-12-26 21:58:26 +07:00
Sergey M․
cfa4ffa23b [spankbang] Add support for playlist videos 2020-12-26 21:55:12 +07:00
Sergey M․
4f1dc1463d [pornhub] Improve like and dislike count extraction (closes #27356) 2020-12-26 21:24:43 +07:00
Sergey M․
17e0f41d34 [pornhub] Fix review issues (closes #27393) 2020-12-26 21:17:17 +07:00
JChris246
b57b27ff8f [pornhub] Fix lq formats extraction (closes #27386) 2020-12-26 21:17:11 +07:00
Marco Fantauzzo
bbe8cc6662 [README.md] Update reference to cookie export extension for Chrome (closes #26885) (#27433)
The cookies.txt extension doesn't exist anymore on the Chrome Web Store (see https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg), so I propose to change the link in the README.md to another similar extension called Get cookies.txt (https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) with the same functions and utility as the old one.

This PR closes #26885
2020-12-26 20:50:39 +07:00
Sergey M․
98106accb6 [bongacams] Add extractor (closes #27440) 2020-12-26 20:30:19 +07:00
Sergey M․
af1312bfc3 [youtube:tab] Extend _VALID_URL (closes #27501) 2020-12-26 19:59:57 +07:00
Remita Amine
4c7d7215cd [theweatherchannel] fix extraction (closes #25930)(closes #26051) 2020-12-26 09:37:40 +01:00
Remita Amine
0370d9eb3d [sprout] correct typo 2020-12-25 21:35:55 +01:00
Remita Amine
1434651d20 [sprout] Add support for Universal Kids (closes #22518) 2020-12-25 21:17:44 +01:00
Remita Amine
2c312ab84a [theplatform] allow passing geo bypass countries from other extractors 2020-12-25 21:16:36 +01:00
Remita Amine
0ee78d62d5 [wistia] Add support for playlists (closes #27533) 2020-12-25 17:00:30 +01:00
Remita Amine
7f3c90ab25 [ctv] Add new extractor (closes #27525) 2020-12-24 19:40:08 +01:00
Remita Amine
1d3cd29730 [9c9media] improve info extraction 2020-12-24 17:39:13 +01:00
Remita Amine
4ef1fc9707 [youtube] fix automatic captions extraction(closes #27162)(closes #27388) 2020-12-24 16:05:03 +01:00
Remita Amine
f9e6aa1dcf [sonyliv] fix title for movies 2020-12-24 13:33:12 +01:00
Remita Amine
f83db9064b [sonyliv] fix extraction(closes #25667) 2020-12-24 13:10:20 +01:00
Remita Amine
2da9a86399 [streetvoice] fix extraction(closes #27455)(closes #27492) 2020-12-24 13:10:20 +01:00
Remita Amine
ecaa535cf4 [facebook] add support for watchparty pages(closes #27507) 2020-12-24 13:10:20 +01:00
Remita Amine
79dd92b1fe [cbslocal] fix video extraction 2020-12-24 13:10:20 +01:00
Remita Amine
bd3844c9c2 [brightcove] add another method to extract policyKey 2020-12-24 13:10:20 +01:00
Sergey M․
7bf5e3a84a [mewatch] Relax _VALID_URL (closes #27506) 2020-12-22 23:27:05 +07:00
70 changed files with 2731 additions and 1473 deletions

File: .github/ISSUE_TEMPLATE/1_broken_site.md

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.12.22. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2020.12.22**
+- [ ] I've verified that I'm running youtube-dl version **2021.01.03**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2020.12.22
+[debug] youtube-dl version 2021.01.03
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

File: .github/ISSUE_TEMPLATE/2_site_support_request.md

@@ -19,7 +19,7 @@ labels: 'site-support-request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.12.22. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2020.12.22**
+- [ ] I've verified that I'm running youtube-dl version **2021.01.03**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones

File: .github/ISSUE_TEMPLATE/3_site_feature_request.md

@@ -18,13 +18,13 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.12.22. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2020.12.22**
+- [ ] I've verified that I'm running youtube-dl version **2021.01.03**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones

File: .github/ISSUE_TEMPLATE/4_bug_report.md

@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.12.22. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2020.12.22**
+- [ ] I've verified that I'm running youtube-dl version **2021.01.03**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2020.12.22
+[debug] youtube-dl version 2021.01.03
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

File: .github/ISSUE_TEMPLATE/5_feature_request.md

@@ -19,13 +19,13 @@ labels: 'request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.12.22. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.01.03. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2020.12.22**
+- [ ] I've verified that I'm running youtube-dl version **2021.01.03**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones

File: .github/workflows/ci.yml

@@ -10,28 +10,52 @@ jobs:
         os: [ubuntu-latest]
         # TODO: python 2.6
         python-version: [2.7, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
+        python-impl: [cpython]
         ytdl-test-set: [core, download]
         run-tests-ext: [sh]
         include:
         # python 3.2 is only available on windows via setup-python
         - os: windows-latest
           python-version: 3.2
+          python-impl: cpython
           ytdl-test-set: core
           run-tests-ext: bat
         - os: windows-latest
           python-version: 3.2
+          python-impl: cpython
           ytdl-test-set: download
           run-tests-ext: bat
+        # jython
+        - os: ubuntu-latest
+          python-impl: jython
+          ytdl-test-set: core
+          run-tests-ext: sh
+        - os: ubuntu-latest
+          python-impl: jython
+          ytdl-test-set: download
+          run-tests-ext: sh
     steps:
     - uses: actions/checkout@v2
     - name: Set up Python ${{ matrix.python-version }}
       uses: actions/setup-python@v2
+      if: ${{ matrix.python-impl == 'cpython' }}
       with:
         python-version: ${{ matrix.python-version }}
+    - name: Set up Java 8
+      if: ${{ matrix.python-impl == 'jython' }}
+      uses: actions/setup-java@v1
+      with:
+        java-version: 8
+    - name: Install Jython
+      if: ${{ matrix.python-impl == 'jython' }}
+      run: |
+        wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
+        java -jar jython-installer.jar -s -d "$HOME/jython"
+        echo "$HOME/jython/bin" >> $GITHUB_PATH
     - name: Install nose
       run: pip install nose
     - name: Run tests
-      continue-on-error: ${{ matrix.ytdl-test-set == 'download' }}
+      continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
      env:
        YTDL_TEST_SET: ${{ matrix.ytdl-test-set }}
      run: ./devscripts/run_tests.${{ matrix.run-tests-ext }}

File: .travis.yml (deleted)

@@ -1,50 +0,0 @@
language: python
python:
- "2.6"
- "2.7"
- "3.2"
- "3.3"
- "3.4"
- "3.5"
- "3.6"
- "pypy"
- "pypy3"
dist: trusty
env:
- YTDL_TEST_SET=core
# - YTDL_TEST_SET=download
jobs:
include:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=core
# - python: 3.7
# dist: xenial
# env: YTDL_TEST_SET=download
- python: 3.8
dist: xenial
env: YTDL_TEST_SET=core
# - python: 3.8
# dist: xenial
# env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core
# - python: 3.8-dev
# dist: xenial
# env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
# - env: JYTHON=true; YTDL_TEST_SET=download
- name: flake8
python: 3.8
dist: xenial
install: pip install flake8
script: flake8 .
fast_finish: true
allow_failures:
# - env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
# - env: JYTHON=true; YTDL_TEST_SET=download
before_install:
- if [ "$JYTHON" == "true" ]; then ./devscripts/install_jython.sh; export PATH="$HOME/jython/bin:$PATH"; fi
script: ./devscripts/run_tests.sh

File: ChangeLog

@@ -1,3 +1,100 @@
version 2021.01.03

Extractors
* [nrk] Improve series metadata extraction (#27473)
+ [nrk] Extract subtitles
* [nrk] Fix age limit extraction
* [nrk] Improve video id extraction
+ [nrk] Add support for podcasts (#27634, #27635)
* [nrk] Generalize and delegate all item extractors to nrk
+ [nrk] Add support for mp3 formats
* [nrktv] Switch to playback endpoint
* [vvvvid] Fix season metadata extraction (#18130)
* [stitcher] Fix extraction (#20811, #27606)
* [acast] Fix extraction (#21444, #27612, #27613)
+ [arcpublishing] Add support for arcpublishing.com (#2298, #9340, #17200)
+ [sky] Add support for Sports News articles and Brightcove videos (#13054)
+ [vvvvid] Extract akamai formats
* [vvvvid] Skip unplayable episodes (#27599)
* [yandexvideo] Fix extraction for Python 3.4


version 2020.12.31

Core
* [utils] Accept only supported protocols in url_or_none
* [YoutubeDL] Allow format filtering using audio language (#16209)

Extractors
+ [redditr] Extract all thumbnails (#27503)
* [vvvvid] Improve info extraction
+ [vvvvid] Add support for playlists (#18130, #27574)
+ [yandexdisk] Extract info from webpage
* [yandexdisk] Fix extraction (#17861, #27131)
* [yandexvideo] Use old API call as fallback
* [yandexvideo] Fix extraction (#25000)
- [nbc] Remove CSNNE extractor
* [nbc] Fix NBCSport VPlayer URL extraction (#16640)
+ [aenetworks] Add support for biography.com (#3863)
* [uktvplay] Match new video URLs (#17909)
* [sevenplay] Detect API errors
* [tenplay] Fix format extraction (#26653)
* [brightcove] Raise error for DRM protected videos (#23467, #27568)


version 2020.12.29

Extractors
* [youtube] Improve yt initial data extraction (#27524)
* [youtube:tab] Improve URL matching (#27559)
* [youtube:tab] Restore retry on browse requests (#27313, #27564)
* [aparat] Fix extraction (#22285, #22611, #23348, #24354, #24591, #24904,
  #25418, #26070, #26350, #26738, #27563)
- [brightcove] Remove sonyliv specific code
* [piksel] Improve format extraction
+ [zype] Add support for uplynk videos
+ [toggle] Add support for live.mewatch.sg (#27555)
+ [go] Add support for fxnow.fxnetworks.com (#13972, #22467, #23754, #26826)
* [teachable] Improve embed detection (#26923)
* [mitele] Fix free video extraction (#24624, #25827, #26757)
* [telecinco] Fix extraction
* [youtube] Update invidious.snopyta.org (#22667)
* [amcnetworks] Improve auth only video detection (#27548)
+ [generic] Add support for VHX Embeds (#27546)


version 2020.12.26

Extractors
* [instagram] Fix comment count extraction
+ [instagram] Add support for reel URLs (#26234, #26250)
* [bbc] Switch to media selector v6 (#23232, #23933, #26303, #26432, #26821,
  #27538)
* [instagram] Improve thumbnail extraction
* [instagram] Fix extraction when authenticated (#22880, #26377, #26981,
  #27422)
* [spankbang:playlist] Fix extraction (#24087)
+ [spankbang] Add support for playlist videos
* [pornhub] Improve like and dislike count extraction (#27356)
* [pornhub] Fix lq formats extraction (#27386, #27393)
+ [bongacams] Add support for bongacams.com (#27440)
* [youtube:tab] Extend URL regular expression (#27501)
* [theweatherchannel] Fix extraction (#25930, #26051)
+ [sprout] Add support for Universal Kids (#22518)
* [theplatform] Allow passing geo bypass countries from other extractors
+ [wistia] Add support for playlists (#27533)
+ [ctv] Add support for ctv.ca (#27525)
* [9c9media] Improve info extraction
* [youtube] Fix automatic captions extraction (#27162, #27388)
* [sonyliv] Fix title for movies
* [sonyliv] Fix extraction (#25667)
* [streetvoice] Fix extraction (#27455, #27492)
+ [facebook] Add support for watchparty pages (#27507)
* [cbslocal] Fix video extraction
+ [brightcove] Add another method to extract policyKey
* [mewatch] Relax URL regular expression (#27506)


version 2020.12.22

Core

File: README.md

@@ -678,6 +678,7 @@ Also filtering work for comparisons `=` (equals), `^=` (starts with), `$=` (ends
 - `container`: Name of the container format
 - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
 - `format_id`: A short description of the format
+- `language`: Language code

 Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
@@ -880,7 +881,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op
 Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.

-In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
+In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).

 Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
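A hypothetical embedding-API use of the `language` field added above (the video URL is a placeholder; `^=` is the starts-with operator documented in the filtering section):

import youtube_dl

# Prefer audio tracks whose language code starts with 'en'; fall back to
# the best audio format when no language metadata is available.
ydl_opts = {'format': 'bestaudio[language^=en]/bestaudio'}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])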

File: devscripts/install_jython.sh (deleted)

@@ -1,5 +0,0 @@
#!/bin/bash
wget http://central.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar
java -jar jython-installer-2.7.1.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

File: docs/supportedsites.md

@@ -57,6 +57,7 @@
 - **appletrailers**
 - **appletrailers:section**
 - **archive.org**: archive.org videos
+- **ArcPublishing**
 - **ARD**
 - **ARD:mediathek**
 - **ARDBetaMediathek**
@@ -104,6 +105,7 @@
 - **BilibiliAudioAlbum**
 - **BiliBiliPlayer**
 - **BioBioChileTV**
+- **Biography**
 - **BIQLE**
 - **BitChute**
 - **BitChuteChannel**
@@ -112,6 +114,7 @@
 - **blinkx**
 - **Bloomberg**
 - **BokeCC**
+- **BongaCams**
 - **BostonGlobe**
 - **Box**
 - **Bpb**: Bundeszentrale für politische Bildung
@@ -146,6 +149,7 @@
 - **CBS**
 - **CBSInteractive**
 - **CBSLocal**
+- **CBSLocalArticle**
 - **cbsnews**: CBS News
 - **cbsnews:embed**
 - **cbsnews:livevideo**: CBS News Live Videos
@@ -195,9 +199,9 @@
 - **CrooksAndLiars**
 - **crunchyroll**
 - **crunchyroll:playlist**
-- **CSNNE**
 - **CSpan**: C-SPAN
 - **CtsNews**: 華視新聞
+- **CTV**
 - **CTVNews**
 - **cu.ntv.co.jp**: Nippon Television Network
 - **Culturebox**
@@ -314,7 +318,6 @@
 - **Funk**
 - **Fusion**
 - **Fux**
-- **FXNetworks**
 - **Gaia**
 - **GameInformer**
 - **GameSpot**
@@ -347,6 +350,7 @@
 - **hgtv.com:show**
 - **HiDive**
 - **HistoricFilms**
+- **history:player**
 - **history:topic**: History.com Topic
 - **hitbox**
 - **hitbox:live**
@@ -607,6 +611,7 @@
 - **Npr**
 - **NRK**
 - **NRKPlaylist**
+- **NRKRadioPodkast**
 - **NRKSkole**: NRK Skole
 - **NRKTV**: NRK TV and NRK Radio
 - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
@@ -810,12 +815,13 @@
 - **ShowRoomLive**
 - **Sina**
 - **sky.it**
+- **sky:news**
+- **sky:sports**
+- **sky:sports:news**
 - **skyacademy.it**
 - **SkylineWebcams**
-- **SkyNews**
 - **skynewsarabia:article**
 - **skynewsarabia:video**
-- **SkySports**
 - **Slideshare**
 - **SlidesLive**
 - **Slutload**
@@ -1086,6 +1092,7 @@
 - **vube**: Vube.com
 - **VuClip**
 - **VVVVID**
+- **VVVVIDShow**
 - **VyboryMos**
 - **Vzaar**
 - **Wakanim**
@@ -1108,6 +1115,7 @@
 - **WeiboMobile**
 - **WeiqiTV**: WQTV
 - **Wistia**
+- **WistiaPlaylist**
 - **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
 - **WorldStarHipHop**
 - **WSJ**: Wall Street Journal

File: test/test_all_urls.py

@@ -36,7 +36,7 @@ class TestAllURLsMatching(unittest.TestCase):
         assertPlaylist('UUBABnxM4Ar9ten8Mdjj1j0Q')  # 585
         assertPlaylist('PL63F0C78739B09958')
         assertTab('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
-        assertPlaylist('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
+        assertTab('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
         assertTab('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
         assertTab('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')  # 668
         self.assertFalse('youtube:playlist' in self.matching_ies('PLtS2H6bU1M'))
@@ -57,8 +57,8 @@ class TestAllURLsMatching(unittest.TestCase):
         assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
         assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')

-    # def test_youtube_user_matching(self):
-    #     self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:tab'])
+    def test_youtube_user_matching(self):
+        self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:tab'])

     def test_youtube_feeds(self):
         self.assertMatch('https://www.youtube.com/feed/library', ['youtube:tab'])

File: test/test_utils.py

@@ -21,6 +21,7 @@ from youtube_dl.utils import (
     encode_base_n,
     caesar,
     clean_html,
+    clean_podcast_url,
     date_from_str,
     DateRange,
     detect_exe_version,
@@ -554,6 +555,11 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(url_or_none('http$://foo.de'), None)
         self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
         self.assertEqual(url_or_none('//foo.de'), '//foo.de')
+        self.assertEqual(url_or_none('s3://foo.de'), None)
+        self.assertEqual(url_or_none('rtmpte://foo.de'), 'rtmpte://foo.de')
+        self.assertEqual(url_or_none('mms://foo.de'), 'mms://foo.de')
+        self.assertEqual(url_or_none('rtspu://foo.de'), 'rtspu://foo.de')
+        self.assertEqual(url_or_none('ftps://foo.de'), 'ftps://foo.de')

     def test_parse_age_limit(self):
         self.assertEqual(parse_age_limit(None), None)
@@ -1465,6 +1471,10 @@ Line 1
         self.assertEqual(get_elements_by_attribute('class', 'foo', html), [])
         self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), [])

+    def test_clean_podcast_url(self):
+        self.assertEqual(clean_podcast_url('https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/traffic.megaphone.fm/HSW7835899191.mp3'), 'https://traffic.megaphone.fm/HSW7835899191.mp3')
+        self.assertEqual(clean_podcast_url('https://play.podtrac.com/npr-344098539/edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3'), 'https://edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3')

 if __name__ == '__main__':
     unittest.main()
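Simplified sketches of the two utils changes these tests exercise; the protocol whitelist and the tracker-prefix list are reconstructions from the tests above, not the exact library code:

import re

def url_or_none(url):
    # Scheme-less (//...) URLs and a whitelist of supported protocols pass;
    # anything else (e.g. s3://) is rejected.
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    return url if re.match(
        r'^(?:(?:https?|rt(?:m(?:pt?[es]?|fp)|sp[su]?)|mms|ftps?):)?//', url) else None

def clean_podcast_url(url):
    # Strip well-known measurement/redirect prefixes (Podtrac, Chartable)
    # so only the direct media URL remains; the prefix list is illustrative.
    return re.sub(
        r'''(?x)(?:
            (?:www|dts)\.podtrac\.com/(?:pts/)?redirect\.[0-9a-z]{3,4}|
            play\.podtrac\.com/[^/]+|
            chtbl\.com/track/[^/]+
        )/''', '', url)

assert url_or_none('s3://foo.de') is None
assert url_or_none('rtmpte://foo.de') == 'rtmpte://foo.de'
assert clean_podcast_url(
    'https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/'
    'traffic.megaphone.fm/HSW7835899191.mp3') == 'https://traffic.megaphone.fm/HSW7835899191.mp3'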

File: youtube_dl/YoutubeDL.py

@@ -1083,7 +1083,7 @@ class YoutubeDL(object):
             '*=': lambda attr, value: value in attr,
         }
         str_operator_rex = re.compile(r'''(?x)
-            \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
+            \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id|language)
             \s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
             \s*(?P<value>[a-zA-Z0-9._-]+)
             \s*$

File: youtube_dl/downloader/hls.py

@@ -172,8 +172,12 @@ class HlsFD(FragmentFD):
                     iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
                     decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
                         self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
-                    frag_content = AES.new(
-                        decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
+                    # Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
+                    # size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
+                    # not what it decrypts to.
+                    if not test:
+                        frag_content = AES.new(
+                            decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
                 self._append_fragment(ctx, frag_content)
             # We only download the first fragment during the test
             if test:

File: youtube_dl/extractor/acast.py

@@ -2,21 +2,48 @@
 from __future__ import unicode_literals

 import re
-import functools

 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     clean_html,
-    float_or_none,
+    clean_podcast_url,
     int_or_none,
-    try_get,
-    unified_timestamp,
-    OnDemandPagedList,
+    parse_iso8601,
 )


-class ACastIE(InfoExtractor):
+class ACastBaseIE(InfoExtractor):
+    def _extract_episode(self, episode, show_info):
+        title = episode['title']
+        info = {
+            'id': episode['id'],
+            'display_id': episode.get('episodeUrl'),
+            'url': clean_podcast_url(episode['url']),
+            'title': title,
+            'description': clean_html(episode.get('description') or episode.get('summary')),
+            'thumbnail': episode.get('image'),
+            'timestamp': parse_iso8601(episode.get('publishDate')),
+            'duration': int_or_none(episode.get('duration')),
+            'filesize': int_or_none(episode.get('contentLength')),
+            'season_number': int_or_none(episode.get('season')),
+            'episode': title,
+            'episode_number': int_or_none(episode.get('episode')),
+        }
+        info.update(show_info)
+        return info
+
+    def _extract_show_info(self, show):
+        return {
+            'creator': show.get('author'),
+            'series': show.get('title'),
+        }
+
+    def _call_api(self, path, video_id, query=None):
+        return self._download_json(
+            'https://feeder.acast.com/api/v1/shows/' + path, video_id, query=query)
+
+
+class ACastIE(ACastBaseIE):
     IE_NAME = 'acast'
     _VALID_URL = r'''(?x)
                     https?://
@@ -28,15 +55,15 @@ class ACastIE(InfoExtractor):
                 '''
     _TESTS = [{
         'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
-        'md5': '16d936099ec5ca2d5869e3a813ee8dc4',
+        'md5': 'f5598f3ad1e4776fed12ec1407153e4b',
         'info_dict': {
             'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
             'ext': 'mp3',
             'title': '2. Raggarmordet - Röster ur det förflutna',
-            'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
+            'description': 'md5:a992ae67f4d98f1c0141598f7bebbf67',
             'timestamp': 1477346700,
             'upload_date': '20161024',
-            'duration': 2766.602563,
+            'duration': 2766,
             'creator': 'Anton Berg & Martin Johnson',
             'series': 'Spår',
             'episode': '2. Raggarmordet - Röster ur det förflutna',
@@ -45,7 +72,7 @@ class ACastIE(InfoExtractor):
         'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
         'only_matching': True,
     }, {
-        'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
+        'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
         'only_matching': True,
     }, {
         'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
@@ -54,40 +81,14 @@

     def _real_extract(self, url):
         channel, display_id = re.match(self._VALID_URL, url).groups()
-        s = self._download_json(
-            'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id),
-            display_id)
-        media_url = s['url']
-        if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id):
-            episode_url = s.get('episodeUrl')
-            if episode_url:
-                display_id = episode_url
-            else:
-                channel, display_id = re.match(self._VALID_URL, s['link']).groups()
-        cast_data = self._download_json(
-            'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
-            display_id)['result']
-        e = cast_data['episode']
-        title = e.get('name') or s['title']
-        return {
-            'id': compat_str(e['id']),
-            'display_id': display_id,
-            'url': media_url,
-            'title': title,
-            'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
-            'thumbnail': e.get('image'),
-            'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
-            'duration': float_or_none(e.get('duration') or s.get('duration')),
-            'filesize': int_or_none(e.get('contentLength')),
-            'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
-            'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
-            'season_number': int_or_none(e.get('seasonNumber')),
-            'episode': title,
-            'episode_number': int_or_none(e.get('episodeNumber')),
-        }
+        episode = self._call_api(
+            '%s/episodes/%s' % (channel, display_id),
+            display_id, {'showInfo': 'true'})
+        return self._extract_episode(
+            episode, self._extract_show_info(episode.get('show') or {}))


-class ACastChannelIE(InfoExtractor):
+class ACastChannelIE(ACastBaseIE):
     IE_NAME = 'acast:channel'
     _VALID_URL = r'''(?x)
                     https?://
@@ -102,34 +103,24 @@ class ACastChannelIE(InfoExtractor):
         'info_dict': {
             'id': '4efc5294-5385-4847-98bd-519799ce5786',
             'title': 'Today in Focus',
-            'description': 'md5:9ba5564de5ce897faeb12963f4537a64',
+            'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
         },
-        'playlist_mincount': 35,
+        'playlist_mincount': 200,
     }, {
         'url': 'http://play.acast.com/s/ft-banking-weekly',
         'only_matching': True,
     }]
-    _API_BASE_URL = 'https://play.acast.com/api/'
-    _PAGE_SIZE = 10

     @classmethod
     def suitable(cls, url):
         return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)

-    def _fetch_page(self, channel_slug, page):
-        casts = self._download_json(
-            self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
-            channel_slug, note='Download page %d of channel data' % page)
-        for cast in casts:
-            yield self.url_result(
-                'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
-                'ACast', cast['id'])
-
     def _real_extract(self, url):
-        channel_slug = self._match_id(url)
-        channel_data = self._download_json(
-            self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
-        entries = OnDemandPagedList(functools.partial(
-            self._fetch_page, channel_slug), self._PAGE_SIZE)
-        return self.playlist_result(entries, compat_str(
-            channel_data['id']), channel_data['name'], channel_data.get('description'))
+        show_slug = self._match_id(url)
+        show = self._call_api(show_slug, show_slug)
+        show_info = self._extract_show_info(show)
+        entries = []
+        for episode in (show.get('episodes') or []):
+            entries.append(self._extract_episode(episode, show_info))
+        return self.playlist_result(
+            entries, show.get('id'), show.get('title'), show.get('description'))
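The rewrite reads everything from the public feeder API; a quick illustration of what _call_api fetches, with the endpoint and slugs taken from the test URLs above (network access required):

import json
import urllib.request

# One request now returns the episode together with its show metadata.
url = ('https://feeder.acast.com/api/v1/shows/'
       'sparpodcast/episodes/2.raggarmordet-rosterurdetforflutna?showInfo=true')
with urllib.request.urlopen(url) as f:
    episode = json.load(f)
print(episode['title'], episode.get('duration'))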

File: youtube_dl/extractor/aenetworks.py

@@ -6,6 +6,7 @@ import re
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
GeoRestrictedError,
int_or_none, int_or_none,
update_url_query, update_url_query,
urlencode_postdata, urlencode_postdata,
@@ -28,6 +29,7 @@ class AENetworksBaseIE(ThePlatformIE):
'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'), 'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
'fyi.tv': ('FYI', 'fyi'), 'fyi.tv': ('FYI', 'fyi'),
'historyvault.com': (None, 'historyvault'), 'historyvault.com': (None, 'historyvault'),
'biography.com': (None, 'biography'),
} }
def _extract_aen_smil(self, smil_url, video_id, auth=None): def _extract_aen_smil(self, smil_url, video_id, auth=None):
@@ -54,6 +56,8 @@ class AENetworksBaseIE(ThePlatformIE):
tp_formats, tp_subtitles = self._extract_theplatform_smil( tp_formats, tp_subtitles = self._extract_theplatform_smil(
m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes'])) m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes']))
except ExtractorError as e: except ExtractorError as e:
if isinstance(e, GeoRestrictedError):
raise
last_e = e last_e = e
continue continue
formats.extend(tp_formats) formats.extend(tp_formats)
@@ -67,6 +71,34 @@ class AENetworksBaseIE(ThePlatformIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({
'title': title,
'series': result.get('seriesName'),
'season_number': int_or_none(result.get('tvSeasonNumber')),
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
})
return info
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
@@ -139,32 +171,7 @@ class AENetworksIE(AENetworksBaseIE):
     def _real_extract(self, url):
         domain, canonical = re.match(self._VALID_URL, url).groups()
-        requestor_id, brand = self._DOMAIN_MAP[domain]
-        result = self._download_json(
-            'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
-            canonical, query={'filter[canonical]': '/' + canonical})['results'][0]
-        title = result['title']
-        video_id = result['id']
-        media_url = result['publicUrl']
-        theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
-            r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
-        info = self._parse_theplatform_metadata(theplatform_metadata)
-        auth = None
-        if theplatform_metadata.get('AETN$isBehindWall'):
-            resource = self._get_mvpd_resource(
-                requestor_id, theplatform_metadata['title'],
-                theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
-                theplatform_metadata['ratings'][0]['rating'])
-            auth = self._extract_mvpd_auth(
-                url, video_id, requestor_id, resource)
-        info.update(self._extract_aen_smil(media_url, video_id, auth))
-        info.update({
-            'title': title,
-            'series': result.get('seriesName'),
-            'season_number': int_or_none(result.get('tvSeasonNumber')),
-            'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
-        })
-        return info
+        return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)


 class AENetworksListBaseIE(AENetworksBaseIE):
@@ -294,3 +301,42 @@ class HistoryTopicIE(AENetworksBaseIE):
         return self.url_result(
             'http://www.history.com/videos/' + display_id,
             AENetworksIE.ie_key())
+
+
+class HistoryPlayerIE(AENetworksBaseIE):
+    IE_NAME = 'history:player'
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|biography)\.com)/player/(?P<id>\d+)'
+    _TESTS = []
+
+    def _real_extract(self, url):
+        domain, video_id = re.match(self._VALID_URL, url).groups()
+        return self._extract_aetn_info(domain, 'id', video_id, url)
+
+
+class BiographyIE(AENetworksBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?biography\.com/video/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'https://www.biography.com/video/vincent-van-gogh-full-episode-2075049808',
+        'info_dict': {
+            'id': '30322987',
+            'ext': 'mp4',
+            'title': 'Vincent Van Gogh - Full Episode',
+            'description': 'A full biography about the most influential 20th century painter, Vincent Van Gogh.',
+            'timestamp': 1311970571,
+            'upload_date': '20110729',
+            'uploader': 'AENE-NEW',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['ThePlatform'],
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        player_url = self._search_regex(
+            r'<phoenix-iframe[^>]+src="(%s)' % HistoryPlayerIE._VALID_URL,
+            webpage, 'player URL')
+        return self.url_result(player_url, HistoryPlayerIE.ie_key())
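Both new classes funnel into the shared _extract_aetn_info helper added above. A minimal sketch of the feed lookup it performs; only 'filter[canonical]' is confirmed by the removed code, 'filter[id]' is an assumed counterpart for the player/id path:

    # Hypothetical sketch of the aetnd feed query built by _extract_aetn_info.
    def build_feed_url(brand, filter_key, filter_value):
        return ('https://feeds.video.aetnd.com/api/v2/%s/videos?filter[%s]=%s'
                % (brand, filter_key, filter_value))

    print(build_feed_url('history', 'canonical', '/shows/mountain-men'))
    print(build_feed_url('biography', 'id', '30322987'))  # 'filter[id]' is an assumption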

youtube_dl/extractor/amcnetworks.py

@@ -80,7 +80,8 @@ class AMCNetworksIE(ThePlatformIE):
         title = theplatform_metadata['title']
         rating = try_get(
             theplatform_metadata, lambda x: x['ratings'][0]['rating'])
-        if properties.get('videoCategory') == 'TVE-Auth':
+        video_category = properties.get('videoCategory')
+        if video_category and video_category.endswith('-Auth'):
             resource = self._get_mvpd_resource(
                 requestor_id, title, video_id, rating)
             query['auth'] = self._extract_mvpd_auth(

youtube_dl/extractor/aparat.py

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..utils import (
+    get_element_by_id,
     int_or_none,
     merge_dicts,
     mimetype2ext,
@@ -39,23 +40,15 @@ class AparatIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id, fatal=False)

         if not webpage:
-            # Note: There is an easier-to-parse configuration at
-            # http://www.aparat.com/video/video/config/videohash/%video_id
-            # but the URL in there does not work
             webpage = self._download_webpage(
                 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
                 video_id)

-        options = self._parse_json(
-            self._search_regex(
-                r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
-                webpage, 'options', group='value'),
-            video_id)
-
-        player = options['plugins']['sabaPlayerPlugin']
+        options = self._parse_json(self._search_regex(
+            r'options\s*=\s*({.+?})\s*;', webpage, 'options'), video_id)

         formats = []
-        for sources in player['multiSRC']:
+        for sources in (options.get('multiSRC') or []):
             for item in sources:
                 if not isinstance(item, dict):
                     continue
@@ -85,11 +78,12 @@ class AparatIE(InfoExtractor):
         info = self._search_json_ld(webpage, video_id, default={})

         if not info.get('title'):
-            info['title'] = player['title']
+            info['title'] = get_element_by_id('videoTitle', webpage) or \
+                self._html_search_meta(['og:title', 'twitter:title', 'DC.Title', 'title'], webpage, fatal=True)

         return merge_dicts(info, {
             'id': video_id,
             'thumbnail': url_or_none(options.get('poster')),
-            'duration': int_or_none(player.get('duration')),
+            'duration': int_or_none(options.get('duration')),
             'formats': formats,
         })

youtube_dl/extractor/applepodcasts.py

@@ -0,0 +1,61 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_podcast_url,
int_or_none,
parse_iso8601,
try_get,
)
class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'md5': 'df02e6acb11c10e844946a39e7222b08',
'info_dict': {
'id': '1000482637777',
'ext': 'mp3',
'title': '207 - Whitney Webb Returns',
'description': 'md5:13a73bade02d2e43737751e3987e1399',
'upload_date': '20200705',
'timestamp': 1593921600,
'duration': 6425,
'series': 'The Tim Dillon Show',
}
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/id1135137367?i=1000482637777',
'only_matching': True,
}]
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
episode = ember_data['data']['attributes']
description = episode.get('description') or {}
series = None
for inc in (ember_data.get('included') or []):
if inc.get('type') == 'media/podcast':
series = try_get(inc, lambda x: x['attributes']['name'])
return {
'id': episode_id,
'title': episode['name'],
'url': clean_podcast_url(episode['assetUrl']),
'description': description.get('standard') or description.get('short'),
'timestamp': parse_iso8601(episode.get('releaseDateTime')),
'duration': int_or_none(episode.get('durationInMilliseconds'), 1000),
'series': series,
}
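A quick sanity check of the new _VALID_URL pattern; the regex and the sample URL are both taken verbatim from the code above:

    import re

    # Matches the episode id from an Apple Podcasts episode URL.
    VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
    url = 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777'
    print(re.match(VALID_URL, url).group('id'))  # -> 1000482637777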

youtube_dl/extractor/arcpublishing.py

@@ -0,0 +1,174 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_iso8601,
try_get,
)
class ArcPublishingIE(InfoExtractor):
_UUID_REGEX = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
_VALID_URL = r'arcpublishing:(?P<org>[a-z]+):(?P<id>%s)' % _UUID_REGEX
_TESTS = [{
# https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/
'url': 'arcpublishing:adn:8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'only_matching': True,
}, {
# https://www.bostonglobe.com/video/2020/12/30/metro/footage-released-showing-officer-talking-about-striking-protesters-with-car/
'url': 'arcpublishing:bostonglobe:232b7ae6-7d73-432d-bc0a-85dbf0119ab1',
'only_matching': True,
}, {
# https://www.actionnewsjax.com/video/live-stream/
'url': 'arcpublishing:cmg:cfb1cf1b-3ab5-4d1b-86c5-a5515d311f2a',
'only_matching': True,
}, {
# https://elcomercio.pe/videos/deportes/deporte-total-futbol-peruano-seleccion-peruana-la-valorizacion-de-los-peruanos-en-el-exterior-tras-un-2020-atipico-nnav-vr-video-noticia/
'url': 'arcpublishing:elcomercio:27a7e1f8-2ec7-4177-874f-a4feed2885b3',
'only_matching': True,
}, {
# https://www.clickondetroit.com/video/community/2020/05/15/events-surrounding-woodward-dream-cruise-being-canceled/
'url': 'arcpublishing:gmg:c8793fb2-8d44-4242-881e-2db31da2d9fe',
'only_matching': True,
}, {
# https://www.wabi.tv/video/2020/12/30/trenton-company-making-equipment-pfizer-covid-vaccine/
'url': 'arcpublishing:gray:0b0ba30e-032a-4598-8810-901d70e6033e',
'only_matching': True,
}, {
# https://www.lateja.cr/el-mundo/video-china-aprueba-con-condiciones-su-primera/dfcbfa57-527f-45ff-a69b-35fe71054143/video/
'url': 'arcpublishing:gruponacion:dfcbfa57-527f-45ff-a69b-35fe71054143',
'only_matching': True,
}, {
# https://www.fifthdomain.com/video/2018/03/09/is-america-vulnerable-to-a-cyber-attack/
'url': 'arcpublishing:mco:aa0ca6fe-1127-46d4-b32c-be0d6fdb8055',
'only_matching': True,
}, {
# https://www.vl.no/kultur/2020/12/09/en-melding-fra-en-lytter-endret-julelista-til-lewi-bergrud/
'url': 'arcpublishing:mentormedier:47a12084-650b-4011-bfd0-3699b6947b2d',
'only_matching': True,
}, {
# https://www.14news.com/2020/12/30/whiskey-theft-caught-camera-henderson-liquor-store/
'url': 'arcpublishing:raycom:b89f61f8-79fa-4c09-8255-e64237119bf7',
'only_matching': True,
}, {
# https://www.theglobeandmail.com/world/video-ethiopian-woman-who-became-symbol-of-integration-in-italy-killed-on/
'url': 'arcpublishing:tgam:411b34c1-8701-4036-9831-26964711664b',
'only_matching': True,
}, {
# https://www.pilotonline.com/460f2931-8130-4719-8ea1-ffcb2d7cb685-132.html
'url': 'arcpublishing:tronc:460f2931-8130-4719-8ea1-ffcb2d7cb685',
'only_matching': True,
}]
_POWA_DEFAULTS = [
(['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
([
'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
], 'video-api-cdn.%s.arcpublishing.com/api'),
]
@staticmethod
def _extract_urls(webpage):
entries = []
# https://arcpublishing.atlassian.net/wiki/spaces/POWA/overview
for powa_el in re.findall(r'(<div[^>]+class="[^"]*\bpowa\b[^"]*"[^>]+data-uuid="%s"[^>]*>)' % ArcPublishingIE._UUID_REGEX, webpage):
powa = extract_attributes(powa_el) or {}
org = powa.get('data-org')
uuid = powa.get('data-uuid')
if org and uuid:
entries.append('arcpublishing:%s:%s' % (org, uuid))
return entries
def _real_extract(self, url):
org, uuid = re.match(self._VALID_URL, url).groups()
for orgs, tmpl in self._POWA_DEFAULTS:
if org in orgs:
base_api_tmpl = tmpl
break
else:
base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
if org == 'wapo':
org = 'washpost'
video = self._download_json(
'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org),
uuid, query={'uuid': uuid})[0]
title = video['headlines']['basic']
is_live = video.get('status') == 'live'
urls = []
formats = []
for s in video.get('streams', []):
s_url = s.get('url')
if not s_url or s_url in urls:
continue
urls.append(s_url)
stream_type = s.get('stream_type')
if stream_type == 'smil':
smil_formats = self._extract_smil_formats(
s_url, uuid, fatal=False)
for f in smil_formats:
if f['url'].endswith('/cfx/st'):
f['app'] = 'cfx/st'
if not f['play_path'].startswith('mp4:'):
f['play_path'] = 'mp4:' + f['play_path']
if isinstance(f['tbr'], float):
f['vbr'] = f['tbr'] * 1000
del f['tbr']
f['format_id'] = 'rtmp-%d' % f['vbr']
formats.extend(smil_formats)
elif stream_type in ('ts', 'hls'):
m3u8_formats = self._extract_m3u8_formats(
s_url, uuid, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False)
if all([f.get('acodec') == 'none' for f in m3u8_formats]):
continue
for f in m3u8_formats:
if f.get('acodec') == 'none':
f['preference'] = -40
elif f.get('vcodec') == 'none':
f['preference'] = -50
height = f.get('height')
if not height:
continue
vbr = self._search_regex(
r'[_x]%d[_-](\d+)' % height, f['url'], 'vbr', default=None)
if vbr:
f['vbr'] = int(vbr)
formats.extend(m3u8_formats)
else:
vbr = int_or_none(s.get('bitrate'))
formats.append({
'format_id': '%s-%d' % (stream_type, vbr) if vbr else stream_type,
'vbr': vbr,
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),
'filesize': int_or_none(s.get('filesize')),
'url': s_url,
'preference': -1,
})
self._sort_formats(
formats, ('preference', 'width', 'height', 'vbr', 'filesize', 'tbr', 'ext', 'format_id'))
subtitles = {}
for subtitle in (try_get(video, lambda x: x['subtitles']['urls'], list) or []):
subtitle_url = subtitle.get('url')
if subtitle_url:
subtitles.setdefault('en', []).append({'url': subtitle_url})
return {
'id': uuid,
'title': self._live_title(title) if is_live else title,
'thumbnail': try_get(video, lambda x: x['promo_image']['url']),
'description': try_get(video, lambda x: x['subheadlines']['basic']),
'formats': formats,
'duration': int_or_none(video.get('duration'), 100),
'timestamp': parse_iso8601(video.get('created_date')),
'subtitles': subtitles,
'is_live': is_live,
}
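The embed discovery in _extract_urls reduces to a single regex scan; a sketch against a hypothetical page fragment modeled on the adn.com test case:

    import re

    # UUID pattern copied from the class above; the <div> is a made-up sample.
    UUID_REGEX = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
    webpage = '<div class="powa" data-org="adn" data-uuid="8c99cb6e-b29c-4bc9-9173-7bf9979225ab">'
    for powa_el in re.findall(r'(<div[^>]+class="[^"]*\bpowa\b[^"]*"[^>]+data-uuid="%s"[^>]*>)' % UUID_REGEX, webpage):
        print(powa_el)  # each match is parsed into an arcpublishing:org:uuid URL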

youtube_dl/extractor/bbc.py

@@ -49,22 +49,17 @@ class BBCCoUkIE(InfoExtractor):
     _LOGIN_URL = 'https://account.bbc.com/signin'
     _NETRC_MACHINE = 'bbc'

-    _MEDIASELECTOR_URLS = [
+    _MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
+    _MEDIA_SETS = [
         # Provides HQ HLS streams with even better quality that pc mediaset but fails
         # with geolocation in some cases when it's even not geo restricted at all (e.g.
         # http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
-        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
-        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
+        'iptv-all',
+        'pc',
     ]

-    _MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
     _EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'

-    _NAMESPACES = (
-        _MEDIASELECTION_NS,
-        _EMP_PLAYLIST_NS,
-    )
-
     _TESTS = [
         {
             'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
@@ -261,8 +256,6 @@ class BBCCoUkIE(InfoExtractor):
             'only_matching': True,
         }]

-    _USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
-
     def _login(self):
         username, password = self._get_login_info()
         if username is None:
@@ -307,22 +300,14 @@ class BBCCoUkIE(InfoExtractor):
     def _extract_items(self, playlist):
         return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)

-    def _findall_ns(self, element, xpath):
-        elements = []
-        for ns in self._NAMESPACES:
-            elements.extend(element.findall(xpath % ns))
-        return elements
-
     def _extract_medias(self, media_selection):
-        error = media_selection.find('./{%s}error' % self._MEDIASELECTION_NS)
-        if error is None:
-            media_selection.find('./{%s}error' % self._EMP_PLAYLIST_NS)
-        if error is not None:
-            raise BBCCoUkIE.MediaSelectionError(error.get('id'))
-        return self._findall_ns(media_selection, './{%s}media')
+        error = media_selection.get('result')
+        if error:
+            raise BBCCoUkIE.MediaSelectionError(error)
+        return media_selection.get('media') or []

     def _extract_connections(self, media):
-        return self._findall_ns(media, './{%s}connection')
+        return media.get('connection') or []

     def _get_subtitles(self, media, programme_id):
         subtitles = {}
@@ -334,13 +319,13 @@ class BBCCoUkIE(InfoExtractor):
                 cc_url, programme_id, 'Downloading captions', fatal=False)
             if not isinstance(captions, compat_etree_Element):
                 continue
-            lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
-            subtitles[lang] = [
+            subtitles['en'] = [
                 {
                     'url': connection.get('href'),
                     'ext': 'ttml',
                 },
             ]
+            break
         return subtitles

     def _raise_extractor_error(self, media_selection_error):
@@ -350,10 +335,10 @@ class BBCCoUkIE(InfoExtractor):
     def _download_media_selector(self, programme_id):
         last_exception = None
-        for mediaselector_url in self._MEDIASELECTOR_URLS:
+        for media_set in self._MEDIA_SETS:
             try:
                 return self._download_media_selector_url(
-                    mediaselector_url % programme_id, programme_id)
+                    self._MEDIA_SELECTOR_URL_TEMPL % (media_set, programme_id), programme_id)
             except BBCCoUkIE.MediaSelectionError as e:
                 if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
                     last_exception = e
@@ -362,8 +347,8 @@ class BBCCoUkIE(InfoExtractor):
         self._raise_extractor_error(last_exception)

     def _download_media_selector_url(self, url, programme_id=None):
-        media_selection = self._download_xml(
-            url, programme_id, 'Downloading media selection XML',
+        media_selection = self._download_json(
+            url, programme_id, 'Downloading media selection JSON',
             expected_status=(403, 404))
         return self._process_media_selector(media_selection, programme_id)
@@ -377,7 +362,6 @@ class BBCCoUkIE(InfoExtractor):
             if kind in ('video', 'audio'):
                 bitrate = int_or_none(media.get('bitrate'))
                 encoding = media.get('encoding')
-                service = media.get('service')
                 width = int_or_none(media.get('width'))
                 height = int_or_none(media.get('height'))
                 file_size = int_or_none(media.get('media_file_size'))
@@ -392,8 +376,6 @@ class BBCCoUkIE(InfoExtractor):
                     supplier = connection.get('supplier')
                     transfer_format = connection.get('transferFormat')
                     format_id = supplier or conn_kind or protocol
-                    if service:
-                        format_id = '%s_%s' % (service, format_id)
                     # ASX playlist
                     if supplier == 'asx':
                         for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
@@ -408,20 +390,11 @@ class BBCCoUkIE(InfoExtractor):
                         formats.extend(self._extract_m3u8_formats(
                             href, programme_id, ext='mp4', entry_protocol='m3u8_native',
                             m3u8_id=format_id, fatal=False))
-                        if re.search(self._USP_RE, href):
-                            usp_formats = self._extract_m3u8_formats(
-                                re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
-                                programme_id, ext='mp4', entry_protocol='m3u8_native',
-                                m3u8_id=format_id, fatal=False)
-                            for f in usp_formats:
-                                if f.get('height') and f['height'] > 720:
-                                    continue
-                                formats.append(f)
                     elif transfer_format == 'hds':
                         formats.extend(self._extract_f4m_formats(
                             href, programme_id, f4m_id=format_id, fatal=False))
                     else:
-                        if not service and not supplier and bitrate:
+                        if not supplier and bitrate:
                             format_id += '-%d' % bitrate
                         fmt = {
                             'format_id': format_id,
@@ -554,7 +527,7 @@ class BBCCoUkIE(InfoExtractor):
         webpage = self._download_webpage(url, group_id, 'Downloading video page')

         error = self._search_regex(
-            r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<',
+            r'<div\b[^>]+\bclass=["\'](?:smp|playout)__message delta["\'][^>]*>\s*([^<]+?)\s*<',
             webpage, 'error', default=None)
         if error:
             raise ExtractorError(error, expected=True)
@@ -607,16 +580,9 @@ class BBCIE(BBCCoUkIE):
     IE_DESC = 'BBC'
     _VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'

-    _MEDIASELECTOR_URLS = [
-        # Provides HQ HLS streams but fails with geolocation in some cases when it's
-        # even not geo restricted at all
-        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
-        # Provides more formats, namely direct mp4 links, but fails on some videos with
-        # notukerror for non UK (?) users (e.g.
-        # http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
-        'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
-        # Provides fewer formats, but works everywhere for everybody (hopefully)
-        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
-    ]
+    _MEDIA_SETS = [
+        'mobile-tablet-main',
+        'pc',
+    ]

     _TESTS = [{
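Expanded, the new media sets resolve to mediaselector 6 JSON URLs like these; the template is copied from the diff, the vpid is a placeholder:

    MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
    for media_set in ('iptv-all', 'pc'):
        print(MEDIA_SELECTOR_URL_TEMPL % (media_set, 'p00000000'))  # placeholder vpid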

youtube_dl/extractor/bongacams.py

@@ -0,0 +1,60 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
urlencode_postdata,
)
class BongaCamsIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.com)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://de.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://cn.bongacams.com/azumi-8',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
channel_id = mobj.group('id')
amf = self._download_json(
'https://%s/tools/amf.php' % host, channel_id,
data=urlencode_postdata((
('method', 'getRoomData'),
('args[]', channel_id),
('args[]', 'false'),
)), headers={'X-Requested-With': 'XMLHttpRequest'})
server_url = amf['localData']['videoServerUrl']
uploader_id = try_get(
amf, lambda x: x['performerData']['username'], compat_str) or channel_id
uploader = try_get(
amf, lambda x: x['performerData']['displayName'], compat_str)
like_count = int_or_none(try_get(
amf, lambda x: x['performerData']['loversCount']))
formats = self._extract_m3u8_formats(
'%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id),
channel_id, 'mp4', m3u8_id='hls', live=True)
self._sort_formats(formats)
return {
'id': channel_id,
'title': self._live_title(uploader or uploader_id),
'uploader': uploader,
'uploader_id': uploader_id,
'like_count': like_count,
'age_limit': 18,
'is_live': True,
'formats': formats,
}

youtube_dl/extractor/brightcove.py

@@ -28,6 +28,7 @@ from ..utils import (
     parse_iso8601,
     smuggle_url,
     str_or_none,
+    try_get,
     unescapeHTML,
     unsmuggle_url,
     UnsupportedError,
@@ -470,13 +471,18 @@ class BrightcoveNewIE(AdobePassIE):
     def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
         title = json_data['name'].strip()

+        num_drm_sources = 0
         formats = []
-        for source in json_data.get('sources', []):
+        sources = json_data.get('sources') or []
+        for source in sources:
             container = source.get('container')
             ext = mimetype2ext(source.get('type'))
             src = source.get('src')
             # https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
-            if ext == 'ism' or container == 'WVM' or source.get('key_systems'):
+            if container == 'WVM' or source.get('key_systems'):
+                num_drm_sources += 1
+                continue
+            elif ext == 'ism':
                 continue
             elif ext == 'm3u8' or container == 'M2TS':
                 if not src:
@@ -533,20 +539,15 @@ class BrightcoveNewIE(AdobePassIE):
                     'format_id': build_format_id('rtmp'),
                 })
             formats.append(f)
-        if not formats:
-            # for sonyliv.com DRM protected videos
-            s3_source_url = json_data.get('custom_fields', {}).get('s3sourceurl')
-            if s3_source_url:
-                formats.append({
-                    'url': s3_source_url,
-                    'format_id': 'source',
-                })
-
-        errors = json_data.get('errors')
-        if not formats and errors:
-            error = errors[0]
-            raise ExtractorError(
-                error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
+
+        if not formats:
+            errors = json_data.get('errors')
+            if errors:
+                error = errors[0]
+                raise ExtractorError(
+                    error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
+            if sources and num_drm_sources == len(sources):
+                raise ExtractorError('This video is DRM protected.', expected=True)

         self._sort_formats(formats)
@@ -600,24 +601,27 @@ class BrightcoveNewIE(AdobePassIE):
         store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)

         def extract_policy_key():
-            webpage = self._download_webpage(
-                'http://players.brightcove.net/%s/%s_%s/index.min.js'
-                % (account_id, player_id, embed), video_id)
-
-            policy_key = None
-
-            catalog = self._search_regex(
-                r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
-            if catalog:
-                catalog = self._parse_json(
-                    js_to_json(catalog), video_id, fatal=False)
-                if catalog:
-                    policy_key = catalog.get('policyKey')
-
-            if not policy_key:
-                policy_key = self._search_regex(
-                    r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
-                    webpage, 'policy key', group='pk')
+            base_url = 'http://players.brightcove.net/%s/%s_%s/' % (account_id, player_id, embed)
+            config = self._download_json(
+                base_url + 'config.json', video_id, fatal=False) or {}
+            policy_key = try_get(
+                config, lambda x: x['video_cloud']['policy_key'])
+            if not policy_key:
+                webpage = self._download_webpage(
+                    base_url + 'index.min.js', video_id)
+
+                catalog = self._search_regex(
+                    r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
+                if catalog:
+                    catalog = self._parse_json(
+                        js_to_json(catalog), video_id, fatal=False)
+                    if catalog:
+                        policy_key = catalog.get('policyKey')
+
+                if not policy_key:
+                    policy_key = self._search_regex(
+                        r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+                        webpage, 'policy key', group='pk')

             store_pk(policy_key)
             return policy_key
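The reworked extract_policy_key reads config.json first and only then falls back to scraping index.min.js; the lookup it tries amounts to this (the response dict below is hypothetical and trimmed to the one key used):

    config = {'video_cloud': {'policy_key': 'BCpkADawqM0...'}}  # hypothetical response

    def policy_key_from_config(config):
        # Mirrors the try_get(config, lambda x: x['video_cloud']['policy_key']) above.
        try:
            return config['video_cloud']['policy_key']
        except (KeyError, TypeError):
            return None

    print(policy_key_from_config(config))  # the key, when present
    print(policy_key_from_config({}))      # None -> fall back to index.min.js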

youtube_dl/extractor/cbslocal.py

@@ -11,7 +11,47 @@ from ..utils import (
 class CBSLocalIE(AnvatoIE):
-    _VALID_URL = r'https?://[a-z]+\.cbslocal\.com/(?:\d+/\d+/\d+|video)/(?P<id>[0-9a-z-]+)'
+    _VALID_URL_BASE = r'https?://[a-z]+\.cbslocal\.com/'
+    _VALID_URL = _VALID_URL_BASE + r'video/(?P<id>\d+)'
+
+    _TESTS = [{
+        'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
+        'info_dict': {
+            'id': '3580809',
+            'ext': 'mp4',
+            'title': 'A Very Blue Anniversary',
+            'description': 'CBS2s Cindy Hsu has more.',
+            'thumbnail': 're:^https?://.*',
+            'timestamp': int,
+            'upload_date': r're:^\d{8}$',
+            'uploader': 'CBS',
+            'subtitles': {
+                'en': 'mincount:5',
+            },
+            'categories': [
+                'Stations\\Spoken Word\\WCBSTV',
+                'Syndication\\AOL',
+                'Syndication\\MSN',
+                'Syndication\\NDN',
+                'Syndication\\Yahoo',
+                'Content\\News',
+                'Content\\News\\Local News',
+            ],
+            'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        mcp_id = self._match_id(url)
+        return self.url_result(
+            'anvato:anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67:' + mcp_id, 'Anvato', mcp_id)
+
+
+class CBSLocalArticleIE(AnvatoIE):
+    _VALID_URL = CBSLocalIE._VALID_URL_BASE + r'\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'

     _TESTS = [{
         # Anvato backend
@@ -52,31 +92,6 @@ class CBSLocalIE(AnvatoIE):
             # m3u8 download
             'skip_download': True,
         },
-    }, {
-        'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
-        'info_dict': {
-            'id': '3580809',
-            'ext': 'mp4',
-            'title': 'A Very Blue Anniversary',
-            'description': 'CBS2s Cindy Hsu has more.',
-            'thumbnail': 're:^https?://.*',
-            'timestamp': int,
-            'upload_date': r're:^\d{8}$',
-            'uploader': 'CBS',
-            'subtitles': {
-                'en': 'mincount:5',
-            },
-            'categories': [
-                'Stations\\Spoken Word\\WCBSTV',
-                'Syndication\\AOL',
-                'Syndication\\MSN',
-                'Syndication\\NDN',
-                'Syndication\\Yahoo',
-                'Content\\News',
-                'Content\\News\\Local News',
-            ],
-            'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
-        },
     }]

     def _real_extract(self, url):
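The new video route no longer scrapes the page at all; it hands straight off to the Anvato extractor. The URL it builds, with the access key copied from the code above and the mcp_id from its test case:

    ANVATO_ACCESS_KEY = 'anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67'
    mcp_id = '3580809'
    print('anvato:%s:%s' % (ANVATO_ACCESS_KEY, mcp_id))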

youtube_dl/extractor/ctv.py

@@ -0,0 +1,52 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/(?P<id>(?:show|movie)s/[^/]+/[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ctv.ca/shows/your-morning/wednesday-december-23-2020-s5e88',
'info_dict': {
'id': '2102249',
'ext': 'flv',
'title': 'Wednesday, December 23, 2020',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'Your Morning delivers original perspectives and unique insights into the headlines of the day.',
'timestamp': 1608732000,
'upload_date': '20201223',
'series': 'Your Morning',
'season': '2020-2021',
'season_number': 5,
'episode_number': 88,
'tags': ['Your Morning'],
'categories': ['Talk Show'],
'duration': 7467.126,
},
}, {
'url': 'https://www.ctv.ca/movies/adam-sandlers-eight-crazy-nights/adam-sandlers-eight-crazy-nights',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
content = self._download_json(
'https://www.ctv.ca/space-graphql/graphql', display_id, query={
'query': '''{
resolvedPath(path: "/%s") {
lastSegment {
content {
... on AxisContent {
axisId
videoPlayerDestCode
}
}
}
}
}''' % display_id,
})['data']['resolvedPath']['lastSegment']['content']
video_id = content['axisId']
return self.url_result(
'9c9media:%s:%s' % (content['videoPlayerDestCode'], video_id),
'NineCNineMedia', video_id)
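For reference, the GraphQL query above with the display id from the test case interpolated, roughly as posted to the space-graphql endpoint (formatting compacted here):

    display_id = 'shows/your-morning/wednesday-december-23-2020-s5e88'
    query = '''{
      resolvedPath(path: "/%s") {
        lastSegment { content { ... on AxisContent { axisId videoPlayerDestCode } } }
      }
    }''' % display_id
    print(query)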

youtube_dl/extractor/extractors.py

@@ -33,6 +33,8 @@ from .aenetworks import (
     AENetworksCollectionIE,
     AENetworksShowIE,
     HistoryTopicIE,
+    HistoryPlayerIE,
+    BiographyIE,
 )
 from .afreecatv import AfreecaTVIE
 from .airmozilla import AirMozillaIE
@@ -53,7 +55,9 @@ from .appletrailers import (
     AppleTrailersIE,
     AppleTrailersSectionIE,
 )
+from .applepodcasts import ApplePodcastsIE
 from .archiveorg import ArchiveOrgIE
+from .arcpublishing import ArcPublishingIE
 from .arkena import ArkenaIE
 from .ard import (
     ARDBetaMediathekIE,
@@ -119,6 +123,7 @@ from .bleacherreport import (
 from .blinkx import BlinkxIE
 from .bloomberg import BloombergIE
 from .bokecc import BokeCCIE
+from .bongacams import BongaCamsIE
 from .bostonglobe import BostonGlobeIE
 from .box import BoxIE
 from .bpb import BpbIE
@@ -163,7 +168,10 @@ from .cbc import (
     CBCOlympicsIE,
 )
 from .cbs import CBSIE
-from .cbslocal import CBSLocalIE
+from .cbslocal import (
+    CBSLocalIE,
+    CBSLocalArticleIE,
+)
 from .cbsinteractive import CBSInteractiveIE
 from .cbsnews import (
     CBSNewsEmbedIE,
@@ -241,6 +249,7 @@ from .crunchyroll import (
 )
 from .cspan import CSpanIE
 from .ctsnews import CtsNewsIE
+from .ctv import CTVIE
 from .ctvnews import CTVNewsIE
 from .cultureunplugged import CultureUnpluggedIE
 from .curiositystream import (
@@ -394,7 +403,6 @@ from .fujitv import FujiTVFODPlus7IE
 from .funimation import FunimationIE
 from .funk import FunkIE
 from .fusion import FusionIE
-from .fxnetworks import FXNetworksIE
 from .gaia import GaiaIE
 from .gameinformer import GameInformerIE
 from .gamespot import GameSpotIE
@@ -415,7 +423,10 @@ from .go import GoIE
 from .godtube import GodTubeIE
 from .golem import GolemIE
 from .googledrive import GoogleDriveIE
-from .googleplus import GooglePlusIE
+from .googlepodcasts import (
+    GooglePodcastsIE,
+    GooglePodcastsFeedIE,
+)
 from .googlesearch import GoogleSearchIE
 from .goshgay import GoshgayIE
 from .gputechconf import GPUTechConfIE
@@ -456,6 +467,10 @@ from .ign import (
     OneUPIE,
     PCMagIE,
 )
+from .iheart import (
+    IHeartRadioIE,
+    IHeartRadioPodcastIE,
+)
 from .imdb import (
     ImdbIE,
     ImdbListIE
@@ -686,7 +701,6 @@ from .nba import (
     NBAChannelIE,
 )
 from .nbc import (
-    CSNNEIE,
     NBCIE,
     NBCNewsIE,
     NBCOlympicsIE,
@@ -784,6 +798,7 @@ from .nrk import (
     NRKSkoleIE,
     NRKTVIE,
     NRKTVDirekteIE,
+    NRKRadioPodkastIE,
     NRKTVEpisodeIE,
     NRKTVEpisodesIE,
     NRKTVSeasonIE,
@@ -1047,6 +1062,7 @@ from .skynewsarabia import (
 from .sky import (
     SkyNewsIE,
     SkySportsIE,
+    SkySportsNewsIE,
 )
 from .slideshare import SlideshareIE
 from .slideslive import SlidesLiveIE
@@ -1084,7 +1100,10 @@ from .spike import (
     BellatorIE,
     ParamountNetworkIE,
 )
-from .stitcher import StitcherIE
+from .stitcher import (
+    StitcherIE,
+    StitcherShowIE,
+)
 from .sport5 import Sport5IE
 from .sportbox import SportBoxIE
 from .sportdeutschland import SportDeutschlandIE
@@ -1420,7 +1439,10 @@ from .vshare import VShareIE
 from .medialaan import MedialaanIE
 from .vube import VubeIE
 from .vuclip import VuClipIE
-from .vvvvid import VVVVIDIE
+from .vvvvid import (
+    VVVVIDIE,
+    VVVVIDShowIE,
+)
 from .vyborymos import VyboryMosIE
 from .vzaar import VzaarIE
 from .wakanim import WakanimIE
@@ -1451,7 +1473,10 @@ from .weibo import (
     WeiboMobileIE
 )
 from .weiqitv import WeiqiTVIE
-from .wistia import WistiaIE
+from .wistia import (
+    WistiaIE,
+    WistiaPlaylistIE,
+)
 from .worldstarhiphop import WorldStarHipHopIE
 from .wsj import (
     WSJIE,

youtube_dl/extractor/facebook.py

@@ -1,6 +1,7 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import json
 import re
 import socket
@@ -8,6 +9,7 @@ from .common import InfoExtractor
 from ..compat import (
     compat_etree_fromstring,
     compat_http_client,
+    compat_str,
     compat_urllib_error,
     compat_urllib_parse_unquote,
     compat_urllib_parse_unquote_plus,
@@ -47,7 +49,8 @@ class FacebookIE(InfoExtractor):
                             )\?(?:.*?)(?:v|video_id|story_fbid)=|
                             [^/]+/videos/(?:[^/]+/)?|
                             [^/]+/posts/|
-                            groups/[^/]+/permalink/
+                            groups/[^/]+/permalink/|
+                            watchparty/
                         )|
                     facebook:
                 )
@@ -280,8 +283,18 @@ class FacebookIE(InfoExtractor):
         # data.video.creation_story.attachments[].media
         'url': 'https://www.facebook.com/watch/live/?v=1823658634322275',
         'only_matching': True,
+    }, {
+        'url': 'https://www.facebook.com/watchparty/211641140192478',
+        'info_dict': {
+            'id': '211641140192478',
+        },
+        'playlist_count': 1,
+        'skip': 'Requires logging in',
     }]
     _SUPPORTED_PAGLETS_REGEX = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_[0-9a-f]+)'
+    _api_config = {
+        'graphURI': '/api/graphql/'
+    }

     @staticmethod
     def _extract_urls(webpage):
@@ -405,6 +418,17 @@ class FacebookIE(InfoExtractor):

             self._sort_formats(formats)

+        def extract_relay_data(_filter):
+            return self._parse_json(self._search_regex(
+                r'handleWithCustomApplyEach\([^,]+,\s*({.*?%s.*?})\);' % _filter,
+                webpage, 'replay data', default='{}'), video_id, fatal=False) or {}
+
+        def extract_relay_prefetched_data(_filter):
+            replay_data = extract_relay_data(_filter)
+            for require in (replay_data.get('require') or []):
+                if require[0] == 'RelayPrefetchedStreamCache':
+                    return try_get(require, lambda x: x[3][1]['__bbox']['result']['data'], dict) or {}
+
         if not video_data:
             server_js_data = self._parse_json(self._search_regex([
                 r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+' + self._SUPPORTED_PAGLETS_REGEX,
@@ -413,87 +437,83 @@ class FacebookIE(InfoExtractor):
             video_data = extract_from_jsmods_instances(server_js_data)

         if not video_data:
-            graphql_data = self._parse_json(self._search_regex(
-                r'handleWithCustomApplyEach\([^,]+,\s*({.*?"(?:dash_manifest|playable_url(?:_quality_hd)?)"\s*:\s*"[^"]+".*?})\);',
-                webpage, 'graphql data', default='{}'), video_id, fatal=False) or {}
-            for require in (graphql_data.get('require') or []):
-                if require[0] == 'RelayPrefetchedStreamCache':
-                    data = try_get(require, lambda x: x[3][1]['__bbox']['result']['data'], dict) or {}
+            data = extract_relay_prefetched_data(
+                r'"(?:dash_manifest|playable_url(?:_quality_hd)?)"\s*:\s*"[^"]+"')
+            if data:
                 entries = []

                 def parse_graphql_video(video):
                     formats = []
                     q = qualities(['sd', 'hd'])
                     for (suffix, format_id) in [('', 'sd'), ('_quality_hd', 'hd')]:
                         playable_url = video.get('playable_url' + suffix)
                         if not playable_url:
                             continue
                         formats.append({
                             'format_id': format_id,
                             'quality': q(format_id),
                             'url': playable_url,
                         })
                     extract_dash_manifest(video, formats)
                     process_formats(formats)
                     v_id = video.get('videoId') or video.get('id') or video_id
                     info = {
                         'id': v_id,
                         'formats': formats,
                         'thumbnail': try_get(video, lambda x: x['thumbnailImage']['uri']),
                         'uploader_id': try_get(video, lambda x: x['owner']['id']),
                         'timestamp': int_or_none(video.get('publish_time')),
                         'duration': float_or_none(video.get('playable_duration_in_ms'), 1000),
                     }
                     description = try_get(video, lambda x: x['savable_description']['text'])
                     title = video.get('name')
                     if title:
                         info.update({
                             'title': title,
                             'description': description,
                         })
                     else:
                         info['title'] = description or 'Facebook video #%s' % v_id
                     entries.append(info)

                 def parse_attachment(attachment, key='media'):
                     media = attachment.get(key) or {}
                     if media.get('__typename') == 'Video':
                         return parse_graphql_video(media)

                 nodes = data.get('nodes') or []
                 node = data.get('node') or {}
                 if not nodes and node:
                     nodes.append(node)
                 for node in nodes:
                     story = try_get(node, lambda x: x['comet_sections']['content']['story'], dict) or {}
                     attachments = try_get(story, [
                         lambda x: x['attached_story']['attachments'],
                         lambda x: x['attachments']
                     ], list) or []
                     for attachment in attachments:
                         attachment = try_get(attachment, lambda x: x['style_type_renderer']['attachment'], dict)
                         ns = try_get(attachment, lambda x: x['all_subattachments']['nodes'], list) or []
                         for n in ns:
                             parse_attachment(n)
                         parse_attachment(attachment)

                 edges = try_get(data, lambda x: x['mediaset']['currMedia']['edges'], list) or []
                 for edge in edges:
                     parse_attachment(edge, key='node')

                 video = data.get('video') or {}
                 if video:
                     attachments = try_get(video, [
                         lambda x: x['story']['attachments'],
                         lambda x: x['creation_story']['attachments']
                     ], list) or []
                     for attachment in attachments:
                         parse_attachment(attachment)
                     if not entries:
                         parse_graphql_video(video)

                 return self.playlist_result(entries, video_id)

         if not video_data:
             m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
@@ -504,6 +524,43 @@ class FacebookIE(InfoExtractor):
             elif '>You must log in to continue' in webpage:
                 self.raise_login_required()

+        if not video_data and '/watchparty/' in url:
+            post_data = {
+                'doc_id': 3731964053542869,
+                'variables': json.dumps({
+                    'livingRoomID': video_id,
+                }),
+            }
+
+            prefetched_data = extract_relay_prefetched_data(r'"login_data"\s*:\s*{')
+            if prefetched_data:
+                lsd = try_get(prefetched_data, lambda x: x['login_data']['lsd'], dict)
+                if lsd:
+                    post_data[lsd['name']] = lsd['value']
+
+            relay_data = extract_relay_data(r'\[\s*"RelayAPIConfigDefaults"\s*,')
+            for define in (relay_data.get('define') or []):
+                if define[0] == 'RelayAPIConfigDefaults':
+                    self._api_config = define[2]
+
+            living_room = self._download_json(
+                urljoin(url, self._api_config['graphURI']), video_id,
+                data=urlencode_postdata(post_data))['data']['living_room']
+
+            entries = []
+            for edge in (try_get(living_room, lambda x: x['recap']['watched_content']['edges']) or []):
+                video = try_get(edge, lambda x: x['node']['video']) or {}
+                v_id = video.get('id')
+                if not v_id:
+                    continue
+                v_id = compat_str(v_id)
+                entries.append(self.url_result(
+                    self._VIDEO_PAGE_TEMPLATE % v_id,
+                    self.ie_key(), v_id, video.get('name')))
+
+            return self.playlist_result(entries, video_id)
+
         if not video_data:
             # Video info not in first request, do a secondary request using
             # tahoe player specific URL
             tahoe_data = self._download_webpage(
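extract_relay_prefetched_data walks the handleWithCustomApplyEach payload for a RelayPrefetchedStreamCache entry; a minimal model of that walk over a trimmed, hypothetical payload:

    # Hypothetical, heavily trimmed example of the 'require' structure.
    replay_data = {
        'require': [
            ['RelayPrefetchedStreamCache', None, None,
             [None, {'__bbox': {'result': {'data': {'video': {}}}}}]],
        ],
    }
    for require in (replay_data.get('require') or []):
        if require[0] == 'RelayPrefetchedStreamCache':
            print(require[3][1]['__bbox']['result']['data'])  # -> {'video': {}}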

youtube_dl/extractor/fxnetworks.py

@@ -1,77 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
extract_attributes,
int_or_none,
parse_age_limit,
smuggle_url,
update_url_query,
)
class FXNetworksIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.fxnetworks.com/video/1032565827847',
'md5': '8d99b97b4aa7a202f55b6ed47ea7e703',
'info_dict': {
'id': 'dRzwHC_MMqIv',
'ext': 'mp4',
'title': 'First Look: Better Things - Season 2',
'description': 'Because real life is like a fart. Watch this FIRST LOOK to see what inspired the new season of Better Things.',
'age_limit': 14,
'uploader': 'NEWA-FNG-FX',
'upload_date': '20170825',
'timestamp': 1503686274,
'episode_number': 0,
'season_number': 2,
'series': 'Better Things',
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.simpsonsworld.com/video/716094019682',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if 'The content you are trying to access is not available in your region.' in webpage:
self.raise_geo_restricted()
video_data = extract_attributes(self._search_regex(
r'(<a.+?rel="https?://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
player_type = self._search_regex(r'playerType\s*=\s*[\'"]([^\'"]+)', webpage, 'player type', default=None)
release_url = video_data['rel']
title = video_data['data-title']
rating = video_data.get('data-rating')
query = {
'mbr': 'true',
}
if player_type == 'movies':
query.update({
'manifest': 'm3u',
})
else:
query.update({
'switch': 'http',
})
if video_data.get('data-req-auth') == '1':
resource = self._get_mvpd_resource(
video_data['data-channel'], title,
video_data.get('data-guid'), rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fx', resource)
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'series': video_data.get('data-show-title'),
'episode_number': int_or_none(video_data.get('data-episode')),
'season_number': int_or_none(video_data.get('data-season')),
'thumbnail': video_data.get('data-large-thumb'),
'age_limit': parse_age_limit(rating),
'ie_key': 'ThePlatform',
}

youtube_dl/extractor/generic.py

@@ -67,7 +67,10 @@ from .tube8 import Tube8IE
 from .mofosex import MofosexEmbedIE
 from .spankwire import SpankwireIE
 from .youporn import YouPornIE
-from .vimeo import VimeoIE
+from .vimeo import (
+    VimeoIE,
+    VHXEmbedIE,
+)
 from .dailymotion import DailymotionIE
 from .dailymail import DailyMailIE
 from .onionstudios import OnionStudiosIE
@@ -124,6 +127,7 @@ from .expressen import ExpressenIE
 from .zype import ZypeIE
 from .odnoklassniki import OdnoklassnikiIE
 from .kinja import KinjaEmbedIE
+from .arcpublishing import ArcPublishingIE


 class GenericIE(InfoExtractor):
@@ -2024,22 +2028,6 @@ class GenericIE(InfoExtractor):
             },
             'add_ie': [SpringboardPlatformIE.ie_key()],
         },
-        {
-            'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
-            'info_dict': {
-                'id': 'uPDB5I9wfp8',
-                'ext': 'webm',
-                'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
-                'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
-                'upload_date': '20160219',
-                'uploader': 'Pocoyo - Português (BR)',
-                'uploader_id': 'PocoyoBrazil',
-            },
-            'add_ie': [YoutubeIE.ie_key()],
-            'params': {
-                'skip_download': True,
-            },
-        },
         {
             'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html',
             'info_dict': {
@@ -2209,7 +2197,32 @@ class GenericIE(InfoExtractor):
         #     'params': {
         #         'force_generic_extractor': True,
         #     },
-        # }
+        # },
+        {
+            # VHX Embed
+            'url': 'https://demo.vhx.tv/category-c/videos/file-example-mp4-480-1-5mg-copy',
+            'info_dict': {
+                'id': '858208',
+                'ext': 'mp4',
+                'title': 'Untitled',
+                'uploader_id': 'user80538407',
+                'uploader': 'OTT Videos',
+            },
+        },
+        {
+            # ArcPublishing PoWa video player
+            'url': 'https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/',
+            'md5': 'b03b2fac8680e1e5a7cc81a5c27e71b3',
+            'info_dict': {
+                'id': '8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
+                'ext': 'mp4',
+                'title': 'Senate candidates wave to voters on Anchorage streets',
+                'description': 'md5:91f51a6511f090617353dc720318b20e',
+                'timestamp': 1604378735,
+                'upload_date': '20201103',
+                'duration': 1581,
+            },
+        },
     ]

     def report_following_redirect(self, new_url):
@@ -2576,6 +2589,10 @@ class GenericIE(InfoExtractor):
         if tp_urls:
             return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')

+        arc_urls = ArcPublishingIE._extract_urls(webpage)
+        if arc_urls:
+            return self.playlist_from_matches(arc_urls, video_id, video_title, ie=ArcPublishingIE.ie_key())
+
         # Look for embedded rtl.nl player
         matches = re.findall(
             r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@@ -2587,6 +2604,10 @@ class GenericIE(InfoExtractor):
         if vimeo_urls:
             return self.playlist_from_matches(vimeo_urls, video_id, video_title, ie=VimeoIE.ie_key())

+        vhx_url = VHXEmbedIE._extract_url(webpage)
+        if vhx_url:
+            return self.url_result(vhx_url, VHXEmbedIE.ie_key())
+
         vid_me_embed_url = self._search_regex(
             r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
             webpage, 'vid.me embed', default=None)

youtube_dl/extractor/go.py

@@ -38,13 +38,17 @@ class GoIE(AdobePassIE):
         'disneynow': {
             'brand': '011',
             'resource_id': 'Disney',
-        }
+        },
+        'fxnow.fxnetworks': {
+            'brand': '025',
+            'requestor_id': 'dtci',
+        },
     }
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
                             (?:(?P<sub_domain>%s)\.)?go|
-                            (?P<sub_domain_2>abc|freeform|disneynow)
+                            (?P<sub_domain_2>abc|freeform|disneynow|fxnow\.fxnetworks)
                         )\.com/
                         (?:
                             (?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
@@ -99,6 +103,19 @@ class GoIE(AdobePassIE):
             # m3u8 download
             'skip_download': True,
         },
+    }, {
+        'url': 'https://fxnow.fxnetworks.com/shows/better-things/video/vdka12782841',
+        'info_dict': {
+            'id': 'VDKA12782841',
+            'ext': 'mp4',
+            'title': 'First Look: Better Things - Season 2',
+            'description': 'md5:fa73584a95761c605d9d54904e35b407',
+        },
+        'params': {
+            'geo_bypass_ip_block': '3.244.239.0/24',
+            # m3u8 download
+            'skip_download': True,
+        },
     }, {
         'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
         'only_matching': True,
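A quick check that the widened sub_domain_2 alternation matches the new FX test URL; the pattern fragment below is simplified from the full _VALID_URL above:

    import re

    pattern = r'https?://(?P<sub_domain_2>abc|freeform|disneynow|fxnow\.fxnetworks)\.com/'
    url = 'https://fxnow.fxnetworks.com/shows/better-things/video/vdka12782841'
    print(re.match(pattern, url).group('sub_domain_2'))  # -> fxnow.fxnetworks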

youtube_dl/extractor/googleplus.py

@@ -1,73 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import codecs
from .common import InfoExtractor
from ..utils import unified_strdate
class GooglePlusIE(InfoExtractor):
IE_DESC = 'Google Plus'
_VALID_URL = r'https?://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
IE_NAME = 'plus.google'
_TEST = {
'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',
'info_dict': {
'id': 'ZButuJc6CtH',
'ext': 'flv',
'title': '嘆きの天使 降臨',
'upload_date': '20120613',
'uploader': '井上ヨシマサ',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# Step 1, Retrieve post webpage to extract further information
webpage = self._download_webpage(url, video_id, 'Downloading entry webpage')
title = self._og_search_description(webpage).splitlines()[0]
upload_date = unified_strdate(self._html_search_regex(
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, 'upload date', fatal=False, flags=re.VERBOSE))
uploader = self._html_search_regex(
r'rel="author".*?>(.*?)</a>', webpage, 'uploader', fatal=False)
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(
r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, 'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page
webpage = self._download_webpage(video_page, video_id, 'Downloading video page')
def unicode_escape(s):
decoder = codecs.getdecoder('unicode_escape')
return re.sub(
r'\\u[0-9a-fA-F]{4,}',
lambda m: decoder(m.group(0))[0],
s)
# Extract video links all sizes
formats = [{
'url': unicode_escape(video_url),
'ext': 'flv',
'width': int(width),
'height': int(height),
} for width, height, video_url in re.findall(
r'\d+,(\d+),(\d+),"(https?://[^.]+\.googleusercontent\.com.*?)"', webpage)]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'uploader': uploader,
'upload_date': upload_date,
'formats': formats,
}

youtube_dl/extractor/googlepodcasts.py

@@ -0,0 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_podcast_url,
int_or_none,
try_get,
urlencode_postdata,
)
class GooglePodcastsBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://podcasts\.google\.com/feed/'
def _batch_execute(self, func_id, video_id, params):
return json.loads(self._download_json(
'https://podcasts.google.com/_/PodcastsUi/data/batchexecute',
video_id, data=urlencode_postdata({
'f.req': json.dumps([[[func_id, json.dumps(params), None, '1']]]),
}), transform_source=lambda x: self._search_regex(r'(?s)(\[.+\])', x, 'data'))[0][2])
def _extract_episode(self, episode):
return {
'id': episode[4][3],
'title': episode[8],
'url': clean_podcast_url(episode[13]),
'thumbnail': episode[2],
'description': episode[9],
'creator': try_get(episode, lambda x: x[14]),
'timestamp': int_or_none(episode[11]),
'duration': int_or_none(episode[12]),
'series': episode[1],
}
class GooglePodcastsIE(GooglePodcastsBaseIE):
IE_NAME = 'google:podcasts'
_VALID_URL = GooglePodcastsBaseIE._VALID_URL_BASE + r'(?P<feed_url>[^/]+)/episode/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5ucHIub3JnLzM0NDA5ODUzOS9wb2RjYXN0LnhtbA/episode/MzBlNWRlN2UtOWE4Yy00ODcwLTk2M2MtM2JlMmUyNmViOTRh',
'md5': 'fa56b2ee8bd0703e27e42d4b104c4766',
'info_dict': {
'id': '30e5de7e-9a8c-4870-963c-3be2e26eb94a',
'ext': 'mp3',
'title': 'WWDTM New Year 2021',
'description': 'We say goodbye to 2020 with Christine Baranksi, Doug Jones, Jonna Mendez, and Kellee Edwards.',
'upload_date': '20210102',
'timestamp': 1609606800,
'duration': 2901,
'series': "Wait Wait... Don't Tell Me!",
}
}
def _real_extract(self, url):
b64_feed_url, b64_guid = re.match(self._VALID_URL, url).groups()
episode = self._batch_execute(
'oNjqVe', b64_guid, [b64_feed_url, b64_guid])[1]
return self._extract_episode(episode)
class GooglePodcastsFeedIE(GooglePodcastsBaseIE):
IE_NAME = 'google:podcasts:feed'
_VALID_URL = GooglePodcastsBaseIE._VALID_URL_BASE + r'(?P<id>[^/?&#]+)/?(?:[?#&]|$)'
_TEST = {
'url': 'https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5ucHIub3JnLzM0NDA5ODUzOS9wb2RjYXN0LnhtbA',
'info_dict': {
'title': "Wait Wait... Don't Tell Me!",
'description': "NPR's weekly current events quiz. Have a laugh and test your news knowledge while figuring out what's real and what we've made up.",
},
'playlist_mincount': 20,
}
def _real_extract(self, url):
b64_feed_url = self._match_id(url)
data = self._batch_execute('ncqJEe', b64_feed_url, [b64_feed_url])
entries = []
for episode in (try_get(data, lambda x: x[1][0]) or []):
entries.append(self._extract_episode(episode))
feed = try_get(data, lambda x: x[3]) or []
return self.playlist_result(
entries, playlist_title=try_get(feed, lambda x: x[0]),
playlist_description=try_get(feed, lambda x: x[2]))
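The f.req envelope that _batch_execute posts to the batchexecute endpoint, rebuilt from the code above; the two base64 arguments are shortened placeholders:

    import json

    func_id = 'oNjqVe'  # per-episode call; 'ncqJEe' fetches a whole feed
    params = ['aHR0cHM6...', 'MzBlNWRl...']  # b64 feed URL, b64 episode guid
    print(json.dumps([[[func_id, json.dumps(params), None, '1']]]))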

youtube_dl/extractor/iheart.py

@@ -0,0 +1,97 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
clean_podcast_url,
int_or_none,
str_or_none,
)
class IHeartRadioBaseIE(InfoExtractor):
def _call_api(self, path, video_id, fatal=True, query=None):
return self._download_json(
'https://api.iheart.com/api/v3/podcast/' + path,
video_id, fatal=fatal, query=query)
def _extract_episode(self, episode):
return {
'thumbnail': episode.get('imageUrl'),
'description': clean_html(episode.get('description')),
'timestamp': int_or_none(episode.get('startDate'), 1000),
'duration': int_or_none(episode.get('duration')),
}
class IHeartRadioIE(IHeartRadioBaseIE):
IE_NAME = 'iheartradio'
_VALID_URL = r'(?:https?://(?:www\.)?iheart\.com/podcast/[^/]+/episode/(?P<display_id>[^/?&#]+)-|iheartradio:)(?P<id>\d+)'
_TEST = {
'url': 'https://www.iheart.com/podcast/105-behind-the-bastards-29236323/episode/part-one-alexander-lukashenko-the-dictator-70346499/?embed=true',
'md5': 'c8609c92c8688dcb69d8541042b8abca',
'info_dict': {
'id': '70346499',
'ext': 'mp3',
'title': 'Part One: Alexander Lukashenko: The Dictator of Belarus',
'description': 'md5:96cc7297b3a5a9ebae28643801c96fae',
'timestamp': 1597741200,
'upload_date': '20200818',
}
}
def _real_extract(self, url):
episode_id = self._match_id(url)
episode = self._call_api(
'episodes/' + episode_id, episode_id)['episode']
info = self._extract_episode(episode)
info.update({
'id': episode_id,
'title': episode['title'],
'url': clean_podcast_url(episode['mediaUrl']),
})
return info
class IHeartRadioPodcastIE(IHeartRadioBaseIE):
IE_NAME = 'iheartradio:podcast'
_VALID_URL = r'https?://(?:www\.)?iheart(?:podcastnetwork)?\.com/podcast/[^/?&#]+-(?P<id>\d+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://www.iheart.com/podcast/1119-it-could-happen-here-30717896/',
'info_dict': {
'id': '30717896',
'title': 'It Could Happen Here',
'description': 'md5:5842117412a967eb0b01f8088eb663e2',
},
'playlist_mincount': 11,
}, {
'url': 'https://www.iheartpodcastnetwork.com/podcast/105-stuff-you-should-know-26940277',
'only_matching': True,
}]
def _real_extract(self, url):
podcast_id = self._match_id(url)
path = 'podcasts/' + podcast_id
episodes = self._call_api(
path + '/episodes', podcast_id, query={'limit': 1000000000})['data']
entries = []
for episode in episodes:
episode_id = str_or_none(episode.get('id'))
if not episode_id:
continue
info = self._extract_episode(episode)
info.update({
'_type': 'url',
'id': episode_id,
'title': episode.get('title'),
'url': 'iheartradio:' + episode_id,
'ie_key': IHeartRadioIE.ie_key(),
})
entries.append(info)
podcast = self._call_api(path, podcast_id, False) or {}
return self.playlist_result(
entries, podcast_id, podcast.get('title'), podcast.get('description'))
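Note on _extract_episode above: the second argument in int_or_none(episode.get('startDate'), 1000) is a scale divisor, which suggests the API reports startDate in epoch milliseconds (an assumption inferred from the code, not from API docs). A minimal check against the test expectations:

from datetime import datetime

from youtube_dl.utils import int_or_none

# hypothetical millisecond startDate matching the IHeartRadioIE test above
timestamp = int_or_none(1597741200000, 1000)
assert timestamp == 1597741200
assert datetime.utcfromtimestamp(timestamp).strftime('%Y%m%d') == '20200818'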


youtube_dl/extractor/instagram.py
@@ -22,7 +22,7 @@ from ..utils import (


class InstagramIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))'
+    _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv|reel)/(?P<id>[^/?#&]+))'
    _TESTS = [{
        'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
        'md5': '0d2da106a9d2631273e192b372806516',
@@ -35,7 +35,7 @@ class InstagramIE(InfoExtractor):
            'timestamp': 1371748545,
            'upload_date': '20130620',
            'uploader_id': 'naomipq',
-            'uploader': 'Naomi Leonor Phan-Quang',
+            'uploader': 'B E A U T Y F O R A S H E S',
            'like_count': int,
            'comment_count': int,
            'comments': list,
@@ -95,6 +95,9 @@ class InstagramIE(InfoExtractor):
    }, {
        'url': 'https://www.instagram.com/tv/aye83DjauH/',
        'only_matching': True,
+    }, {
+        'url': 'https://www.instagram.com/reel/CDUMkliABpa/',
+        'only_matching': True,
    }]
@@ -122,9 +125,9 @@ class InstagramIE(InfoExtractor):

        webpage = self._download_webpage(url, video_id)

-        (video_url, description, thumbnail, timestamp, uploader,
+        (media, video_url, description, thumbnail, timestamp, uploader,
         uploader_id, like_count, comment_count, comments, height,
-         width) = [None] * 11
+         width) = [None] * 12

        shared_data = self._parse_json(
            self._search_regex(
@@ -137,59 +140,77 @@ class InstagramIE(InfoExtractor):
                (lambda x: x['entry_data']['PostPage'][0]['graphql']['shortcode_media'],
                 lambda x: x['entry_data']['PostPage'][0]['media']),
                dict)
-        if media:
-            video_url = media.get('video_url')
-            height = int_or_none(media.get('dimensions', {}).get('height'))
-            width = int_or_none(media.get('dimensions', {}).get('width'))
-            description = try_get(
-                media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
-                compat_str) or media.get('caption')
-            thumbnail = media.get('display_src')
-            timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
-            uploader = media.get('owner', {}).get('full_name')
-            uploader_id = media.get('owner', {}).get('username')
+        # _sharedData.entry_data.PostPage is empty when authenticated (see
+        # https://github.com/ytdl-org/youtube-dl/pull/22880)
+        if not media:
+            additional_data = self._parse_json(
+                self._search_regex(
+                    r'window\.__additionalDataLoaded\s*\(\s*[^,]+,\s*({.+?})\s*\)\s*;',
+                    webpage, 'additional data', default='{}'),
+                video_id, fatal=False)
+            if additional_data:
+                media = try_get(
+                    additional_data, lambda x: x['graphql']['shortcode_media'],
+                    dict)
+        if media:
+            video_url = media.get('video_url')
+            height = int_or_none(media.get('dimensions', {}).get('height'))
+            width = int_or_none(media.get('dimensions', {}).get('width'))
+            description = try_get(
+                media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
                compat_str) or media.get('caption')
+            thumbnail = media.get('display_src') or media.get('display_url')
+            timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
+            uploader = media.get('owner', {}).get('full_name')
+            uploader_id = media.get('owner', {}).get('username')

-            def get_count(key, kind):
-                return int_or_none(try_get(
-                    media, (lambda x: x['edge_media_%s' % key]['count'],
-                            lambda x: x['%ss' % kind]['count'])))
-            like_count = get_count('preview_like', 'like')
-            comment_count = get_count('to_comment', 'comment')
+            def get_count(keys, kind):
+                if not isinstance(keys, (list, tuple)):
+                    keys = [keys]
+                for key in keys:
+                    count = int_or_none(try_get(
+                        media, (lambda x: x['edge_media_%s' % key]['count'],
                                lambda x: x['%ss' % kind]['count'])))
+                    if count is not None:
+                        return count
+            like_count = get_count('preview_like', 'like')
+            comment_count = get_count(
+                ('preview_comment', 'to_comment', 'to_parent_comment'), 'comment')

            comments = [{
                'author': comment.get('user', {}).get('username'),
                'author_id': comment.get('user', {}).get('id'),
                'id': comment.get('id'),
                'text': comment.get('text'),
                'timestamp': int_or_none(comment.get('created_at')),
            } for comment in media.get(
                'comments', {}).get('nodes', []) if comment.get('text')]

            if not video_url:
                edges = try_get(
                    media, lambda x: x['edge_sidecar_to_children']['edges'],
                    list) or []
                if edges:
                    entries = []
                    for edge_num, edge in enumerate(edges, start=1):
                        node = try_get(edge, lambda x: x['node'], dict)
                        if not node:
                            continue
                        node_video_url = url_or_none(node.get('video_url'))
                        if not node_video_url:
                            continue
                        entries.append({
                            'id': node.get('shortcode') or node['id'],
                            'title': 'Video %d' % edge_num,
                            'url': node_video_url,
                            'thumbnail': node.get('display_url'),
                            'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
                            'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
                            'view_count': int_or_none(node.get('video_view_count')),
                        })
                    return self.playlist_result(
                        entries, video_id,
                        'Post by %s' % uploader_id if uploader_id else None,
                        description)

        if not video_url:
            video_url = self._og_search_video_url(webpage, secure=False)
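Note: the window.__additionalDataLoaded fallback exists because _sharedData.entry_data.PostPage comes back empty for authenticated users (see the PR referenced in the comment). A standalone check of the new regex against a fabricated page snippet (the HTML below is made up for illustration):

import json
import re

webpage = 'window.__additionalDataLoaded(\'p/abc\',{"graphql":{"shortcode_media":{"video_url":"https://example.com/v.mp4"}}});'

media = json.loads(re.search(
    r'window\.__additionalDataLoaded\s*\(\s*[^,]+,\s*({.+?})\s*\)\s*;',
    webpage).group(1))['graphql']['shortcode_media']
assert media['video_url'] == 'https://example.com/v.mp4'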


youtube_dl/extractor/mitele.py
@@ -1,15 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals

-from .common import InfoExtractor
+from .telecinco import TelecincoIE
from ..utils import (
    int_or_none,
    parse_iso8601,
-    smuggle_url,
)


-class MiTeleIE(InfoExtractor):
+class MiTeleIE(TelecincoIE):
    IE_DESC = 'mitele.es'
    _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
@@ -31,7 +30,6 @@ class MiTeleIE(InfoExtractor):
            'timestamp': 1471209401,
            'upload_date': '20160814',
        },
-        'add_ie': ['Ooyala'],
    }, {
        # no explicit title
        'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
@@ -54,7 +52,6 @@ class MiTeleIE(InfoExtractor):
        'params': {
            'skip_download': True,
        },
-        'add_ie': ['Ooyala'],
    }, {
        'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
        'only_matching': True,
@@ -70,16 +67,11 @@ class MiTeleIE(InfoExtractor):
                r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
                webpage, 'Pre Player'), display_id)['prePlayer']
        title = pre_player['title']
-        video = pre_player['video']
-        video_id = video['dataMediaId']
+        video_info = self._parse_content(pre_player['video'], url)
        content = pre_player.get('content') or {}
        info = content.get('info') or {}

-        return {
-            '_type': 'url_transparent',
-            # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
-            'id': video_id,
+        video_info.update({
            'title': title,
            'description': info.get('synopsis'),
            'series': content.get('title'),
@@ -87,7 +79,7 @@ class MiTeleIE(InfoExtractor):
            'episode': content.get('subtitle'),
            'episode_number': int_or_none(info.get('episode_number')),
            'duration': int_or_none(info.get('duration')),
-            'thumbnail': video.get('dataPoster'),
            'age_limit': int_or_none(info.get('rating')),
            'timestamp': parse_iso8601(pre_player.get('publishedTime')),
-        }
+        })
+        return video_info


youtube_dl/extractor/nbc.py
@@ -158,7 +158,8 @@ class NBCIE(AdobePassIE):

class NBCSportsVPlayerIE(InfoExtractor):
-    _VALID_URL = r'https?://vplayer\.nbcsports\.com/(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)'
+    _VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/'
+    _VALID_URL = _VALID_URL_BASE + r'(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)'

    _TESTS = [{
        'url': 'https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/9CsDKds0kvHI',
@@ -174,12 +175,15 @@ class NBCSportsVPlayerIE(InfoExtractor):
    }, {
        'url': 'https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/_hqLjQ95yx8Z',
        'only_matching': True,
+    }, {
+        'url': 'https://www.nbcsports.com/vplayer/p/BxmELC/nbcsports/select/PHJSaFWbrTY9?form=html&autoPlay=true',
+        'only_matching': True,
    }]

    @staticmethod
    def _extract_url(webpage):
        iframe_m = re.search(
-            r'<iframe[^>]+src="(?P<url>https?://vplayer\.nbcsports\.com/[^"]+)"', webpage)
+            r'<(?:iframe[^>]+|div[^>]+data-(?:mpx-)?)src="(?P<url>%s[^"]+)"' % NBCSportsVPlayerIE._VALID_URL_BASE, webpage)
        if iframe_m:
            return iframe_m.group('url')

@@ -192,21 +196,29 @@ class NBCSportsVPlayerIE(InfoExtractor):

class NBCSportsIE(InfoExtractor):
-    # Does not include https because its certificate is invalid
-    _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
+    _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)'

-    _TEST = {
+    _TESTS = [{
+        # iframe src
        'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
        'info_dict': {
            'id': 'PHJSaFWbrTY9',
-            'ext': 'flv',
+            'ext': 'mp4',
            'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
            'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
            'uploader': 'NBCU-SPORTS',
            'upload_date': '20150330',
            'timestamp': 1427726529,
        }
-    }
+    }, {
+        # data-mpx-src
+        'url': 'https://www.nbcsports.com/philadelphia/philadelphia-phillies/bruce-bochy-hector-neris-hes-idiot',
+        'only_matching': True,
+    }, {
+        # data-src
+        'url': 'https://www.nbcsports.com/boston/video/report-card-pats-secondary-no-match-josh-allen',
+        'only_matching': True,
+    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
@@ -274,33 +286,6 @@ class NBCSportsStreamIE(AdobePassIE):
        }


-class CSNNEIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
-
-    _TEST = {
-        'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
-        'info_dict': {
-            'id': 'yvBLLUgQ8WU0',
-            'ext': 'mp4',
-            'title': 'SNC evening update: Wright named Red Sox\' No. 5 starter.',
-            'description': 'md5:1753cfee40d9352b19b4c9b3e589b9e3',
-            'timestamp': 1459369979,
-            'upload_date': '20160330',
-            'uploader': 'NBCU-SPORTS',
-        }
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        return {
-            '_type': 'url_transparent',
-            'ie_key': 'ThePlatform',
-            'url': self._html_search_meta('twitter:player:stream', webpage),
-            'display_id': display_id,
-        }
-
-
class NBCNewsIE(ThePlatformIE):
    _VALID_URL = r'(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/([^/]+/)*(?:.*-)?(?P<id>[^/?]+)'
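Note: the (?!vplayer/) negative lookahead added to NBCSportsIE keeps the two extractors disjoint now that NBCSportsVPlayerIE also claims nbcsports.com/vplayer/ URLs. Roughly (patterns copied from the diff above):

import re

NBC_SPORTS = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
VPLAYER_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/'

article = 'https://www.nbcsports.com/boston/video/report-card-pats-secondary-no-match-josh-allen'
vplayer = 'https://www.nbcsports.com/vplayer/p/BxmELC/nbcsports/select/PHJSaFWbrTY9'

assert re.match(NBC_SPORTS, article)      # article pages stay with NBCSportsIE
assert not re.match(NBC_SPORTS, vplayer)  # vplayer URLs are excluded here...
assert re.match(VPLAYER_BASE, vplayer)    # ...and claimed by NBCSportsVPlayerIE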


youtube_dl/extractor/nhk.py
@@ -90,7 +90,7 @@ class NhkVodIE(NhkBaseIE):
    _TESTS = [{
        # video clip
        'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999011/',
-        'md5': '256a1be14f48d960a7e61e2532d95ec3',
+        'md5': '7a90abcfe610ec22a6bfe15bd46b30ca',
        'info_dict': {
            'id': 'a95j5iza',
            'ext': 'mp4',


youtube_dl/extractor/ninecninemedia.py
@@ -5,10 +5,11 @@ import re

from .common import InfoExtractor
from ..utils import (
-    parse_iso8601,
-    float_or_none,
    ExtractorError,
+    float_or_none,
    int_or_none,
+    parse_iso8601,
+    try_get,
)
@@ -35,7 +36,7 @@ class NineCNineMediaIE(InfoExtractor):
                '$include': '[HasClosedCaptions]',
            })

-        if content_package.get('Constraints', {}).get('Security', {}).get('Type'):
+        if try_get(content_package, lambda x: x['Constraints']['Security']['Type']):
            raise ExtractorError('This video is DRM protected.', expected=True)

        manifest_base_url = content_package_url + 'manifest.'
@@ -52,7 +53,7 @@ class NineCNineMediaIE(InfoExtractor):
        self._sort_formats(formats)

        thumbnails = []
-        for image in content.get('Images', []):
+        for image in (content.get('Images') or []):
            image_url = image.get('Url')
            if not image_url:
                continue
@@ -70,7 +71,7 @@ class NineCNineMediaIE(InfoExtractor):
                    continue
                container.append(e_name)

-        season = content.get('Season', {})
+        season = content.get('Season') or {}

        info = {
            'id': content_id,
@@ -79,13 +80,14 @@ class NineCNineMediaIE(InfoExtractor):
            'timestamp': parse_iso8601(content.get('BroadcastDateTime')),
            'episode_number': int_or_none(content.get('Episode')),
            'season': season.get('Name'),
-            'season_number': season.get('Number'),
+            'season_number': int_or_none(season.get('Number')),
            'season_id': season.get('Id'),
-            'series': content.get('Media', {}).get('Name'),
+            'series': try_get(content, lambda x: x['Media']['Name']),
            'tags': tags,
            'categories': categories,
            'duration': float_or_none(content_package.get('Duration')),
            'formats': formats,
+            'thumbnails': thumbnails,
        }

        if content_package.get('HasClosedCaptions'):
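Note: replacing the chained .get(..., {}) calls with try_get is more than style. If the API returns an explicit null (say 'Constraints': None), dict.get finds the key and returns None, and the next .get raises AttributeError; try_get swallows that class of error and simply yields None. A minimal illustration (the response dict is hypothetical):

from youtube_dl.utils import try_get

content_package = {'Constraints': None}  # hypothetical API response

try:
    # old style blows up on the None value
    content_package.get('Constraints', {}).get('Security', {}).get('Type')
except AttributeError:
    pass

# new style simply evaluates to None
assert try_get(content_package, lambda x: x['Constraints']['Security']['Type']) is None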


youtube_dl/extractor/nrk.py
@@ -6,16 +6,13 @@ import random
import re

from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urllib_parse_unquote,
-)
+from ..compat import compat_str
from ..utils import (
    determine_ext,
    ExtractorError,
    int_or_none,
-    parse_age_limit,
    parse_duration,
+    str_or_none,
    try_get,
    urljoin,
    url_or_none,
@@ -63,7 +60,8 @@ class NRKBaseIE(InfoExtractor):
        return self._download_json(
            urljoin('http://psapi.nrk.no/', path),
            video_id, note or 'Downloading %s JSON' % item,
-            fatal=fatal, query=query)
+            fatal=fatal, query=query,
+            headers={'Accept-Encoding': 'gzip, deflate, br'})


class NRKIE(NRKBaseIE):
@@ -116,9 +114,39 @@ class NRKIE(NRKBaseIE):
    }, {
        'url': 'https://www.nrk.no/video/humor/kommentatorboksen-reiser-til-sjos_d1fda11f-a4ad-437a-a374-0398bc84e999',
        'only_matching': True,
+    }, {
+        # podcast
+        'url': 'nrk:l_96f4f1b0-de54-4e6a-b4f1-b0de54fe6af8',
+        'only_matching': True,
+    }, {
+        'url': 'nrk:podcast/l_96f4f1b0-de54-4e6a-b4f1-b0de54fe6af8',
+        'only_matching': True,
+    }, {
+        # clip
+        'url': 'nrk:150533',
+        'only_matching': True,
+    }, {
+        'url': 'nrk:clip/150533',
+        'only_matching': True,
+    }, {
+        # program
+        'url': 'nrk:MDDP12000117',
+        'only_matching': True,
+    }, {
+        'url': 'nrk:program/ENRK10100318',
+        'only_matching': True,
+    }, {
+        # direkte
+        'url': 'nrk:nrk1',
+        'only_matching': True,
+    }, {
+        'url': 'nrk:channel/nrk1',
+        'only_matching': True,
    }]

-    def _extract_from_playback(self, video_id):
+    def _real_extract(self, url):
+        video_id = self._match_id(url).split('/')[-1]
+
        path_templ = 'playback/%s/' + video_id

        def call_playback_api(item, query=None):
@@ -126,6 +154,8 @@ class NRKIE(NRKBaseIE):
        # known values for preferredCdn: akamai, iponly, minicdn and telenor
        manifest = call_playback_api('manifest', {'preferredCdn': 'akamai'})

+        video_id = try_get(manifest, lambda x: x['id'], compat_str) or video_id
+
        if manifest.get('playability') == 'nonPlayable':
            self._raise_error(manifest['nonPlayable'])
@@ -140,8 +170,15 @@ class NRKIE(NRKBaseIE):
            format_url = url_or_none(asset.get('url'))
            if not format_url:
                continue
-            if asset.get('format') == 'HLS' or determine_ext(format_url) == 'm3u8':
+            asset_format = (asset.get('format') or '').lower()
+            if asset_format == 'hls' or determine_ext(format_url) == 'm3u8':
                formats.extend(self._extract_nrk_formats(format_url, video_id))
+            elif asset_format == 'mp3':
+                formats.append({
+                    'url': format_url,
+                    'format_id': asset_format,
+                    'vcodec': 'none',
+                })
        self._sort_formats(formats)

        data = call_playback_api('metadata')
@@ -168,59 +205,133 @@ class NRKIE(NRKBaseIE):
                'height': int_or_none(image.get('pixelHeight')),
            })

-        return {
+        subtitles = {}
+        for sub in try_get(playable, lambda x: x['subtitles'], list) or []:
+            if not isinstance(sub, dict):
+                continue
+            sub_url = url_or_none(sub.get('webVtt'))
+            if not sub_url:
+                continue
+            sub_key = str_or_none(sub.get('language')) or 'nb'
+            sub_type = str_or_none(sub.get('type'))
+            if sub_type:
+                sub_key += '-%s' % sub_type
+            subtitles.setdefault(sub_key, []).append({
+                'url': sub_url,
+            })
+
+        legal_age = try_get(
+            data, lambda x: x['legalAge']['body']['rating']['code'], compat_str)
+        # https://en.wikipedia.org/wiki/Norwegian_Media_Authority
+        age_limit = None
+        if legal_age:
+            if legal_age == 'A':
+                age_limit = 0
+            elif legal_age.isdigit():
+                age_limit = int_or_none(legal_age)
+
+        is_series = try_get(data, lambda x: x['_links']['series']['name']) == 'series'
+
+        info = {
            'id': video_id,
            'title': title,
            'alt_title': alt_title,
            'description': description,
            'duration': duration,
            'thumbnails': thumbnails,
+            'age_limit': age_limit,
            'formats': formats,
+            'subtitles': subtitles,
        }

-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        return self._extract_from_playback(video_id)
+        if is_series:
+            series = season_id = season_number = episode = episode_number = None
+            programs = self._call_api(
+                'programs/%s' % video_id, video_id, 'programs', fatal=False)
+            if programs and isinstance(programs, dict):
+                series = str_or_none(programs.get('seriesTitle'))
+                season_id = str_or_none(programs.get('seasonId'))
+                season_number = int_or_none(programs.get('seasonNumber'))
+                episode = str_or_none(programs.get('episodeTitle'))
+                episode_number = int_or_none(programs.get('episodeNumber'))
+            if not series:
+                series = title
+            if alt_title:
+                title += ' - %s' % alt_title
+            if not season_number:
+                season_number = int_or_none(self._search_regex(
+                    r'Sesong\s+(\d+)', description or '', 'season number',
+                    default=None))
+            if not episode:
+                episode = alt_title if is_series else None
+            if not episode_number:
+                episode_number = int_or_none(self._search_regex(
+                    r'^(\d+)\.', episode or '', 'episode number',
+                    default=None))
+            if not episode_number:
+                episode_number = int_or_none(self._search_regex(
+                    r'\((\d+)\s*:\s*\d+\)', description or '',
+                    'episode number', default=None))
+            info.update({
+                'title': title,
+                'series': series,
+                'season_id': season_id,
+                'season_number': season_number,
+                'episode': episode,
+                'episode_number': episode_number,
+            })
+
+        return info


-class NRKTVIE(NRKBaseIE):
+class NRKTVIE(InfoExtractor):
    IE_DESC = 'NRK TV and NRK Radio'
    _EPISODE_RE = r'(?P<id>[a-zA-Z]{4}\d{8})'
    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:[^/]+/)*%s' % _EPISODE_RE
-    _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
    _TESTS = [{
        'url': 'https://tv.nrk.no/program/MDDP12000117',
        'md5': 'c4a5960f1b00b40d47db65c1064e0ab1',
        'info_dict': {
-            'id': 'MDDP12000117AA',
+            'id': 'MDDP12000117',
            'ext': 'mp4',
            'title': 'Alarm Trolltunga',
            'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
            'duration': 2223.44,
            'age_limit': 6,
+            'subtitles': {
+                'nb-nor': [{
+                    'ext': 'vtt',
+                }],
+                'nb-ttv': [{
+                    'ext': 'vtt',
+                }]
+            },
        },
    }, {
        'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
        'md5': '8d40dab61cea8ab0114e090b029a0565',
        'info_dict': {
-            'id': 'MUHH48000314AA',
+            'id': 'MUHH48000314',
            'ext': 'mp4',
-            'title': '20 spørsmål 23.05.2014',
+            'title': '20 spørsmål - 23. mai 2014',
+            'alt_title': '23. mai 2014',
            'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
            'duration': 1741,
            'series': '20 spørsmål',
-            'episode': '23.05.2014',
+            'episode': '23. mai 2014',
+            'age_limit': 0,
        },
    }, {
        'url': 'https://tv.nrk.no/program/mdfp15000514',
        'info_dict': {
-            'id': 'MDFP15000514CA',
+            'id': 'MDFP15000514',
            'ext': 'mp4',
-            'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
+            'title': 'Kunnskapskanalen - Grunnlovsjubiléet - Stor ståhei for ingenting',
            'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
            'duration': 4605.08,
            'series': 'Kunnskapskanalen',
-            'episode': '24.05.2014',
+            'episode': 'Grunnlovsjubiléet - Stor ståhei for ingenting',
+            'age_limit': 0,
        },
        'params': {
            'skip_download': True,
@@ -229,10 +340,11 @@ class NRKTVIE(NRKBaseIE):
        # single playlist video
        'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
        'info_dict': {
-            'id': 'MSPO40010515AH',
+            'id': 'MSPO40010515',
            'ext': 'mp4',
            'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015',
            'description': 'md5:c03aba1e917561eface5214020551b7a',
+            'age_limit': 0,
        },
        'params': {
            'skip_download': True,
@@ -242,24 +354,27 @@ class NRKTVIE(NRKBaseIE):
    }, {
        'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
        'info_dict': {
-            'id': 'MSPO40010515AH',
+            'id': 'MSPO40010515',
            'ext': 'mp4',
            'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015',
            'description': 'md5:c03aba1e917561eface5214020551b7a',
+            'age_limit': 0,
        },
        'expected_warnings': ['Failed to download m3u8 information'],
+        'skip': 'Ikke tilgjengelig utenfor Norge',
    }, {
        'url': 'https://tv.nrk.no/serie/anno/KMTE50001317/sesong-3/episode-13',
        'info_dict': {
-            'id': 'KMTE50001317AA',
+            'id': 'KMTE50001317',
            'ext': 'mp4',
-            'title': 'Anno 13:30',
+            'title': 'Anno - 13. episode',
            'description': 'md5:11d9613661a8dbe6f9bef54e3a4cbbfa',
            'duration': 2340,
            'series': 'Anno',
-            'episode': '13:30',
+            'episode': '13. episode',
            'season_number': 3,
            'episode_number': 13,
+            'age_limit': 0,
        },
        'params': {
            'skip_download': True,
@@ -267,13 +382,14 @@ class NRKTVIE(NRKBaseIE):
    }, {
        'url': 'https://tv.nrk.no/serie/nytt-paa-nytt/MUHH46000317/27-01-2017',
        'info_dict': {
-            'id': 'MUHH46000317AA',
+            'id': 'MUHH46000317',
            'ext': 'mp4',
            'title': 'Nytt på Nytt 27.01.2017',
            'description': 'md5:5358d6388fba0ea6f0b6d11c48b9eb4b',
            'duration': 1796,
            'series': 'Nytt på nytt',
            'episode': '27.01.2017',
+            'age_limit': 0,
        },
        'params': {
            'skip_download': True,
@@ -290,180 +406,26 @@ class NRKTVIE(NRKBaseIE):
        'only_matching': True,
    }]

-    _api_host = None
-
-    def _extract_from_mediaelement(self, video_id):
-        api_hosts = (self._api_host, ) if self._api_host else self._API_HOSTS
-
-        for api_host in api_hosts:
-            data = self._download_json(
-                'http://%s/mediaelement/%s' % (api_host, video_id),
-                video_id, 'Downloading mediaelement JSON',
-                fatal=api_host == api_hosts[-1])
-            if not data:
-                continue
-            self._api_host = api_host
-            break
-
-        title = data.get('fullTitle') or data.get('mainTitle') or data['title']
-        video_id = data.get('id') or video_id
-
-        urls = []
-        entries = []
-
-        conviva = data.get('convivaStatistics') or {}
-        live = (data.get('mediaElementType') == 'Live'
-                or data.get('isLive') is True or conviva.get('isLive'))
-
-        def make_title(t):
-            return self._live_title(t) if live else t
-
-        media_assets = data.get('mediaAssets')
-        if media_assets and isinstance(media_assets, list):
-            def video_id_and_title(idx):
-                return ((video_id, title) if len(media_assets) == 1
-                        else ('%s-%d' % (video_id, idx), '%s (Part %d)' % (title, idx)))
-
-            for num, asset in enumerate(media_assets, 1):
-                asset_url = asset.get('url')
-                if not asset_url or asset_url in urls:
-                    continue
-                urls.append(asset_url)
-                formats = self._extract_nrk_formats(asset_url, video_id)
-                if not formats:
-                    continue
-                self._sort_formats(formats)
-
-                entry_id, entry_title = video_id_and_title(num)
-                duration = parse_duration(asset.get('duration'))
-                subtitles = {}
-                for subtitle in ('webVtt', 'timedText'):
-                    subtitle_url = asset.get('%sSubtitlesUrl' % subtitle)
-                    if subtitle_url:
-                        subtitles.setdefault('no', []).append({
-                            'url': compat_urllib_parse_unquote(subtitle_url)
-                        })
-                entries.append({
-                    'id': asset.get('carrierId') or entry_id,
-                    'title': make_title(entry_title),
-                    'duration': duration,
-                    'subtitles': subtitles,
-                    'formats': formats,
-                    'is_live': live,
-                })
-
-        if not entries:
-            media_url = data.get('mediaUrl')
-            if media_url and media_url not in urls:
-                formats = self._extract_nrk_formats(media_url, video_id)
-                if formats:
-                    self._sort_formats(formats)
-                    duration = parse_duration(data.get('duration'))
-                    entries = [{
-                        'id': video_id,
-                        'title': make_title(title),
-                        'duration': duration,
-                        'formats': formats,
-                        'is_live': live,
-                    }]
-
-        if not entries:
-            self._raise_error(data)
-
-        series = conviva.get('seriesName') or data.get('seriesTitle')
-        episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
-
-        season_number = None
-        episode_number = None
-        if data.get('mediaElementType') == 'Episode':
-            _season_episode = data.get('scoresStatistics', {}).get('springStreamStream') or \
-                data.get('relativeOriginUrl', '')
-            EPISODENUM_RE = [
-                r'/s(?P<season>\d{,2})e(?P<episode>\d{,2})\.',
-                r'/sesong-(?P<season>\d{,2})/episode-(?P<episode>\d{,2})',
-            ]
-            season_number = int_or_none(self._search_regex(
-                EPISODENUM_RE, _season_episode, 'season number',
-                default=None, group='season'))
-            episode_number = int_or_none(self._search_regex(
-                EPISODENUM_RE, _season_episode, 'episode number',
-                default=None, group='episode'))
-
-        thumbnails = None
-        images = data.get('images')
-        if images and isinstance(images, dict):
-            web_images = images.get('webImages')
-            if isinstance(web_images, list):
-                thumbnails = [{
-                    'url': image['imageUrl'],
-                    'width': int_or_none(image.get('width')),
-                    'height': int_or_none(image.get('height')),
-                } for image in web_images if image.get('imageUrl')]
-
-        description = data.get('description')
-        category = data.get('mediaAnalytics', {}).get('category')
-
-        common_info = {
-            'description': description,
-            'series': series,
-            'episode': episode,
-            'season_number': season_number,
-            'episode_number': episode_number,
-            'categories': [category] if category else None,
-            'age_limit': parse_age_limit(data.get('legalAge')),
-            'thumbnails': thumbnails,
-        }
-
-        vcodec = 'none' if data.get('mediaType') == 'Audio' else None
-
-        for entry in entries:
-            entry.update(common_info)
-            for f in entry['formats']:
-                f['vcodec'] = vcodec
-
-        points = data.get('shortIndexPoints')
-        if isinstance(points, list):
-            chapters = []
-            for next_num, point in enumerate(points, start=1):
-                if not isinstance(point, dict):
-                    continue
-                start_time = parse_duration(point.get('startPoint'))
-                if start_time is None:
-                    continue
-                end_time = parse_duration(
-                    data.get('duration')
-                    if next_num == len(points)
-                    else points[next_num].get('startPoint'))
-                if end_time is None:
-                    continue
-                chapters.append({
-                    'start_time': start_time,
-                    'end_time': end_time,
-                    'title': point.get('title'),
-                })
-            if chapters and len(entries) == 1:
-                entries[0]['chapters'] = chapters
-
-        return self.playlist_result(entries, video_id, title, description)
-
    def _real_extract(self, url):
        video_id = self._match_id(url)
-        return self._extract_from_mediaelement(video_id)
+        return self.url_result(
+            'nrk:%s' % video_id, ie=NRKIE.ie_key(), video_id=video_id)


class NRKTVEpisodeIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
+    _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/(?P<season_number>\d+)/episode/(?P<episode_number>\d+))'
    _TESTS = [{
        'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
        'info_dict': {
-            'id': 'MUHH36005220BA',
+            'id': 'MUHH36005220',
            'ext': 'mp4',
-            'title': 'Kro, krig og kjærlighet 2:6',
-            'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
-            'duration': 1563,
+            'title': 'Hellums kro - 2. Kro, krig og kjærlighet',
+            'description': 'md5:ad92ddffc04cea8ce14b415deef81787',
+            'duration': 1563.92,
            'series': 'Hellums kro',
            'season_number': 1,
            'episode_number': 2,
-            'episode': '2:6',
+            'episode': '2. Kro, krig og kjærlighet',
            'age_limit': 6,
        },
        'params': {
@@ -472,15 +434,16 @@ class NRKTVEpisodeIE(InfoExtractor):
    }, {
        'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
        'info_dict': {
-            'id': 'MSUI14000816AA',
+            'id': 'MSUI14000816',
            'ext': 'mp4',
-            'title': 'Backstage 8:30',
+            'title': 'Backstage - 8. episode',
            'description': 'md5:de6ca5d5a2d56849e4021f2bf2850df4',
            'duration': 1320,
            'series': 'Backstage',
            'season_number': 1,
            'episode_number': 8,
-            'episode': '8:30',
+            'episode': '8. episode',
+            'age_limit': 0,
        },
        'params': {
            'skip_download': True,
@@ -489,7 +452,7 @@ class NRKTVEpisodeIE(InfoExtractor):
    }]

    def _real_extract(self, url):
-        display_id = self._match_id(url)
+        display_id, season_number, episode_number = re.match(self._VALID_URL, url).groups()

        webpage = self._download_webpage(url, display_id)
@@ -501,10 +464,12 @@ class NRKTVEpisodeIE(InfoExtractor):
        assert re.match(NRKTVIE._EPISODE_RE, nrk_id)

        info.update({
-            '_type': 'url_transparent',
+            '_type': 'url',
            'id': nrk_id,
            'url': 'nrk:%s' % nrk_id,
            'ie_key': NRKIE.ie_key(),
+            'season_number': int(season_number),
+            'episode_number': int(episode_number),
        })

        return info
@@ -518,8 +483,6 @@ class NRKTVSerieBaseIE(NRKBaseIE):
            nrk_id = episode.get('prfId') or episode.get('episodeId')
            if not nrk_id or not isinstance(nrk_id, compat_str):
                continue
-            if not re.match(NRKTVIE._EPISODE_RE, nrk_id):
-                continue
            entries.append(self.url_result(
                'nrk:%s' % nrk_id, ie=NRKIE.ie_key(), video_id=nrk_id))
        return entries
@@ -531,6 +494,10 @@ class NRKTVSerieBaseIE(NRKBaseIE):
            if embedded.get(asset_key):
                return asset_key

+    @staticmethod
+    def _catalog_name(serie_kind):
+        return 'podcast' if serie_kind in ('podcast', 'podkast') else 'series'
+
    def _entries(self, data, display_id):
        for page_num in itertools.count(1):
            embedded = data.get('_embedded') or data
@@ -564,7 +531,16 @@ class NRKTVSerieBaseIE(NRKBaseIE):


class NRKTVSeasonIE(NRKTVSerieBaseIE):
-    _VALID_URL = r'https?://(?P<domain>tv|radio)\.nrk\.no/serie/(?P<serie>[^/]+)/(?:sesong/)?(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?P<domain>tv|radio)\.nrk\.no/
+                        (?P<serie_kind>serie|pod[ck]ast)/
+                        (?P<serie>[^/]+)/
+                        (?:
+                            (?:sesong/)?(?P<id>\d+)|
+                            sesong/(?P<id_2>[^/?#&]+)
+                        )
+                    '''
    _TESTS = [{
        'url': 'https://tv.nrk.no/serie/backstage/sesong/1',
        'info_dict': {
@@ -600,19 +576,34 @@ class NRKTVSeasonIE(NRKTVSerieBaseIE):
        # 180 entries, single page
        'url': 'https://tv.nrk.no/serie/spangas/sesong/1',
        'only_matching': True,
+    }, {
+        'url': 'https://radio.nrk.no/podkast/hele_historien/sesong/diagnose-kverulant',
+        'info_dict': {
+            'id': 'hele_historien/diagnose-kverulant',
+            'title': 'Diagnose kverulant',
+        },
+        'playlist_mincount': 3,
+    }, {
+        'url': 'https://radio.nrk.no/podkast/loerdagsraadet/sesong/202101',
+        'only_matching': True,
    }]

    @classmethod
    def suitable(cls, url):
-        return (False if NRKTVIE.suitable(url) or NRKTVEpisodeIE.suitable(url)
+        return (False if NRKTVIE.suitable(url) or NRKTVEpisodeIE.suitable(url) or NRKRadioPodkastIE.suitable(url)
                else super(NRKTVSeasonIE, cls).suitable(url))

    def _real_extract(self, url):
-        domain, serie, season_id = re.match(self._VALID_URL, url).groups()
+        mobj = re.match(self._VALID_URL, url)
+        domain = mobj.group('domain')
+        serie_kind = mobj.group('serie_kind')
+        serie = mobj.group('serie')
+        season_id = mobj.group('id') or mobj.group('id_2')
        display_id = '%s/%s' % (serie, season_id)

        data = self._call_api(
-            '%s/catalog/series/%s/seasons/%s' % (domain, serie, season_id),
+            '%s/catalog/%s/%s/seasons/%s'
+            % (domain, self._catalog_name(serie_kind), serie, season_id),
            display_id, 'season', query={'pageSize': 50})

        title = try_get(data, lambda x: x['titles']['title'], compat_str) or display_id
@@ -622,7 +613,7 @@ class NRKTVSeasonIE(NRKTVSerieBaseIE):


class NRKTVSeriesIE(NRKTVSerieBaseIE):
-    _VALID_URL = r'https?://(?P<domain>(?:tv|radio)\.nrk|(?:tv\.)?nrksuper)\.no/serie/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?P<domain>(?:tv|radio)\.nrk|(?:tv\.)?nrksuper)\.no/(?P<serie_kind>serie|pod[ck]ast)/(?P<id>[^/]+)'
    _TESTS = [{
        # new layout, instalments
        'url': 'https://tv.nrk.no/serie/groenn-glede',
@@ -682,23 +673,33 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
    }, {
        'url': 'https://nrksuper.no/serie/labyrint',
        'only_matching': True,
+    }, {
+        'url': 'https://radio.nrk.no/podkast/ulrikkes_univers',
+        'info_dict': {
+            'id': 'ulrikkes_univers',
+        },
+        'playlist_mincount': 10,
+    }, {
+        'url': 'https://radio.nrk.no/podkast/ulrikkes_univers/nrkno-poddkast-26588-134079-05042018030000',
+        'only_matching': True,
    }]

    @classmethod
    def suitable(cls, url):
        return (
            False if any(ie.suitable(url)
-                         for ie in (NRKTVIE, NRKTVEpisodeIE, NRKTVSeasonIE))
+                         for ie in (NRKTVIE, NRKTVEpisodeIE, NRKRadioPodkastIE, NRKTVSeasonIE))
            else super(NRKTVSeriesIE, cls).suitable(url))

    def _real_extract(self, url):
-        site, series_id = re.match(self._VALID_URL, url).groups()
+        site, serie_kind, series_id = re.match(self._VALID_URL, url).groups()
        is_radio = site == 'radio.nrk'
        domain = 'radio' if is_radio else 'tv'

        size_prefix = 'p' if is_radio else 'embeddedInstalmentsP'
        series = self._call_api(
-            '%s/catalog/series/%s' % (domain, series_id),
+            '%s/catalog/%s/%s'
+            % (domain, self._catalog_name(serie_kind), series_id),
            series_id, 'serie', query={size_prefix + 'ageSize': 50})
        titles = try_get(series, [
            lambda x: x['titles'],
@@ -713,12 +714,14 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
            embedded_seasons = embedded.get('seasons') or []
            if len(linked_seasons) > len(embedded_seasons):
                for season in linked_seasons:
-                    season_name = season.get('name')
-                    if season_name and isinstance(season_name, compat_str):
+                    season_url = urljoin(url, season.get('href'))
+                    if not season_url:
+                        season_name = season.get('name')
+                        if season_name and isinstance(season_name, compat_str):
+                            season_url = 'https://%s.nrk.no/serie/%s/sesong/%s' % (domain, series_id, season_name)
+                    if season_url:
                        entries.append(self.url_result(
-                            'https://%s.nrk.no/serie/%s/sesong/%s'
-                            % (domain, series_id, season_name),
-                            ie=NRKTVSeasonIE.ie_key(),
+                            season_url, ie=NRKTVSeasonIE.ie_key(),
                            video_title=season.get('title')))
            else:
                for season in embedded_seasons:
@@ -743,6 +746,38 @@ class NRKTVDirekteIE(NRKTVIE):
    }]


+class NRKRadioPodkastIE(InfoExtractor):
+    _VALID_URL = r'https?://radio\.nrk\.no/pod[ck]ast/(?:[^/]+/)+(?P<id>l_[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+
+    _TESTS = [{
+        'url': 'https://radio.nrk.no/podkast/ulrikkes_univers/l_96f4f1b0-de54-4e6a-b4f1-b0de54fe6af8',
+        'md5': '8d40dab61cea8ab0114e090b029a0565',
+        'info_dict': {
+            'id': 'MUHH48000314AA',
+            'ext': 'mp4',
+            'title': '20 spørsmål 23.05.2014',
+            'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
+            'duration': 1741,
+            'series': '20 spørsmål',
+            'episode': '23.05.2014',
+        },
+    }, {
+        'url': 'https://radio.nrk.no/podcast/ulrikkes_univers/l_96f4f1b0-de54-4e6a-b4f1-b0de54fe6af8',
+        'only_matching': True,
+    }, {
+        'url': 'https://radio.nrk.no/podkast/ulrikkes_univers/sesong/1/l_96f4f1b0-de54-4e6a-b4f1-b0de54fe6af8',
+        'only_matching': True,
+    }, {
+        'url': 'https://radio.nrk.no/podkast/hele_historien/sesong/bortfoert-i-bergen/l_774d1a2c-7aa7-4965-8d1a-2c7aa7d9652c',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'nrk:%s' % video_id, ie=NRKIE.ie_key(), video_id=video_id)
+
+
class NRKPlaylistBaseIE(InfoExtractor):
    def _extract_description(self, webpage):
        pass
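Note: the legalAge handling added to NRKIE maps NRK's rating codes (per the linked Norwegian Media Authority page) to youtube-dl age limits: 'A' (allowed for all) becomes 0, and numeric codes pass through. A standalone restatement of that branch:

from youtube_dl.utils import int_or_none

def age_limit_from_legal_age(legal_age):
    # mirrors the branch in NRKIE._real_extract; codes other than 'A'
    # are assumed to be plain digits such as '6', '12' or '18'
    if not legal_age:
        return None
    if legal_age == 'A':
        return 0
    if legal_age.isdigit():
        return int_or_none(legal_age)
    return None

assert age_limit_from_legal_age('A') == 0
assert age_limit_from_legal_age('6') == 6
assert age_limit_from_legal_age(None) is None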


youtube_dl/extractor/piksel.py
@@ -6,16 +6,33 @@ import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
-    ExtractorError,
    dict_get,
+    ExtractorError,
    int_or_none,
-    unescapeHTML,
    parse_iso8601,
+    try_get,
+    unescapeHTML,
)


class PikselIE(InfoExtractor):
-    _VALID_URL = r'https?://player\.piksel\.com/v/(?:refid/[^/]+/prefid/)?(?P<id>[a-z0-9_]+)'
+    _VALID_URL = r'''(?x)https?://
+        (?:
+            (?:
+                player\.
+                (?:
+                    olympusattelecom|
+                    vibebyvista
+                )|
+                (?:api|player)\.multicastmedia|
+                (?:api-ovp|player)\.piksel
+            )\.com|
+            (?:
+                mz-edge\.stream\.co|
+                movie-s\.nhk\.or
+            )\.jp|
+            vidego\.baltimorecity\.gov
+        )/v/(?:refid/(?P<refid>[^/]+)/prefid/)?(?P<id>[\w-]+)'''
    _TESTS = [
        {
            'url': 'http://player.piksel.com/v/ums2867l',
@@ -56,46 +73,41 @@ class PikselIE(InfoExtractor):
        if mobj:
            return mobj.group('url')

+    def _call_api(self, app_token, resource, display_id, query, fatal=True):
+        response = (self._download_json(
+            'http://player.piksel.com/ws/ws_%s/api/%s/mode/json/apiv/5' % (resource, app_token),
+            display_id, query=query, fatal=fatal) or {}).get('response')
+        failure = try_get(response, lambda x: x['failure']['reason'])
+        if failure:
+            if fatal:
+                raise ExtractorError(failure, expected=True)
+            self.report_warning(failure)
+        return response
+
    def _real_extract(self, url):
-        display_id = self._match_id(url)
+        ref_id, display_id = re.match(self._VALID_URL, url).groups()
        webpage = self._download_webpage(url, display_id)
-        video_id = self._search_regex(
-            r'data-de-program-uuid=[\'"]([a-z0-9]+)',
-            webpage, 'program uuid', default=display_id)
        app_token = self._search_regex([
            r'clientAPI\s*:\s*"([^"]+)"',
            r'data-de-api-key\s*=\s*"([^"]+)"'
        ], webpage, 'app token')
-        response = self._download_json(
-            'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token,
-            video_id, query={
-                'v': video_id
-            })['response']
-        failure = response.get('failure')
-        if failure:
-            raise ExtractorError(response['failure']['reason'], expected=True)
-        video_data = response['WsProgramResponse']['program']['asset']
+        query = {'refid': ref_id, 'prefid': display_id} if ref_id else {'v': display_id}
+        program = self._call_api(
+            app_token, 'program', display_id, query)['WsProgramResponse']['program']
+        video_id = program['uuid']
+        video_data = program['asset']
        title = video_data['title']
+        asset_type = dict_get(video_data, ['assetType', 'asset_type'])

        formats = []

-        m3u8_url = dict_get(video_data, [
-            'm3u8iPadURL',
-            'ipadM3u8Url',
-            'm3u8AndroidURL',
-            'm3u8iPhoneURL',
-            'iphoneM3u8Url'])
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native',
-                m3u8_id='hls', fatal=False))
-
-        asset_type = dict_get(video_data, ['assetType', 'asset_type'])
-        for asset_file in video_data.get('assetFiles', []):
+        def process_asset_file(asset_file):
+            if not asset_file:
+                return
            # TODO: extract rtmp formats
            http_url = asset_file.get('http_url')
            if not http_url:
-                continue
+                return
            tbr = None
            vbr = int_or_none(asset_file.get('videoBitrate'), 1024)
            abr = int_or_none(asset_file.get('audioBitrate'), 1024)
@@ -118,6 +130,43 @@ class PikselIE(InfoExtractor):
                'filesize': int_or_none(asset_file.get('filesize')),
                'tbr': tbr,
            })
+
+        def process_asset_files(asset_files):
+            for asset_file in (asset_files or []):
+                process_asset_file(asset_file)
+
+        process_asset_files(video_data.get('assetFiles'))
+        process_asset_file(video_data.get('referenceFile'))
+        if not formats:
+            asset_id = video_data.get('assetid') or program.get('assetid')
+            if asset_id:
+                process_asset_files(try_get(self._call_api(
+                    app_token, 'asset_file', display_id, {
+                        'assetid': asset_id,
+                    }, False), lambda x: x['WsAssetFileResponse']['AssetFiles']))
+
+        m3u8_url = dict_get(video_data, [
+            'm3u8iPadURL',
+            'ipadM3u8Url',
+            'm3u8AndroidURL',
+            'm3u8iPhoneURL',
+            'iphoneM3u8Url'])
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        smil_url = dict_get(video_data, ['httpSmil', 'hdSmil', 'rtmpSmil'])
+        if smil_url:
+            transform_source = None
+            if ref_id == 'nhkworld':
+                # TODO: figure out if this is something to be fixed in urljoin,
+                # _parse_smil_formats or keep it here
+                transform_source = lambda x: x.replace('src="/', 'src="').replace('/media"', '/media/"')
+            formats.extend(self._extract_smil_formats(
+                re.sub(r'/od/[^/]+/', '/od/http/', smil_url), video_id,
+                transform_source=transform_source, fatal=False))
+
        self._sort_formats(formats)

        subtitles = {}
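Note on the nhkworld special case above: the transform_source lambda rewrites src/base attributes in the SMIL document before _parse_smil_formats resolves them, hence the TODO about whether urljoin should handle it instead. On a fabricated SMIL fragment:

transform_source = lambda x: x.replace('src="/', 'src="').replace('/media"', '/media/"')

smil = '<video src="/od/http/clip.mp4" base="rtmp://host/media"/>'  # made-up input
assert transform_source(smil) == '<video src="od/http/clip.mp4" base="rtmp://host/media/"/>'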


youtube_dl/extractor/pornhub.py
@@ -288,14 +288,24 @@ class PornHubIE(PornHubBaseIE):
            video_urls.append((v_url, None))
            video_urls_set.add(v_url)

+        def parse_quality_items(quality_items):
+            q_items = self._parse_json(quality_items, video_id, fatal=False)
+            if not isinstance(q_items, list):
+                return
+            for item in q_items:
+                if isinstance(item, dict):
+                    add_video_url(item.get('url'))
+
        if not video_urls:
-            FORMAT_PREFIXES = ('media', 'quality')
+            FORMAT_PREFIXES = ('media', 'quality', 'qualityItems')
            js_vars = extract_js_vars(
                webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
                default=None)
            if js_vars:
                for key, format_url in js_vars.items():
-                    if any(key.startswith(p) for p in FORMAT_PREFIXES):
+                    if key.startswith(FORMAT_PREFIXES[-1]):
+                        parse_quality_items(format_url)
+                    elif any(key.startswith(p) for p in FORMAT_PREFIXES[:2]):
                        add_video_url(format_url)

        if not video_urls and re.search(
                r'<[^>]+\bid=["\']lockedPlayer', webpage):
@@ -351,12 +361,16 @@ class PornHubIE(PornHubBaseIE):
            r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
            webpage, 'uploader', default=None)

+        def extract_vote_count(kind, name):
+            return self._extract_count(
+                (r'<span[^>]+\bclass="votes%s"[^>]*>([\d,\.]+)</span>' % kind,
+                 r'<span[^>]+\bclass=["\']votes%s["\'][^>]*\bdata-rating=["\'](\d+)' % kind),
+                webpage, name)
+
        view_count = self._extract_count(
            r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
-        like_count = self._extract_count(
-            r'<span[^>]+class="votesUp"[^>]*>([\d,\.]+)</span>', webpage, 'like')
-        dislike_count = self._extract_count(
-            r'<span[^>]+class="votesDown"[^>]*>([\d,\.]+)</span>', webpage, 'dislike')
+        like_count = extract_vote_count('Up', 'like')
+        dislike_count = extract_vote_count('Down', 'dislike')
        comment_count = self._extract_count(
            r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
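Note: unlike the media_*/quality_* player variables, which each hold a bare URL, a qualityItems_* variable carries a JSON array, which is why it gets its own parse_quality_items path. The shape below is assumed from the parsing code, not captured from the site:

import json

quality_items = json.dumps([
    {'quality': '720', 'url': 'https://example.com/720.mp4'},
    {'quality': '480', 'url': 'https://example.com/480.mp4'},
])
urls = [item.get('url') for item in json.loads(quality_items) if isinstance(item, dict)]
assert urls == ['https://example.com/720.mp4', 'https://example.com/480.mp4']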


youtube_dl/extractor/reddit.py
@@ -8,6 +8,7 @@ from ..utils import (
    int_or_none,
    float_or_none,
    try_get,
+    unescapeHTML,
    url_or_none,
)
@@ -56,7 +57,8 @@ class RedditRIE(InfoExtractor):
            'id': 'zv89llsvexdz',
            'ext': 'mp4',
            'title': 'That small heart attack.',
-            'thumbnail': r're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
+            'thumbnails': 'count:4',
            'timestamp': 1501941939,
            'upload_date': '20170805',
            'uploader': 'Antw87',
@@ -118,11 +120,34 @@ class RedditRIE(InfoExtractor):
        else:
            age_limit = None

+        thumbnails = []
+
+        def add_thumbnail(src):
+            if not isinstance(src, dict):
+                return
+            thumbnail_url = url_or_none(src.get('url'))
+            if not thumbnail_url:
+                return
+            thumbnails.append({
+                'url': unescapeHTML(thumbnail_url),
+                'width': int_or_none(src.get('width')),
+                'height': int_or_none(src.get('height')),
+            })
+
+        for image in try_get(data, lambda x: x['preview']['images']) or []:
+            if not isinstance(image, dict):
+                continue
+            add_thumbnail(image.get('source'))
+            resolutions = image.get('resolutions')
+            if isinstance(resolutions, list):
+                for resolution in resolutions:
+                    add_thumbnail(resolution)
+
        return {
            '_type': 'url_transparent',
            'url': video_url,
            'title': data.get('title'),
-            'thumbnail': url_or_none(data.get('thumbnail')),
+            'thumbnails': thumbnails,
            'timestamp': float_or_none(data.get('created_utc')),
            'uploader': data.get('author'),
            'duration': int_or_none(try_get(
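Note: the unescapeHTML call in add_thumbnail is apparently why the import was added: Reddit's preview image URLs arrive HTML-escaped in the JSON (query strings use &amp;) and break unless unescaped. For example (the URL itself is made up):

from youtube_dl.utils import unescapeHTML

escaped = 'https://preview.redd.it/abc.jpg?width=108&amp;crop=smart&amp;s=0123'
assert unescapeHTML(escaped) == 'https://preview.redd.it/abc.jpg?width=108&crop=smart&s=0123'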


youtube_dl/extractor/sevenplus.py
@@ -4,8 +4,12 @@ from __future__ import unicode_literals
import re

from .brightcove import BrightcoveNewIE
-from ..compat import compat_str
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+)
from ..utils import (
+    ExtractorError,
    try_get,
    update_url_query,
)
@@ -41,16 +45,22 @@ class SevenPlusIE(BrightcoveNewIE):
    def _real_extract(self, url):
        path, episode_id = re.match(self._VALID_URL, url).groups()

-        media = self._download_json(
-            'https://videoservice.swm.digital/playback', episode_id, query={
-                'appId': '7plus',
-                'deviceType': 'web',
-                'platformType': 'web',
-                'accountId': 5303576322001,
-                'referenceId': 'ref:' + episode_id,
-                'deliveryId': 'csai',
-                'videoType': 'vod',
-            })['media']
+        try:
+            media = self._download_json(
+                'https://videoservice.swm.digital/playback', episode_id, query={
+                    'appId': '7plus',
+                    'deviceType': 'web',
+                    'platformType': 'web',
+                    'accountId': 5303576322001,
+                    'referenceId': 'ref:' + episode_id,
+                    'deliveryId': 'csai',
+                    'videoType': 'vod',
+                })['media']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                raise ExtractorError(self._parse_json(
+                    e.cause.read().decode(), episode_id)[0]['error_code'], expected=True)
+            raise

        for source in media.get('sources', {}):
            src = source.get('src')


@@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
extract_attributes, extract_attributes,
@@ -11,36 +13,59 @@ from ..utils import (
class SkyBaseIE(InfoExtractor):
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        video_data = extract_attributes(self._search_regex(
-            r'(<div.+?class="[^"]*sdc-article-video__media-ooyala[^"]*"[^>]+>)',
-            webpage, 'video data'))
-
-        video_url = 'ooyala:%s' % video_data['data-video-id']
-        if video_data.get('data-token-required') == 'true':
-            token_fetch_options = self._parse_json(video_data.get(
-                'data-token-fetch-options', '{}'), video_id, fatal=False) or {}
-            token_fetch_url = token_fetch_options.get('url')
-            if token_fetch_url:
-                embed_token = self._download_webpage(urljoin(
-                    url, token_fetch_url), video_id, fatal=False)
-                if embed_token:
-                    video_url = smuggle_url(
-                        video_url, {'embed_token': embed_token.strip('"')})
-
-        return {
-            '_type': 'url_transparent',
-            'id': video_id,
-            'url': video_url,
-            'title': self._og_search_title(webpage),
-            'description': strip_or_none(self._og_search_description(webpage)),
-            'ie_key': 'Ooyala',
-        }
    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
    _SDC_EL_REGEX = r'(?s)(<div[^>]+data-(?:component-name|fn)="sdc-(?:articl|sit)e-video"[^>]*>)'

    def _process_ooyala_element(self, webpage, sdc_el, url):
        sdc = extract_attributes(sdc_el)
        provider = sdc.get('data-provider')
        if provider == 'ooyala':
            video_id = sdc['data-sdc-video-id']
            video_url = 'ooyala:%s' % video_id
            ie_key = 'Ooyala'
            ooyala_el = self._search_regex(
                r'(<div[^>]+class="[^"]*\bsdc-article-video__media-ooyala\b[^"]*"[^>]+data-video-id="%s"[^>]*>)' % video_id,
                webpage, 'video data', fatal=False)
            if ooyala_el:
                ooyala_attrs = extract_attributes(ooyala_el) or {}
                if ooyala_attrs.get('data-token-required') == 'true':
                    token_fetch_url = (self._parse_json(ooyala_attrs.get(
                        'data-token-fetch-options', '{}'),
                        video_id, fatal=False) or {}).get('url')
                    if token_fetch_url:
                        embed_token = self._download_json(urljoin(
                            url, token_fetch_url), video_id, fatal=False)
                        if embed_token:
                            video_url = smuggle_url(
                                video_url, {'embed_token': embed_token})
        elif provider == 'brightcove':
            video_id = sdc['data-video-id']
            account_id = sdc.get('data-account-id') or '6058004172001'
            player_id = sdc.get('data-player-id') or 'RC9PQUaJ6'
            video_url = self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id)
            ie_key = 'BrightcoveNew'

        return {
            '_type': 'url_transparent',
            'id': video_id,
            'url': video_url,
            'ie_key': ie_key,
        }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        info = self._process_ooyala_element(webpage, self._search_regex(
            self._SDC_EL_REGEX, webpage, 'sdc element'), url)
        info.update({
            'title': self._og_search_title(webpage),
            'description': strip_or_none(self._og_search_description(webpage)),
        })
        return info
class SkySportsIE(SkyBaseIE):
    IE_NAME = 'sky:sports'
    _VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/([^/]+/)*(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine',
@@ -62,15 +87,45 @@ class SkySportsIE(SkyBaseIE):
class SkyNewsIE(SkyBaseIE):
    IE_NAME = 'sky:news'
    _VALID_URL = r'https?://news\.sky\.com/video/[0-9a-z-]+-(?P<id>[0-9]+)'
    _TEST = {
        'url': 'https://news.sky.com/video/russian-plane-inspected-after-deadly-fire-11712962',
-        'md5': 'd6327e581473cea9976a3236ded370cd',
        'md5': '411e8893fd216c75eaf7e4c65d364115',
        'info_dict': {
-            'id': '1ua21xaDE6lCtZDmbYfl8kwsKLooJbNM',
            'id': 'ref:1ua21xaDE6lCtZDmbYfl8kwsKLooJbNM',
            'ext': 'mp4',
            'title': 'Russian plane inspected after deadly fire',
            'description': 'The Russian Investigative Committee has released video of the wreckage of a passenger plane which caught fire near Moscow.',
            'uploader_id': '6058004172001',
            'timestamp': 1567112345,
            'upload_date': '20190829',
        },
-        'add_ie': ['Ooyala'],
        'add_ie': ['BrightcoveNew'],
    }
class SkySportsNewsIE(SkyBaseIE):
IE_NAME = 'sky:sports:news'
_VALID_URL = r'https?://(?:www\.)?skysports\.com/([^/]+/)*news/\d+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.skysports.com/golf/news/12176/10871916/dustin-johnson-ready-to-conquer-players-championship-at-tpc-sawgrass',
'info_dict': {
'id': '10871916',
'title': 'Dustin Johnson ready to conquer Players Championship at TPC Sawgrass',
'description': 'Dustin Johnson is confident he can continue his dominant form in 2017 by adding the Players Championship to his list of victories.',
},
'playlist_count': 2,
}
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
entries = []
for sdc_el in re.findall(self._SDC_EL_REGEX, webpage):
entries.append(self._process_ooyala_element(webpage, sdc_el, url))
return self.playlist_result(
entries, article_id, self._og_search_title(webpage),
self._html_search_meta(['og:description', 'description'], webpage))
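
The new _process_ooyala_element dispatches on the sdc element's data-provider attribute. A rough standalone approximation (the regex-based attribute parser and the sample HTML are illustrative; the Brightcove template and default ids come from the diff):

import re

BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'

def extract_attributes(tag):
    # Crude stand-in for youtube-dl's extract_attributes helper.
    return dict(re.findall(r'([\w-]+)="([^"]*)"', tag))

def process_sdc_element(sdc_el):
    sdc = extract_attributes(sdc_el)
    if sdc.get('data-provider') == 'brightcove':
        account_id = sdc.get('data-account-id') or '6058004172001'
        player_id = sdc.get('data-player-id') or 'RC9PQUaJ6'
        return BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, sdc['data-video-id'])
    return 'ooyala:%s' % sdc['data-sdc-video-id']

print(process_sdc_element(
    '<div data-component-name="sdc-article-video" data-provider="brightcove" data-video-id="123">'))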


@@ -1,40 +1,112 @@
# coding: utf-8
from __future__ import unicode_literals

import time
import uuid

from .common import InfoExtractor
-from ..utils import smuggle_url
from ..compat import compat_HTTPError
from ..utils import (
    ExtractorError,
    int_or_none,
)
class SonyLIVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
    _VALID_URL = r'https?://(?:www\.)?sonyliv\.com/(?:s(?:how|port)s/[^/]+|movies|clip|trailer|music-videos)/[^/?#&]+-(?P<id>\d+)'
    _TESTS = [{
-        'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
        'url': 'https://www.sonyliv.com/shows/bachelors-delight-1700000113/achaari-cheese-toast-1000022678?watch=true',
        'info_dict': {
-            'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
-            'id': 'ref:5024612095001',
            'title': 'Bachelors Delight - Achaari Cheese Toast',
            'id': '1000022678',
            'ext': 'mp4',
-            'upload_date': '20170923',
-            'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
-            'uploader_id': '5182475815001',
-            'timestamp': 1506200547,
            'upload_date': '20200411',
            'description': 'md5:3957fa31d9309bf336ceb3f37ad5b7cb',
            'timestamp': 1586632091,
            'duration': 185,
            'season_number': 1,
            'episode': 'Achaari Cheese Toast',
            'episode_number': 1,
            'release_year': 2016,
        },
        'params': {
            'skip_download': True,
        },
-        'add_ie': ['BrightcoveNew'],
    }, {
-        'url': 'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)',
        'url': 'https://www.sonyliv.com/movies/tahalka-1000050121?watch=true',
        'only_matching': True,
    }, {
        'url': 'https://www.sonyliv.com/clip/jigarbaaz-1000098925',
        'only_matching': True,
    }, {
        'url': 'https://www.sonyliv.com/trailer/sandwiched-forever-1000100286?watch=true',
        'only_matching': True,
    }, {
        'url': 'https://www.sonyliv.com/sports/india-tour-of-australia-2020-21-1700000286/cricket-hls-day-3-1st-test-aus-vs-ind-19-dec-2020-1000100959?watch=true',
        'only_matching': True,
    }, {
        'url': 'https://www.sonyliv.com/music-videos/yeh-un-dinon-ki-baat-hai-1000018779',
        'only_matching': True,
    }]
_GEO_COUNTRIES = ['IN']
_TOKEN = None
-    # BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
-    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5182475815001/default_default/index.html?videoId=ref:%s'
    def _call_api(self, version, path, video_id):
        headers = {}
if self._TOKEN:
headers['security_token'] = self._TOKEN
try:
return self._download_json(
'https://apiv2.sonyliv.com/AGL/%s/A/ENG/WEB/%s' % (version, path),
video_id, headers=headers)['resultObj']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
message = self._parse_json(
e.cause.read().decode(), video_id)['message']
if message == 'Geoblocked Country':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError(message)
raise
def _real_initialize(self):
self._TOKEN = self._call_api('1.4', 'ALL/GETTOKEN', None)
-    def _real_extract(self, url):
-        brightcove_id = self._match_id(url)
-        return self.url_result(
-            smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {
-                'geo_countries': ['IN'],
-                'referrer': url,
-            }),
-            'BrightcoveNew', brightcove_id)
    def _real_extract(self, url):
        video_id = self._match_id(url)
        content = self._call_api(
            '1.5', 'IN/CONTENT/VIDEOURL/VOD/' + video_id, video_id)
        if content.get('isEncrypted'):
            raise ExtractorError('This video is DRM protected.', expected=True)
        dash_url = content['videoURL']
        headers = {
'x-playback-session-id': '%s-%d' % (uuid.uuid4().hex, time.time() * 1000)
}
formats = self._extract_mpd_formats(
dash_url, video_id, mpd_id='dash', headers=headers, fatal=False)
formats.extend(self._extract_m3u8_formats(
dash_url.replace('.mpd', '.m3u8').replace('/DASH/', '/HLS/'),
video_id, 'mp4', m3u8_id='hls', headers=headers, fatal=False))
for f in formats:
f.setdefault('http_headers', {}).update(headers)
self._sort_formats(formats)
metadata = self._call_api(
'1.6', 'IN/DETAIL/' + video_id, video_id)['containers'][0]['metadata']
title = metadata['title']
episode = metadata.get('episodeTitle')
if episode and title != episode:
title += ' - ' + episode
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': content.get('posterURL'),
'description': metadata.get('longDescription') or metadata.get('shortDescription'),
'timestamp': int_or_none(metadata.get('creationDate'), 1000),
'duration': int_or_none(metadata.get('duration')),
'season_number': int_or_none(metadata.get('season')),
'episode': episode,
'episode_number': int_or_none(metadata.get('episodeNumber')),
'release_year': int_or_none(metadata.get('year')),
}
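
The per-request header built above pairs a fresh UUID with a millisecond timestamp. A self-contained sketch of just that piece (assumes nothing beyond the diff):

import time
import uuid

def playback_session_headers():
    # Same shape as the diff: hex UUID, a dash, then epoch milliseconds.
    return {'x-playback-session-id': '%s-%d' % (uuid.uuid4().hex, time.time() * 1000)}

print(playback_session_headers())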


@@ -7,17 +7,24 @@ from ..utils import (
    determine_ext,
    ExtractorError,
    merge_dicts,
-    orderedSet,
    parse_duration,
    parse_resolution,
    str_to_int,
    url_or_none,
    urlencode_postdata,
    urljoin,
)
class SpankBangIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:[^/]+\.)?spankbang\.com/(?P<id>[\da-z]+)/(?:video|play|embed)\b'
    _VALID_URL = r'''(?x)
                    https?://
                        (?:[^/]+\.)?spankbang\.com/
                        (?:
                            (?P<id>[\da-z]+)/(?:video|play|embed)\b|
                            [\da-z]+-(?P<id_2>[\da-z]+)/playlist/[^/?#&]+
                        )
                    '''
    _TESTS = [{
        'url': 'http://spankbang.com/3vvn/video/fantasy+solo',
        'md5': '1cc433e1d6aa14bc376535b8679302f7',
@@ -57,10 +64,14 @@ class SpankBangIE(InfoExtractor):
    }, {
        'url': 'https://spankbang.com/2y3td/embed/',
        'only_matching': True,
    }, {
        'url': 'https://spankbang.com/2v7ik-7ecbgu/playlist/latina+booty',
        'only_matching': True,
    }]
    def _real_extract(self, url):
-        video_id = self._match_id(url)
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id') or mobj.group('id_2')
        webpage = self._download_webpage(
            url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
            video_id, headers={'Cookie': 'country=US'})
@@ -155,30 +166,33 @@ class SpankBangIE(InfoExtractor):
class SpankBangPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:[^/]+\.)?spankbang\.com/(?P<id>[\da-z]+)/playlist/[^/]+'
    _VALID_URL = r'https?://(?:[^/]+\.)?spankbang\.com/(?P<id>[\da-z]+)/playlist/(?P<display_id>[^/]+)'
    _TEST = {
        'url': 'https://spankbang.com/ug0k/playlist/big+ass+titties',
        'info_dict': {
            'id': 'ug0k',
            'title': 'Big Ass Titties',
        },
-        'playlist_mincount': 50,
        'playlist_mincount': 40,
    }

    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
        mobj = re.match(self._VALID_URL, url)
        playlist_id = mobj.group('id')
        display_id = mobj.group('display_id')

        webpage = self._download_webpage(
            url, playlist_id, headers={'Cookie': 'country=US; mobile=on'})

        entries = [self.url_result(
-            'https://spankbang.com/%s/video' % video_id,
-            ie=SpankBangIE.ie_key(), video_id=video_id)
-            for video_id in orderedSet(re.findall(
-                r'<a[^>]+\bhref=["\']/?([\da-z]+)/play/', webpage))]
            urljoin(url, mobj.group('path')),
            ie=SpankBangIE.ie_key(), video_id=mobj.group('id'))
            for mobj in re.finditer(
                r'<a[^>]+\bhref=(["\'])(?P<path>/?[\da-z]+-(?P<id>[\da-z]+)/playlist/%s(?:(?!\1).)*)\1'
                % re.escape(display_id), webpage)]

        title = self._html_search_regex(
-            r'<h1>([^<]+)\s+playlist</h1>', webpage, 'playlist title',
            r'<h1>([^<]+)\s+playlist\s*<', webpage, 'playlist title',
            fatal=False)

        return self.playlist_result(entries, playlist_id, title)
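
The rewritten playlist extractor walks anchor tags with re.finditer and resolves relative paths instead of collecting bare video ids. A standalone sketch (the page snippet is illustrative; the href shape follows the new regex):

import re
from urllib.parse import urljoin

PAGE = '<a href="/2v7ik-7ecbgu/playlist/latina+booty">part 1</a>'

for m in re.finditer(
        r'<a[^>]+\bhref=(["\'])(?P<path>/?[\da-z]+-(?P<id>[\da-z]+)/playlist/[^"\']*)\1',
        PAGE):
    # Each link carries both a short id and a path to resolve against the page URL.
    print(m.group('id'), urljoin('https://spankbang.com/', m.group('path')))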


@@ -3,50 +3,62 @@ from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
-    extract_attributes,
-    update_url_query,
    int_or_none,
    smuggle_url,
    update_url_query,
)
class SproutIE(AdobePassIE):
-    _VALID_URL = r'https?://(?:www\.)?sproutonline\.com/watch/(?P<id>[^/?#]+)'
    _VALID_URL = r'https?://(?:www\.)?(?:sproutonline|universalkids)\.com/(?:watch|(?:[^/]+/)*videos)/(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://www.sproutonline.com/watch/cowboy-adventure',
-        'md5': '74bf14128578d1e040c3ebc82088f45f',
-        'info_dict': {
-            'id': '9dexnwtmh8_X',
-            'ext': 'mp4',
-            'title': 'A Cowboy Adventure',
-            'description': 'Ruff-Ruff, Tweet and Dave get to be cowboys for the day at Six Cow Corral.',
-            'timestamp': 1437758640,
-            'upload_date': '20150724',
-            'uploader': 'NBCU-SPROUT-NEW',
-        }
-    }
    _TESTS = [{
        'url': 'https://www.universalkids.com/shows/remy-and-boo/season/1/videos/robot-bike-race',
        'info_dict': {
            'id': 'bm0foJFaTKqb',
            'ext': 'mp4',
            'title': 'Robot Bike Race',
            'description': 'md5:436b1d97117cc437f54c383f4debc66d',
            'timestamp': 1606148940,
            'upload_date': '20201123',
            'uploader': 'NBCU-MPAT',
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'http://www.sproutonline.com/watch/cowboy-adventure',
        'only_matching': True,
    }, {
        'url': 'https://www.universalkids.com/watch/robot-bike-race',
        'only_matching': True,
    }]
_GEO_COUNTRIES = ['US']
    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        video_component = self._search_regex(
-            r'(?s)(<div[^>]+data-component="video"[^>]*?>)',
-            webpage, 'video component', default=None)
-        if video_component:
-            options = self._parse_json(extract_attributes(
-                video_component)['data-options'], video_id)
-            theplatform_url = options['video']
-            query = {
-                'mbr': 'true',
-                'manifest': 'm3u',
-            }
-            if options.get('protected'):
-                query['auth'] = self._extract_mvpd_auth(url, options['pid'], 'sprout', 'sprout')
-            theplatform_url = smuggle_url(update_url_query(
-                theplatform_url, query), {'force_smil_url': True})
-        else:
-            iframe = self._search_regex(
-                r'(<iframe[^>]+id="sproutVideoIframe"[^>]*?>)',
-                webpage, 'iframe')
-            theplatform_url = extract_attributes(iframe)['src']
-
-        return self.url_result(theplatform_url, 'ThePlatform')
        display_id = self._match_id(url)
        mpx_metadata = self._download_json(
            # http://nbcuunikidsprod.apps.nbcuni.com/networks/universalkids/content/videos/
            'https://www.universalkids.com/_api/videos/' + display_id,
            display_id)['mpxMetadata']
        media_pid = mpx_metadata['mediaPid']
        theplatform_url = 'https://link.theplatform.com/s/HNK2IC/' + media_pid
        query = {
            'mbr': 'true',
            'manifest': 'm3u',
        }
        if mpx_metadata.get('entitlement') == 'auth':
            query['auth'] = self._extract_mvpd_auth(url, media_pid, 'sprout', 'sprout')
        theplatform_url = smuggle_url(
            update_url_query(theplatform_url, query), {
                'force_smil_url': True,
                'geo_countries': self._GEO_COUNTRIES,
            })
        return {
            '_type': 'url_transparent',
            'id': media_pid,
            'url': theplatform_url,
            'series': mpx_metadata.get('seriesName'),
            'season_number': int_or_none(mpx_metadata.get('seasonNumber')),
            'episode_number': int_or_none(mpx_metadata.get('episodeNumber')),
            'ie_key': 'ThePlatform',
        }
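
The new flow appends query parameters to the ThePlatform link and then smuggles hints to the downstream extractor. A rough stand-in for the two helpers (both are simplified and illustrative; youtube-dl's real smuggle_url encodes the payload differently, and MEDIA_PID is a placeholder):

import json
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def update_url_query(url, query):
    # Merge extra query parameters into an existing URL.
    parts = urlparse(url)
    qs = dict(parse_qsl(parts.query))
    qs.update(query)
    return urlunparse(parts._replace(query=urlencode(qs)))

def smuggle_url(url, data):
    # Simplified: piggyback JSON metadata on the URL fragment.
    return url + '#__youtubedl_smuggle=' + json.dumps(data)

print(smuggle_url(
    update_url_query('https://link.theplatform.com/s/HNK2IC/MEDIA_PID',
                     {'mbr': 'true', 'manifest': 'm3u'}),
    {'force_smil_url': True, 'geo_countries': ['US']}))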


@@ -1,28 +1,74 @@
from __future__ import unicode_literals

-import re
-
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    clean_html,
    clean_podcast_url,
-    determine_ext,
    ExtractorError,
    int_or_none,
-    js_to_json,
    str_or_none,
    try_get,
-    unescapeHTML,
    url_or_none,
)
-class StitcherIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?stitcher\.com/podcast/(?:[^/]+/)+e/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)'
class StitcherBaseIE(InfoExtractor):
    _VALID_URL_BASE = r'https?://(?:www\.)?stitcher\.com/(?:podcast|show)/'
def _call_api(self, path, video_id, query):
resp = self._download_json(
'https://api.prod.stitcher.com/' + path,
video_id, query=query)
error_massage = try_get(resp, lambda x: x['errors'][0]['message'])
if error_massage:
raise ExtractorError(error_massage, expected=True)
return resp['data']
def _extract_description(self, data):
return clean_html(data.get('html_description') or data.get('description'))
def _extract_audio_url(self, episode):
return url_or_none(episode.get('audio_url') or episode.get('guid'))
def _extract_show_info(self, show):
return {
'thumbnail': show.get('image_base_url'),
'series': show.get('title'),
}
def _extract_episode(self, episode, audio_url, show_info):
info = {
'id': compat_str(episode['id']),
'display_id': episode.get('slug'),
'title': episode['title'].strip(),
'description': self._extract_description(episode),
'duration': int_or_none(episode.get('duration')),
'url': clean_podcast_url(audio_url),
'vcodec': 'none',
'timestamp': int_or_none(episode.get('date_published')),
'season_number': int_or_none(episode.get('season')),
'season_id': str_or_none(episode.get('season_id')),
}
info.update(show_info)
return info
class StitcherIE(StitcherBaseIE):
_VALID_URL = StitcherBaseIE._VALID_URL_BASE + r'(?:[^/]+/)+e(?:pisode)?/(?:[^/#?&]+-)?(?P<id>\d+)'
    _TESTS = [{
        'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true',
-        'md5': '391dd4e021e6edeb7b8e68fbf2e9e940',
        'md5': 'e9635098e0da10b21a0e2b85585530f6',
        'info_dict': {
            'id': '40789481',
            'ext': 'mp3',
            'title': 'Machine Learning Mastery and Cancer Clusters',
-            'description': 'md5:55163197a44e915a14a1ac3a1de0f2d3',
            'description': 'md5:547adb4081864be114ae3831b4c2b42f',
            'duration': 1604,
            'thumbnail': r're:^https?://.*\.jpg',
            'upload_date': '20151008',
            'timestamp': 1444285800,
            'series': 'Talking Machines',
        },
    }, {
        'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true',
@@ -38,6 +84,7 @@ class StitcherIE(InfoExtractor):
        'params': {
            'skip_download': True,
        },
        'skip': 'Page Not Found',
    }, {
        # escaped title
        'url': 'http://www.stitcher.com/podcast/marketplace-on-stitcher/e/40910226?autoplay=true',
@@ -45,37 +92,53 @@ class StitcherIE(InfoExtractor):
    }, {
        'url': 'http://www.stitcher.com/podcast/panoply/getting-in/e/episode-2a-how-many-extracurriculars-should-i-have-40876278?autoplay=true',
        'only_matching': True,
    }, {
        'url': 'https://www.stitcher.com/show/threedom/episode/circles-on-a-stick-200212584',
        'only_matching': True,
    }]
    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        audio_id = mobj.group('id')
-        display_id = mobj.group('display_id') or audio_id
-
-        webpage = self._download_webpage(url, display_id)
-
-        episode = self._parse_json(
-            js_to_json(self._search_regex(
-                r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n', webpage, 'episode config')),
-            display_id)['config']['episode']
-
-        title = unescapeHTML(episode['title'])
-
-        formats = [{
-            'url': episode[episode_key],
-            'ext': determine_ext(episode[episode_key]) or 'mp3',
-            'vcodec': 'none',
-        } for episode_key in ('episodeURL',) if episode.get(episode_key)]
-        description = self._search_regex(
-            r'Episode Info:\s*</span>([^<]+)<', webpage, 'description', fatal=False)
-        duration = int_or_none(episode.get('duration'))
-        thumbnail = episode.get('episodeImage')
-
-        return {
-            'id': audio_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'formats': formats,
-        }
        audio_id = self._match_id(url)
        data = self._call_api(
            'shows/episodes', audio_id, {'episode_ids': audio_id})
        episode = data['episodes'][0]
        audio_url = self._extract_audio_url(episode)
        if not audio_url:
            self.raise_login_required()
        show = try_get(data, lambda x: x['shows'][0], dict) or {}
        return self._extract_episode(
            episode, audio_url, self._extract_show_info(show))


class StitcherShowIE(StitcherBaseIE):
    _VALID_URL = StitcherBaseIE._VALID_URL_BASE + r'(?P<id>[^/#?&]+)/?(?:[?#&]|$)'
    _TESTS = [{
        'url': 'http://www.stitcher.com/podcast/the-talking-machines',
        'info_dict': {
            'id': 'the-talking-machines',
            'title': 'Talking Machines',
            'description': 'md5:831f0995e40f26c10231af39cf1ebf0b',
        },
        'playlist_mincount': 106,
    }, {
        'url': 'https://www.stitcher.com/show/the-talking-machines',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        show_slug = self._match_id(url)
        data = self._call_api(
            'search/show/%s/allEpisodes' % show_slug, show_slug, {'count': 10000})
        show = try_get(data, lambda x: x['shows'][0], dict) or {}
        show_info = self._extract_show_info(show)

        entries = []
        for episode in (data.get('episodes') or []):
            audio_url = self._extract_audio_url(episode)
            if not audio_url:
                continue
            entries.append(self._extract_episode(episode, audio_url, show_info))

        return self.playlist_result(
            entries, show_slug, show.get('title'),
            self._extract_description(show))
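
The shared _call_api helper checks the JSON envelope for errors before returning data. A standalone sketch of the same call shape (stdlib only; RuntimeError stands in for ExtractorError):

import json
import urllib.parse
import urllib.request

def call_stitcher_api(path, query):
    # GET https://api.prod.stitcher.com/<path>, surface errors[0].message,
    # otherwise return the 'data' payload - as in the helper above.
    url = 'https://api.prod.stitcher.com/%s?%s' % (path, urllib.parse.urlencode(query))
    with urllib.request.urlopen(url) as resp:
        payload = json.loads(resp.read().decode())
    errors = payload.get('errors') or []
    if errors:
        raise RuntimeError(errors[0].get('message'))
    return payload['data']

# e.g. call_stitcher_api('shows/episodes', {'episode_ids': '40789481'})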


@@ -2,25 +2,40 @@
from __future__ import unicode_literals

from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import unified_strdate
from ..utils import (
    int_or_none,
    parse_iso8601,
    str_or_none,
    strip_or_none,
    try_get,
    urljoin,
)
class StreetVoiceIE(InfoExtractor):
    _VALID_URL = r'https?://(?:.+?\.)?streetvoice\.com/[^/]+/songs/(?P<id>[0-9]+)'
    _TESTS = [{
-        'url': 'http://streetvoice.com/skippylu/songs/94440/',
-        'md5': '15974627fc01a29e492c98593c2fd472',
        'url': 'https://streetvoice.com/skippylu/songs/123688/',
        'md5': '0eb535970629a5195685355f3ed60bfd',
        'info_dict': {
-            'id': '94440',
            'id': '123688',
            'ext': 'mp3',
-            'title': '輸',
-            'description': 'Crispy脆樂團 - 輸',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 260,
-            'upload_date': '20091018',
            'title': '流浪',
            'description': 'md5:8eb0bfcc9dcd8aa82bd6efca66e3fea6',
            'thumbnail': r're:^https?://.*\.jpg',
            'duration': 270,
            'upload_date': '20100923',
            'uploader': 'Crispy脆樂團',
            'uploader_id': '627810',
            'uploader_url': 're:^https?://streetvoice.com/skippylu/',
            'timestamp': 1285261661,
            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
            'track': '流浪',
            'track_id': '123688',
            'album': '2010',
        }
    }, {
        'url': 'http://tw.streetvoice.com/skippylu/songs/94440/',
@@ -29,21 +44,57 @@ class StreetVoiceIE(InfoExtractor):
    def _real_extract(self, url):
        song_id = self._match_id(url)
-
-        song = self._download_json(
-            'https://streetvoice.com/api/v1/public/song/%s/' % song_id, song_id, data=b'')
-
        base_url = 'https://streetvoice.com/api/v4/song/%s/' % song_id
        song = self._download_json(base_url, song_id, query={
            'fields': 'album,comments_count,created_at,id,image,length,likes_count,name,nickname,plays_count,profile,share_count,synopsis,user,username',
        })
        title = song['name']
-        author = song['user']['nickname']

        formats = []
        for suffix, format_id in [('hls/file', 'hls'), ('file', 'http'), ('file/original', 'original')]:
            f_url = (self._download_json(
                base_url + suffix + '/', song_id,
                'Downloading %s format URL' % format_id,
                data=b'', fatal=False) or {}).get('file')
            if not f_url:
                continue
            f = {
                'ext': 'mp3',
                'format_id': format_id,
                'url': f_url,
                'vcodec': 'none',
            }
            if format_id == 'hls':
                f['protocol'] = 'm3u8_native'
            abr = self._search_regex(r'\.mp3\.(\d+)k', f_url, 'bitrate', default=None)
            if abr:
                abr = int(abr)
                f.update({
                    'abr': abr,
                    'tbr': abr,
                })
            formats.append(f)

        user = song.get('user') or {}
        username = user.get('username')
        get_count = lambda x: int_or_none(song.get(x + '_count'))

        return {
            'id': song_id,
-            'url': song['file'],
            'formats': formats,
            'title': title,
-            'description': '%s - %s' % (author, title),
-            'thumbnail': self._proto_relative_url(song.get('image'), 'http:'),
-            'duration': song.get('length'),
-            'upload_date': unified_strdate(song.get('created_at')),
-            'uploader': author,
-            'uploader_id': compat_str(song['user']['id']),
            'description': strip_or_none(song.get('synopsis')),
            'thumbnail': song.get('image'),
            'duration': int_or_none(song.get('length')),
            'timestamp': parse_iso8601(song.get('created_at')),
            'uploader': try_get(user, lambda x: x['profile']['nickname']),
            'uploader_id': str_or_none(user.get('id')),
            'uploader_url': urljoin(url, '/%s/' % username) if username else None,
            'view_count': get_count('plays'),
            'like_count': get_count('likes'),
            'comment_count': get_count('comments'),
            'repost_count': get_count('share'),
            'track': title,
            'track_id': song_id,
            'album': try_get(song, lambda x: x['album']['name']),
        }
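
The bitrate sniffing above reads the abr straight out of the format URL. Isolated, the trick is just (sample URL is illustrative):

import re

def bitrate_from_url(f_url):
    # Derive abr/tbr from a '.mp3.<N>k' suffix, as the diff does.
    m = re.search(r'\.mp3\.(\d+)k', f_url)
    return int(m.group(1)) if m else None

print(bitrate_from_url('https://example.com/song.mp3.128k/playlist.m3u8'))  # 128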


@@ -8,13 +8,17 @@ from ..utils import (
    compat_str,
    float_or_none,
    int_or_none,
    smuggle_url,
    str_or_none,
    try_get,
)
class STVPlayerIE(InfoExtractor):
    IE_NAME = 'stv:player'
    _VALID_URL = r'https?://player\.stv\.tv/(?P<type>episode|video)/(?P<id>[a-z0-9]{4})'
-    _TEST = {
    _TESTS = [{
        # shortform
        'url': 'https://player.stv.tv/video/4gwd/emmerdale/60-seconds-on-set-with-laura-norton/',
        'md5': '5adf9439c31d554f8be0707c7abe7e0a',
        'info_dict': {
@@ -27,7 +31,11 @@ class STVPlayerIE(InfoExtractor):
            'uploader_id': '1486976045',
        },
        'skip': 'this resource is unavailable outside of the UK',
-    }
    }, {
# episodes
'url': 'https://player.stv.tv/episode/4125/jennifer-saunders-memory-lane',
'only_matching': True,
}]
    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1486976045/default_default/index.html?videoId=%s'
    _PTYPE_MAP = {
        'episode': 'episodes',
@@ -36,11 +44,31 @@ class STVPlayerIE(InfoExtractor):
    def _real_extract(self, url):
        ptype, video_id = re.match(self._VALID_URL, url).groups()

-        resp = self._download_json(
-            'https://player.api.stv.tv/v1/%s/%s' % (self._PTYPE_MAP[ptype], video_id),
-            video_id)
-        result = resp['results']
        webpage = self._download_webpage(url, video_id, fatal=False) or ''
        props = (self._parse_json(self._search_regex(
            r'<script[^>]+id="__NEXT_DATA__"[^>]*>({.+?})</script>',
            webpage, 'next data', default='{}'), video_id,
            fatal=False) or {}).get('props') or {}
        player_api_cache = try_get(
            props, lambda x: x['initialReduxState']['playerApiCache']) or {}

        api_path, resp = None, {}
        for k, v in player_api_cache.items():
            if k.startswith('/episodes/') or k.startswith('/shortform/'):
                api_path, resp = k, v
                break
        else:
            episode_id = str_or_none(try_get(
                props, lambda x: x['pageProps']['episodeId']))
            api_path = '/%s/%s' % (self._PTYPE_MAP[ptype], episode_id or video_id)

        result = resp.get('results')
        if not result:
            resp = self._download_json(
                'https://player.api.stv.tv/v1' + api_path, video_id)
            result = resp['results']

        video = result['video']
        video_id = compat_str(video['id'])
@@ -57,7 +85,7 @@ class STVPlayerIE(InfoExtractor):
        return {
            '_type': 'url_transparent',
            'id': video_id,
-            'url': self.BRIGHTCOVE_URL_TEMPLATE % video_id,
            'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {'geo_countries': ['GB']}),
            'description': result.get('summary'),
            'duration': float_or_none(video.get('length'), 1000),
            'subtitles': subtitles,
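
The new code first tries the Next.js state embedded in the page before hitting the API. A minimal sketch of that probe (the HTML sample is illustrative; the script-tag regex and cache-key prefixes follow the diff):

import json
import re

HTML = ('<script id="__NEXT_DATA__" type="application/json">'
        '{"props": {"initialReduxState": {"playerApiCache": '
        '{"/episodes/4125": {"results": {}}}}}}</script>')

m = re.search(r'<script[^>]+id="__NEXT_DATA__"[^>]*>({.+?})</script>', HTML)
props = json.loads(m.group(1))['props'] if m else {}
cache = props.get('initialReduxState', {}).get('playerApiCache', {})
for key in cache:
    if key.startswith('/episodes/') or key.startswith('/shortform/'):
        print('cached API path:', key)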


@@ -140,7 +140,7 @@ class TeachableIE(TeachableBaseIE):
    @staticmethod
    def _is_teachable(webpage):
        return 'teachableTracker.linker:autoLink' in webpage and re.search(
-            r'<link[^>]+href=["\']https?://process\.fs\.teachablecdn\.com',
            r'<link[^>]+href=["\']https?://(?:process\.fs|assets)\.teachablecdn\.com',
            webpage)
@staticmethod @staticmethod


@@ -5,14 +5,11 @@ import json
import re

from .common import InfoExtractor
-from .ooyala import OoyalaIE
from ..utils import (
    clean_html,
-    determine_ext,
    int_or_none,
    str_or_none,
    try_get,
-    urljoin,
)
@@ -28,7 +25,7 @@ class TelecincoIE(InfoExtractor):
        'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
    },
    'playlist': [{
-        'md5': 'adb28c37238b675dad0f042292f209a7',
        'md5': '7ee56d665cfd241c0e6d80fd175068b0',
        'info_dict': {
            'id': 'JEA5ijCnF6p5W08A1rNKn7',
            'ext': 'mp4',
@@ -38,7 +35,7 @@ class TelecincoIE(InfoExtractor):
    }]
}, {
    'url': 'http://www.cuatro.com/deportes/futbol/barcelona/Leo_Messi-Champions-Roma_2_2052780128.html',
-    'md5': '9468140ebc300fbb8b9d65dc6e5c4b43',
    'md5': 'c86fe0d99e3bdb46b7950d38bf6ef12a',
    'info_dict': {
        'id': 'jn24Od1zGLG4XUZcnUnZB6',
        'ext': 'mp4',
@@ -48,7 +45,7 @@ class TelecincoIE(InfoExtractor):
    },
}, {
    'url': 'http://www.mediaset.es/12meses/campanas/doylacara/conlatratanohaytrato/Ayudame-dar-cara-trata-trato_2_1986630220.html',
-    'md5': 'ae2dc6b7b50b2392076a51c0f70e01f6',
    'md5': 'eddb50291df704ce23c74821b995bcac',
    'info_dict': {
        'id': 'aywerkD2Sv1vGNqq9b85Q2',
        'ext': 'mp4',
@@ -90,58 +87,24 @@ class TelecincoIE(InfoExtractor):
    def _parse_content(self, content, url):
        video_id = content['dataMediaId']
-        if content.get('dataCmsId') == 'ooyala':
-            return self.url_result(
-                'ooyala:%s' % video_id, OoyalaIE.ie_key(), video_id)
-        config_url = urljoin(url, content['dataConfig'])
-        config = self._download_json(
-            config_url, video_id, 'Downloading config JSON')
-        title = config['info']['title']
-
-        def mmc_url(mmc_type):
-            return re.sub(
-                r'/(?:flash|html5)\.json', '/%s.json' % mmc_type,
-                config['services']['mmc'])
-
-        duration = None
-        formats = []
-        for mmc_type in ('flash', 'html5'):
-            mmc = self._download_json(
-                mmc_url(mmc_type), video_id,
-                'Downloading %s mmc JSON' % mmc_type, fatal=False)
-            if not mmc:
-                continue
-            if not duration:
-                duration = int_or_none(mmc.get('duration'))
-            for location in mmc['locations']:
-                gat = self._proto_relative_url(location.get('gat'), 'http:')
-                gcp = location.get('gcp')
-                ogn = location.get('ogn')
-                if None in (gat, gcp, ogn):
-                    continue
-                token_data = {
-                    'gcp': gcp,
-                    'ogn': ogn,
-                    'sta': 0,
-                }
-                media = self._download_json(
-                    gat, video_id, data=json.dumps(token_data).encode('utf-8'),
-                    headers={
-                        'Content-Type': 'application/json;charset=utf-8',
-                        'Referer': url,
-                    }, fatal=False) or {}
-                stream = media.get('stream') or media.get('file')
-                if not stream:
-                    continue
-                ext = determine_ext(stream)
-                if ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        stream + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
-                        video_id, f4m_id='hds', fatal=False))
-                elif ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        stream, video_id, 'mp4', 'm3u8_native',
-                        m3u8_id='hls', fatal=False))
        config = self._download_json(
            content['dataConfig'], video_id, 'Downloading config JSON')
        title = config['info']['title']
        services = config['services']
        caronte = self._download_json(services['caronte'], video_id)
        stream = caronte['dls'][0]['stream']
        headers = self.geo_verification_headers()
        headers.update({
            'Content-Type': 'application/json;charset=UTF-8',
            'Origin': re.match(r'https?://[^/]+', url).group(0),
        })
        cdn = self._download_json(
            caronte['cerbero'], video_id, data=json.dumps({
                'bbx': caronte['bbx'],
                'gbx': self._download_json(services['gbx'], video_id)['gbx'],
            }).encode(), headers=headers)['tokens']['1']['cdn']
        formats = self._extract_m3u8_formats(
            stream + '?' + cdn, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
        self._sort_formats(formats)

        return {
@@ -149,7 +112,7 @@ class TelecincoIE(InfoExtractor):
            'title': title,
            'formats': formats,
            'thumbnail': content.get('dataPoster') or config.get('poster', {}).get('imageUrl'),
-            'duration': duration,
            'duration': int_or_none(content.get('dataDuration')),
        }
    def _real_extract(self, url):
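
The replacement flow above swaps the old mmc/gat token dance for one Caronte/Cerbero exchange. A standalone sketch of that exchange (stdlib only; the caronte dict and gbx value are assumed to come from the earlier JSON calls in the diff):

import json
import urllib.request

def get_cdn_token(caronte, gbx, origin):
    # POST the bbx/gbx pair to the cerbero endpoint and pull out the CDN
    # token, which is then appended to the HLS stream URL as a query string.
    req = urllib.request.Request(
        caronte['cerbero'],
        data=json.dumps({'bbx': caronte['bbx'], 'gbx': gbx}).encode(),
        headers={'Content-Type': 'application/json;charset=UTF-8', 'Origin': origin})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())['tokens']['1']['cdn']

# m3u8_url = caronte['dls'][0]['stream'] + '?' + get_cdn_token(caronte, gbx, origin)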


@@ -3,9 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
    HEADRequest,
    parse_age_limit,
    parse_iso8601,
-    smuggle_url,
    # smuggle_url,
)
@@ -24,14 +25,16 @@ class TenPlayIE(InfoExtractor):
            'uploader_id': '2199827728001',
        },
        'params': {
-            'format': 'bestvideo',
            # 'format': 'bestvideo',
            'skip_download': True,
        }
    }, {
        'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
        'only_matching': True,
    }]
-    BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
    # BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
    _GEO_BYPASS = False
    _FASTLY_URL_TEMPL = 'https://10-selector.global.ssl.fastly.net/s/kYEXFC/media/%s?mbr=true&manifest=m3u&format=redirect'
    def _real_extract(self, url):
        content_id = self._match_id(url)
@@ -40,19 +43,28 @@ class TenPlayIE(InfoExtractor):
        video = data.get('video') or {}
        metadata = data.get('metaData') or {}
        brightcove_id = video.get('videoId') or metadata['showContentVideoId']
-        brightcove_url = smuggle_url(
-            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
-            {'geo_countries': ['AU']})
        # brightcove_url = smuggle_url(
        #     self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
        #     {'geo_countries': ['AU']})
        m3u8_url = self._request_webpage(HEADRequest(
            self._FASTLY_URL_TEMPL % brightcove_id), brightcove_id).geturl()
        if '10play-not-in-oz' in m3u8_url:
            self.raise_geo_restricted(countries=['AU'])
        formats = self._extract_m3u8_formats(m3u8_url, brightcove_id, 'mp4')
        self._sort_formats(formats)

        return {
-            '_type': 'url_transparent',
-            'url': brightcove_url,
-            'id': content_id,
-            'title': video.get('title') or metadata.get('pageContentName') or metadata.get('showContentName'),
            # '_type': 'url_transparent',
            # 'url': brightcove_url,
            'formats': formats,
            'id': brightcove_id,
            'title': video.get('title') or metadata.get('pageContentName') or metadata['showContentName'],
            'description': video.get('description'),
            'age_limit': parse_age_limit(video.get('showRatingClassification') or metadata.get('showProgramClassification')),
            'series': metadata.get('showName'),
            'season': metadata.get('showContentSeason'),
            'timestamp': parse_iso8601(metadata.get('contentPublishDate') or metadata.get('pageContentPublishDate')),
            'thumbnail': video.get('poster'),
            'uploader_id': '2199827728001',
            # 'ie_key': 'BrightcoveNew',
        }
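
The HEADRequest above exists only to follow the Fastly redirect and inspect the final URL before downloading. The same probe in plain stdlib (function name and RuntimeError are illustrative):

import urllib.request

def resolve_fastly_url(brightcove_id):
    # Follow the redirect; a '10play-not-in-oz' landing URL signals geo blocking.
    url = ('https://10-selector.global.ssl.fastly.net/s/kYEXFC/media/%s'
           '?mbr=true&manifest=m3u&format=redirect') % brightcove_id
    req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(req) as resp:
        final_url = resp.geturl()
    if '10play-not-in-oz' in final_url:
        raise RuntimeError('geo restricted to AU')
    return final_url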


@@ -234,6 +234,9 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
    def _real_extract(self, url):
        url, smuggled_data = unsmuggle_url(url, {})
        self._initialize_geo_bypass({
            'countries': smuggled_data.get('geo_countries'),
        })

        mobj = re.match(self._VALID_URL, url)
        provider_id = mobj.group('provider_id')


@@ -1,18 +1,22 @@
# coding: utf-8
from __future__ import unicode_literals

import json
import re

from .theplatform import ThePlatformIE
from ..utils import (
    determine_ext,
    parse_duration,
    parse_iso8601,
)
class TheWeatherChannelIE(ThePlatformIE):
-    _VALID_URL = r'https?://(?:www\.)?weather\.com/(?:[^/]+/)*video/(?P<id>[^/?#]+)'
    _VALID_URL = r'https?://(?:www\.)?weather\.com(?P<asset_name>(?:/(?P<locale>[a-z]{2}-[A-Z]{2}))?/(?:[^/]+/)*video/(?P<id>[^/?#]+))'
    _TESTS = [{
        'url': 'https://weather.com/series/great-outdoors/video/ice-climber-is-in-for-a-shock',
-        'md5': 'ab924ac9574e79689c24c6b95e957def',
        'md5': 'c4cbe74c9c17c5676b704b950b73dd92',
        'info_dict': {
            'id': 'cc82397e-cc3f-4d11-9390-a785add090e8',
            'ext': 'mp4',
@@ -20,18 +24,33 @@ class TheWeatherChannelIE(ThePlatformIE):
            'description': 'md5:55606ce1378d4c72e6545e160c9d9695',
            'uploader': 'TWC - Digital (No Distro)',
            'uploader_id': '6ccd5455-16bb-46f2-9c57-ff858bb9f62c',
            'upload_date': '20160720',
            'timestamp': 1469018835,
        }
    }, {
        'url': 'https://weather.com/en-CA/international/videos/video/unidentified-object-falls-from-sky-in-india',
        'only_matching': True,
    }]
    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        drupal_settings = self._parse_json(self._search_regex(
-            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
-            webpage, 'drupal settings'), display_id)
-        video_id = drupal_settings['twc']['contexts']['node']['uuid']
-        video_data = self._download_json(
-            'https://dsx.weather.com/cms/v4/asset-collection/en_US/' + video_id, video_id)
        asset_name, locale, display_id = re.match(self._VALID_URL, url).groups()
        if not locale:
            locale = 'en-US'
        video_data = list(self._download_json(
            'https://weather.com/api/v1/p/redux-dal', display_id, data=json.dumps([{
                'name': 'getCMSAssetsUrlConfig',
                'params': {
                    'language': locale.replace('-', '_'),
                    'query': {
                        'assetName': {
                            '$in': asset_name,
                        },
                    },
                }
            }]).encode(), headers={
                'Content-Type': 'application/json',
            })['dal']['getCMSAssetsUrlConfig'].values())[0]['data'][0]
        video_id = video_data['id']
        seo_meta = video_data.get('seometa', {})
        title = video_data.get('title') or seo_meta['title']
@@ -66,6 +85,8 @@ class TheWeatherChannelIE(ThePlatformIE):
        })
        self._sort_formats(formats)
cc_url = video_data.get('cc_url')
        return {
            'id': video_id,
            'display_id': display_id,
@@ -74,6 +95,8 @@ class TheWeatherChannelIE(ThePlatformIE):
            'duration': parse_duration(video_data.get('duration')),
            'uploader': video_data.get('providername'),
            'uploader_id': video_data.get('providerid'),
            'timestamp': parse_iso8601(video_data.get('publishdate')),
            'subtitles': {locale[:2]: [{'url': cc_url}]} if cc_url else None,
            'thumbnails': thumbnails,
            'formats': formats,
        }
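
The redux-dal endpoint is a batch API: the extractor POSTs a list of named queries and indexes the keyed response. A standalone sketch (request/response shape follows the diff; the helper name is illustrative):

import json
import urllib.request

def get_cms_asset(asset_name, locale='en-US'):
    body = json.dumps([{
        'name': 'getCMSAssetsUrlConfig',
        'params': {
            'language': locale.replace('-', '_'),
            'query': {'assetName': {'$in': asset_name}},
        },
    }]).encode()
    req = urllib.request.Request(
        'https://weather.com/api/v1/p/redux-dal', data=body,
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        dal = json.loads(resp.read().decode())['dal']['getCMSAssetsUrlConfig']
    # The response is keyed by query config; take the first entry's data.
    return list(dal.values())[0]['data'][0]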


@@ -200,7 +200,7 @@ class ToggleIE(InfoExtractor):
class MeWatchIE(InfoExtractor):
    IE_NAME = 'mewatch'
-    _VALID_URL = r'https?://(?:www\.)?mewatch\.sg/watch/[0-9a-zA-Z-]+-(?P<id>[0-9]+)'
    _VALID_URL = r'https?://(?:(?:www|live)\.)?mewatch\.sg/watch/[^/?#&]+-(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'https://www.mewatch.sg/watch/Recipe-Of-Life-E1-179371',
        'info_dict': {
@@ -214,6 +214,15 @@ class MeWatchIE(InfoExtractor):
        'params': {
            'skip_download': 'm3u8 download',
        },
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-搜密。打卡。小红点-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-%E6%90%9C%E5%AF%86%E3%80%82%E6%89%93%E5%8D%A1%E3%80%82%E5%B0%8F%E7%BA%A2%E7%82%B9-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://live.mewatch.sg/watch/Recipe-Of-Life-E41-189759',
        'only_matching': True,
    }]
    def _real_extract(self, url):


@@ -251,10 +251,10 @@ class TwitterIE(TwitterBaseIE):
        'info_dict': {
            'id': '700207533655363584',
            'ext': 'mp4',
-            'title': 'simon vetugo - BEAT PROD: @suhmeduh #Damndaniel',
            'title': 'simon vertugo - BEAT PROD: @suhmeduh #Damndaniel',
            'description': 'BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ',
            'thumbnail': r're:^https?://.*\.jpg',
-            'uploader': 'simon vetugo',
            'uploader': 'simon vertugo',
            'uploader_id': 'simonvertugo',
            'duration': 30.0,
            'timestamp': 1455777459,
@@ -312,6 +312,7 @@ class TwitterIE(TwitterBaseIE):
            'timestamp': 1492000653,
            'upload_date': '20170412',
        },
        'skip': 'Account suspended',
    }, {
        'url': 'https://twitter.com/i/web/status/910031516746514432',
        'info_dict': {
@@ -380,6 +381,14 @@ class TwitterIE(TwitterBaseIE):
        # promo_video_website card
        'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
        'only_matching': True,
    }, {
        # promo_video_convo card
        'url': 'https://twitter.com/poco_dandy/status/1047395834013384704',
        'only_matching': True,
    }, {
        # appplayer card
        'url': 'https://twitter.com/poco_dandy/status/1150646424461176832',
        'only_matching': True,
    }]
    def _real_extract(self, url):
@@ -462,7 +471,30 @@ class TwitterIE(TwitterBaseIE):
                return try_get(o, lambda x: x[x['type'].lower() + '_value'])

            card_name = card['name'].split(':')[-1]
-            if card_name in ('amplify', 'promo_video_website'):
            if card_name == 'player':
info.update({
'_type': 'url',
'url': get_binding_value('player_url'),
})
elif card_name == 'periscope_broadcast':
info.update({
'_type': 'url',
'url': get_binding_value('url') or get_binding_value('player_url'),
'ie_key': PeriscopeIE.ie_key(),
})
elif card_name == 'broadcast':
info.update({
'_type': 'url',
'url': get_binding_value('broadcast_url'),
'ie_key': TwitterBroadcastIE.ie_key(),
})
elif card_name == 'summary':
info.update({
'_type': 'url',
'url': get_binding_value('card_url'),
})
# amplify, promo_video_website, promo_video_convo, appplayer, ...
else:
                is_amplify = card_name == 'amplify'
                vmap_url = get_binding_value('amplify_url_vmap') if is_amplify else get_binding_value('player_stream_url')
                content_id = get_binding_value('%s_content_id' % (card_name if is_amplify else 'player'))
@@ -488,25 +520,6 @@ class TwitterIE(TwitterBaseIE):
                    'duration': int_or_none(get_binding_value(
                        'content_duration_seconds')),
                })
-            elif card_name == 'player':
-                info.update({
-                    '_type': 'url',
-                    'url': get_binding_value('player_url'),
-                })
-            elif card_name == 'periscope_broadcast':
-                info.update({
-                    '_type': 'url',
-                    'url': get_binding_value('url') or get_binding_value('player_url'),
-                    'ie_key': PeriscopeIE.ie_key(),
-                })
-            elif card_name == 'broadcast':
-                info.update({
-                    '_type': 'url',
-                    'url': get_binding_value('broadcast_url'),
-                    'ie_key': TwitterBroadcastIE.ie_key(),
-                })
-            else:
-                raise ExtractorError('Unsupported Twitter Card.')
        else:
            expanded_url = try_get(status, lambda x: x['entities']['urls'][0]['expanded_url'])
            if not expanded_url:
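
Reordered, the card handling above is a name-based dispatch with a generic vmap/stream fallback. A condensed sketch (the binding-value lookup is simplified to string values only; real cards carry typed values):

def resolve_card(card):
    values = card['binding_values']
    get = lambda k: (values.get(k) or {}).get('string_value')
    name = card['name'].split(':')[-1]
    if name == 'player':
        return get('player_url')
    if name == 'periscope_broadcast':
        return get('url') or get('player_url')
    if name == 'broadcast':
        return get('broadcast_url')
    if name == 'summary':
        return get('card_url')
    # amplify, promo_video_website, promo_video_convo, appplayer, ...
    return get('amplify_url_vmap') or get('player_stream_url')

print(resolve_card({'name': 'summary',
                    'binding_values': {'card_url': {'string_value': 'https://example.com'}}}))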


@@ -5,10 +5,9 @@ from .common import InfoExtractor
class UKTVPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/.+?\?.*?\bvideo=(?P<id>\d+)'
    _VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/(?:.+?\?.*?\bvideo=|([^/]+/)*watch-online/)(?P<id>\d+)'
-    _TEST = {
    _TESTS = [{
        'url': 'https://uktvplay.uktv.co.uk/shows/world-at-war/c/200/watch-online/?video=2117008346001',
-        'md5': '',
        'info_dict': {
            'id': '2117008346001',
            'ext': 'mp4',
@@ -23,7 +22,11 @@ class UKTVPlayIE(InfoExtractor):
            'skip_download': True,
        },
        'expected_warnings': ['Failed to download MPD manifest']
-    }
    }, {
'url': 'https://uktvplay.uktv.co.uk/shows/africa/watch-online/5983349675001',
'only_matching': True,
}]
# BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/1242911124001/OrCyvJ2gyL_default/index.html?videoId=%s'
    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911124001/H1xnMOqP_default/index.html?videoId=%s'
    def _real_extract(self, url):


@@ -1119,6 +1119,12 @@ class VHXEmbedIE(VimeoBaseInfoExtractor):
    IE_NAME = 'vhx:embed'
    _VALID_URL = r'https?://embed\.vhx\.tv/videos/(?P<id>\d+)'
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="(https?://embed\.vhx\.tv/videos/\d+[^"]*)"', webpage)
return unescapeHTML(mobj.group(1)) if mobj else None
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
@@ -1127,5 +1133,6 @@ class VHXEmbedIE(VimeoBaseInfoExtractor):
            'ott data'), video_id, js_to_json)['config_url']
        config = self._download_json(config_url, video_id)
        info = self._parse_config(config, video_id)
        info['id'] = video_id
        self._vimeo_sort_formats(info['formats'])
        return info


@@ -12,7 +12,8 @@ from ..utils import (
class VVVVIDIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vvvvid\.it/(?:#!)?(?:show|anime|film|series)/(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)'
    _VALID_URL_BASE = r'https?://(?:www\.)?vvvvid\.it/(?:#!)?(?:show|anime|film|series)/'
    _VALID_URL = r'%s(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)' % _VALID_URL_BASE
    _TESTS = [{
        # video_type == 'video/vvvvid'
        'url': 'https://www.vvvvid.it/#!show/434/perche-dovrei-guardarlo-di-dario-moccia/437/489048/ping-pong',
@@ -21,6 +22,15 @@ class VVVVIDIE(InfoExtractor):
            'id': '489048',
            'ext': 'mp4',
            'title': 'Ping Pong',
'duration': 239,
'series': '"Perché dovrei guardarlo?" di Dario Moccia',
'season_id': '437',
'episode': 'Ping Pong',
'episode_number': 1,
'episode_id': '3334',
'view_count': int,
'like_count': int,
'repost_count': int,
        },
        'params': {
            'skip_download': True,
@@ -37,6 +47,9 @@ class VVVVIDIE(InfoExtractor):
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'https://www.vvvvid.it/show/434/perche-dovrei-guardarlo-di-dario-moccia/437/489048',
        'only_matching': True
    }]
    _conn_id = None
@@ -45,20 +58,35 @@ class VVVVIDIE(InfoExtractor):
            'https://www.vvvvid.it/user/login',
            None, headers=self.geo_verification_headers())['data']['conn_id']

-    def _real_extract(self, url):
-        show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
    def _download_info(self, show_id, path, video_id, fatal=True):
        response = self._download_json(
-            'https://www.vvvvid.it/vvvvid/ondemand/%s/season/%s' % (show_id, season_id),
            'https://www.vvvvid.it/vvvvid/ondemand/%s/%s' % (show_id, path),
            video_id, headers=self.geo_verification_headers(), query={
                'conn_id': self._conn_id,
-            })
            }, fatal=fatal)
-        if response['result'] == 'error':
        if not (response or fatal):
            return
        if response.get('result') == 'error':
            raise ExtractorError('%s said: %s' % (
                self.IE_NAME, response['message']), expected=True)
        return response['data']
def _extract_common_video_info(self, video_data):
return {
'thumbnail': video_data.get('thumbnail'),
'episode_id': str_or_none(video_data.get('id')),
}
def _real_extract(self, url):
show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
response = self._download_info(
show_id, 'season/%s' % season_id, video_id)
        vid = int(video_id)
        video_data = list(filter(
-            lambda episode: episode.get('video_id') == vid, response['data']))[0]
            lambda episode: episode.get('video_id') == vid, response))[0]
        title = video_data['title']

        formats = []
        # vvvvid embed_info decryption algorithm is reverse engineered from function $ds(h) at vvvvid.js
@@ -115,6 +143,17 @@ class VVVVIDIE(InfoExtractor):
            return d
info = {}
def metadata_from_url(r_url):
if not info and r_url:
mobj = re.search(r'_(?:S(\d+))?Ep(\d+)', r_url)
if mobj:
info['episode_number'] = int(mobj.group(2))
season_number = mobj.group(1)
if season_number:
info['season_number'] = int(season_number)
        for quality in ('_sd', ''):
            embed_code = video_data.get('embed_info' + quality)
            if not embed_code:
@@ -122,7 +161,6 @@ class VVVVIDIE(InfoExtractor):
            embed_code = ds(embed_code)
            video_type = video_data.get('video_type')
            if video_type in ('video/rcs', 'video/kenc'):
-                embed_code = re.sub(r'https?://([^/]+)/z/', r'https://\1/i/', embed_code).replace('/manifest.f4m', '/master.m3u8')
                if video_type == 'video/kenc':
                    kenc = self._download_json(
                        'https://www.vvvvid.it/kenc', video_id, query={
@@ -133,26 +171,75 @@ class VVVVIDIE(InfoExtractor):
                    kenc_message = kenc.get('message')
                    if kenc_message:
                        embed_code += '?' + ds(kenc_message)
-                formats.extend(self._extract_m3u8_formats(
-                    embed_code, video_id, 'mp4',
-                    m3u8_id='hls', fatal=False))
                formats.extend(self._extract_akamai_formats(embed_code, video_id))
            else:
                formats.extend(self._extract_wowza_formats(
                    'http://sb.top-ix.org/videomg/_definst_/mp4:%s/playlist.m3u8' % embed_code, video_id))
            metadata_from_url(embed_code)

        self._sort_formats(formats)

-        return {
-            'id': video_id,
-            'title': video_data['title'],
-            'formats': formats,
-            'thumbnail': video_data.get('thumbnail'),
-            'duration': int_or_none(video_data.get('length')),
-            'series': video_data.get('show_title'),
-            'season_id': season_id,
-            'season_number': video_data.get('season_number'),
-            'episode_id': str_or_none(video_data.get('id')),
-            'episode_number': int_or_none(video_data.get('number')),
-            'episode_title': video_data['title'],
-            'view_count': int_or_none(video_data.get('views')),
-            'like_count': int_or_none(video_data.get('video_likes')),
-        }
        metadata_from_url(video_data.get('thumbnail'))
        info.update(self._extract_common_video_info(video_data))
        info.update({
            'id': video_id,
            'title': title,
            'formats': formats,
            'duration': int_or_none(video_data.get('length')),
            'series': video_data.get('show_title'),
            'season_id': season_id,
            'episode': title,
            'episode_number': int_or_none(video_data.get('number')),
            'view_count': int_or_none(video_data.get('views')),
            'like_count': int_or_none(video_data.get('video_likes')),
            'repost_count': int_or_none(video_data.get('video_shares')),
        })
        return info
+class VVVVIDShowIE(VVVVIDIE):
+_VALID_URL = r'(?P<base_url>%s(?P<id>\d+)(?:/(?P<show_title>[^/?&#]+))?)/?(?:[?#&]|$)' % VVVVIDIE._VALID_URL_BASE
+_TESTS = [{
+'url': 'https://www.vvvvid.it/show/156/psyco-pass',
+'info_dict': {
+'id': '156',
+'title': 'Psycho-Pass',
+'description': 'md5:94d572c0bd85894b193b8aebc9a3a806',
+},
+'playlist_count': 46,
+}, {
+'url': 'https://www.vvvvid.it/show/156',
+'only_matching': True,
+}]
+def _real_extract(self, url):
+base_url, show_id, show_title = re.match(self._VALID_URL, url).groups()
+seasons = self._download_info(
+show_id, 'seasons/', show_title)
+show_info = self._download_info(
+show_id, 'info/', show_title, fatal=False)
+entries = []
+for season in (seasons or []):
+episodes = season.get('episodes') or []
+for episode in episodes:
+if episode.get('playable') is False:
+continue
+season_id = str_or_none(episode.get('season_id'))
+video_id = str_or_none(episode.get('video_id'))
+if not (season_id and video_id):
+continue
+info = self._extract_common_video_info(episode)
+info.update({
+'_type': 'url',
+'ie_key': VVVVIDIE.ie_key(),
+'url': '/'.join([base_url, season_id, video_id]),
+'title': episode.get('title'),
+'description': episode.get('description'),
+'season_id': season_id,
+})
+entries.append(info)
+return self.playlist_result(
+entries, show_id, show_info.get('title'), show_info.get('description'))
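
Note on the pattern above: the show extractor returns lightweight 'url' entries instead of extracting every episode inline, so the per-episode work (formats, kenc handshake) is deferred to VVVVIDIE and only done for items the user actually downloads. A minimal sketch of that deferred-extraction pattern — the extractor key and URL join here are hypothetical placeholders, not VVVVID's real ones:

# Sketch: build playlist entries that youtube-dl resolves lazily through
# another extractor. 'MyVideo' and the URL layout are made-up examples.
def build_entries(episodes, base_url):
    entries = []
    for episode in episodes:
        video_id = episode.get('video_id')
        if not video_id:
            continue
        entries.append({
            '_type': 'url',       # resolved later by the matching extractor
            'ie_key': 'MyVideo',  # hypothetical extractor key
            'url': '%s/%s' % (base_url, video_id),
            'title': episode.get('title'),
        })
    return entries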

File: youtube_dl/extractor/washingtonpost.py

@@ -4,17 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
-from ..utils import (
-int_or_none,
-strip_jsonp,
-)
class WashingtonPostIE(InfoExtractor):
IE_NAME = 'washingtonpost'
-_VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/video/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+_VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/(?:video|posttv)/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_EMBED_URL = r'https?://(?:www\.)?washingtonpost\.com/video/c/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
-_TEST = {
+_TESTS = [{
'url': 'https://www.washingtonpost.com/video/c/video/480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
'md5': '6f537e1334b714eb15f9563bd4b9cdfa',
'info_dict': {
@@ -23,10 +19,15 @@ class WashingtonPostIE(InfoExtractor):
'title': 'Egypt finds belongings, debris from plane crash',
'description': 'md5:a17ceee432f215a5371388c1f680bd86',
'upload_date': '20160520',
-'uploader': 'Reuters',
-'timestamp': 1463778452,
+'timestamp': 1463775187,
},
-}
+}, {
+'url': 'https://www.washingtonpost.com/video/world/egypt-finds-belongings-debris-from-plane-crash/2016/05/20/480ba4ee-1ec7-11e6-82c2-a7dcb313287d_video.html',
+'only_matching': True,
+}, {
+'url': 'https://www.washingtonpost.com/posttv/world/iraq-to-track-down-antiquities-after-islamic-state-museum-rampage/2015/02/28/7c57e916-bf86-11e4-9dfb-03366e719af8_video.html',
+'only_matching': True,
+}]
@classmethod
def _extract_urls(cls, webpage):
@@ -35,73 +36,8 @@ class WashingtonPostIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
-video_data = self._download_json(
-'http://www.washingtonpost.com/posttv/c/videojson/%s?resType=jsonp' % video_id,
-video_id, transform_source=strip_jsonp)[0]['contentConfig']
-title = video_data['title']
-urls = []
-formats = []
-for s in video_data.get('streams', []):
-s_url = s.get('url')
-if not s_url or s_url in urls:
-continue
-urls.append(s_url)
-video_type = s.get('type')
-if video_type == 'smil':
-continue
-elif video_type in ('ts', 'hls') and ('_master.m3u8' in s_url or '_mobile.m3u8' in s_url):
-m3u8_formats = self._extract_m3u8_formats(
-s_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
-for m3u8_format in m3u8_formats:
-width = m3u8_format.get('width')
-if not width:
-continue
-vbr = self._search_regex(
-r'%d_%d_(\d+)' % (width, m3u8_format['height']), m3u8_format['url'], 'vbr', default=None)
-if vbr:
-m3u8_format.update({
-'vbr': int_or_none(vbr),
-})
-formats.extend(m3u8_formats)
-else:
-width = int_or_none(s.get('width'))
-vbr = int_or_none(s.get('bitrate'))
-has_width = width != 0
-formats.append({
-'format_id': (
-'%s-%d-%d' % (video_type, width, vbr)
-if width
-else video_type),
-'vbr': vbr if has_width else None,
-'width': width,
-'height': int_or_none(s.get('height')),
-'acodec': s.get('audioCodec'),
-'vcodec': s.get('videoCodec') if has_width else 'none',
-'filesize': int_or_none(s.get('fileSize')),
-'url': s_url,
-'ext': 'mp4',
-'protocol': 'm3u8_native' if video_type in ('ts', 'hls') else None,
-})
-source_media_url = video_data.get('sourceMediaURL')
-if source_media_url:
-formats.append({
-'format_id': 'source_media',
-'url': source_media_url,
-})
-self._sort_formats(
-formats, ('width', 'height', 'vbr', 'filesize', 'tbr', 'format_id'))
-return {
-'id': video_id,
-'title': title,
-'description': video_data.get('blurb'),
-'uploader': video_data.get('credits', {}).get('source'),
-'formats': formats,
-'duration': int_or_none(video_data.get('videoDuration'), 100),
-'timestamp': int_or_none(
-video_data.get('dateConfig', {}).get('dateFirstPublished'), 1000),
-}
+return self.url_result(
+'arcpublishing:wapo:' + video_id, 'ArcPublishing', video_id)
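
The rewritten _real_extract drops all of the bespoke JSONP and stream parsing above and hands the video off to the ArcPublishing extractor through an internal 'arcpublishing:wapo:<uuid>' URL. For reference, self.url_result(url, ie, video_id) builds a result dict shaped roughly like this sketch:

# Sketch of the delegation idiom: return a 'url' result and let youtube-dl
# re-dispatch it to the named extractor (here ArcPublishing).
def delegate_to_arc(video_id):
    return {
        '_type': 'url',
        'url': 'arcpublishing:wapo:' + video_id,
        'ie_key': 'ArcPublishing',
        'id': video_id,
    }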
class WashingtonPostArticleIE(InfoExtractor):
@@ -121,9 +57,8 @@ class WashingtonPostArticleIE(InfoExtractor):
'title': 'Breaking Points: The Paper Mine',
'duration': 1290,
'description': 'Overly complicated paper pushing is nothing new to government bureaucracy. But the way federal retirement applications are filed may be the most outdated. David Fahrenthold explains.',
-'uploader': 'The Washington Post',
-'timestamp': 1395527908,
-'upload_date': '20140322',
+'timestamp': 1395440416,
+'upload_date': '20140321',
},
}, {
'md5': '1fff6a689d8770966df78c8cb6c8c17c',
@@ -133,9 +68,8 @@ class WashingtonPostArticleIE(InfoExtractor):
'title': 'The town bureaucracy sustains',
'description': 'Underneath the friendly town of Boyers is a sea of government paperwork. In a disused limestone mine, hundreds of locals now track, file and process retirement applications for the federal government. We set out to find out what it\'s like to do paperwork 230 feet underground.',
'duration': 2220,
-'timestamp': 1395528005,
-'upload_date': '20140322',
-'uploader': 'The Washington Post',
+'timestamp': 1395441819,
+'upload_date': '20140321',
},
}],
}, {
@@ -151,8 +85,7 @@ class WashingtonPostArticleIE(InfoExtractor):
'ext': 'mp4',
'description': 'Washington Post transportation reporter Ashley Halsey III explains why a plane\'s black box needs to be recovered from a crash site instead of having its information streamed in real time throughout the flight.',
'upload_date': '20141230',
-'uploader': 'The Washington Post',
-'timestamp': 1419974765,
+'timestamp': 1419972442,
'title': 'Why black boxes dont transmit data in real time',
}
}]

File: youtube_dl/extractor/wistia.py

@@ -5,79 +5,34 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
-int_or_none,
float_or_none,
+int_or_none,
+try_get,
unescapeHTML,
)
-class WistiaIE(InfoExtractor):
+class WistiaBaseIE(InfoExtractor):
-_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
+_VALID_ID_REGEX = r'(?P<id>[a-z0-9]{10})'
+_VALID_URL_BASE = r'https?://(?:fast\.)?wistia\.(?:net|com)/embed/'
_EMBED_BASE_URL = 'http://fast.wistia.com/embed/'
-_TESTS = [{
-'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
-'md5': 'cafeb56ec0c53c18c97405eecb3133df',
-'info_dict': {
-'id': 'sh7fpupwlt',
-'ext': 'mov',
-'title': 'Being Resourceful',
-'description': 'a Clients From Hell Video Series video from worldwidewebhosting',
-'upload_date': '20131204',
-'timestamp': 1386185018,
-'duration': 117,
-},
-}, {
-'url': 'wistia:sh7fpupwlt',
-'only_matching': True,
-}, {
-# with hls video
-'url': 'wistia:807fafadvk',
-'only_matching': True,
-}, {
-'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
-'only_matching': True,
-}, {
-'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
-'only_matching': True,
-}]
-# https://wistia.com/support/embed-and-share/video-on-your-website
-@staticmethod
-def _extract_url(webpage):
-urls = WistiaIE._extract_urls(webpage)
-return urls[0] if urls else None
-@staticmethod
-def _extract_urls(webpage):
-urls = []
-for match in re.finditer(
-r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage):
-urls.append(unescapeHTML(match.group('url')))
-for match in re.finditer(
-r'''(?sx)
-<div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
-''', webpage):
-urls.append('wistia:%s' % match.group('id'))
-for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):
-urls.append('wistia:%s' % match.group('id'))
-return urls
-def _real_extract(self, url):
-video_id = self._match_id(url)
-data_json = self._download_json(
-self._EMBED_BASE_URL + 'medias/%s.json' % video_id, video_id,
-# Some videos require this.
-headers={
-'Referer': url if url.startswith('http') else self._EMBED_BASE_URL + 'iframe/' + video_id,
-})
+def _download_embed_config(self, config_type, config_id, referer):
+base_url = self._EMBED_BASE_URL + '%ss/%s' % (config_type, config_id)
+embed_config = self._download_json(
+base_url + '.json', config_id, headers={
+'Referer': referer if referer.startswith('http') else base_url,  # Some videos require this.
+})
-if data_json.get('error'):
+if isinstance(embed_config, dict) and embed_config.get('error'):
raise ExtractorError(
'Error while getting the playlist', expected=True)
-data = data_json['media']
+return embed_config
+def _extract_media(self, embed_config):
+data = embed_config['media']
+video_id = data['hashedId']
title = data['name']
formats = []
@@ -160,3 +115,85 @@ class WistiaIE(InfoExtractor):
'timestamp': int_or_none(data.get('createdAt')),
'subtitles': subtitles,
}
+class WistiaIE(WistiaBaseIE):
+_VALID_URL = r'(?:wistia:|%s(?:iframe|medias)/)%s' % (WistiaBaseIE._VALID_URL_BASE, WistiaBaseIE._VALID_ID_REGEX)
+_TESTS = [{
+# with hls video
+'url': 'wistia:807fafadvk',
+'md5': 'daff0f3687a41d9a71b40e0e8c2610fe',
+'info_dict': {
+'id': '807fafadvk',
+'ext': 'mp4',
+'title': 'Drip Brennan Dunn Workshop',
+'description': 'a JV Webinars video',
+'upload_date': '20160518',
+'timestamp': 1463607249,
+'duration': 4987.11,
+},
+}, {
+'url': 'wistia:sh7fpupwlt',
+'only_matching': True,
+}, {
+'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
+'only_matching': True,
+}, {
+'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
+'only_matching': True,
+}, {
+'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
+'only_matching': True,
+}]
+# https://wistia.com/support/embed-and-share/video-on-your-website
+@staticmethod
+def _extract_url(webpage):
+urls = WistiaIE._extract_urls(webpage)
+return urls[0] if urls else None
+@staticmethod
+def _extract_urls(webpage):
+urls = []
+for match in re.finditer(
+r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage):
+urls.append(unescapeHTML(match.group('url')))
+for match in re.finditer(
+r'''(?sx)
+<div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
+''', webpage):
+urls.append('wistia:%s' % match.group('id'))
+for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):
+urls.append('wistia:%s' % match.group('id'))
+return urls
+def _real_extract(self, url):
+video_id = self._match_id(url)
+embed_config = self._download_embed_config('media', video_id, url)
+return self._extract_media(embed_config)
+class WistiaPlaylistIE(WistiaBaseIE):
+_VALID_URL = r'%splaylists/%s' % (WistiaIE._VALID_URL_BASE, WistiaIE._VALID_ID_REGEX)
+_TEST = {
+'url': 'https://fast.wistia.net/embed/playlists/aodt9etokc',
+'info_dict': {
+'id': 'aodt9etokc',
+},
+'playlist_count': 3,
+}
+def _real_extract(self, url):
+playlist_id = self._match_id(url)
+playlist = self._download_embed_config('playlist', playlist_id, url)
+entries = []
+for media in (try_get(playlist, lambda x: x[0]['medias']) or []):
+embed_config = media.get('embed_config')
+if not embed_config:
+continue
+entries.append(self._extract_media(embed_config))
+return self.playlist_result(entries, playlist_id)
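
Both new classes funnel their network lookup through _download_embed_config, which fetches http://fast.wistia.com/embed/<type>s/<id>.json and sends a Referer header that some videos require. A self-contained stdlib sketch of that request, assuming only the endpoint shape visible in the diff:

# Stdlib sketch of the embed-config request made by the base class above.
import json
from urllib.request import Request, urlopen

def download_embed_config(config_type, config_id, referer):
    base_url = 'http://fast.wistia.com/embed/%ss/%s' % (config_type, config_id)
    req = Request(base_url + '.json', headers={
        # Some videos require a Referer; fall back to the embed URL itself
        'Referer': referer if referer.startswith('http') else base_url,
    })
    config = json.load(urlopen(req))
    if isinstance(config, dict) and config.get('error'):
        raise RuntimeError('Wistia returned an error for %s' % config_id)
    return config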

File: youtube_dl/extractor/xfileshare.py

@@ -45,6 +45,7 @@ def aa_decode(aa_code):
class XFileShareIE(InfoExtractor):
_SITES = (
+(r'aparat\.cam', 'Aparat'),
(r'clipwatching\.com', 'ClipWatching'),
(r'gounlimited\.to', 'GoUnlimited'),
(r'govid\.me', 'GoVid'),
@@ -78,6 +79,9 @@ class XFileShareIE(InfoExtractor):
'title': 'sample',
'thumbnail': r're:http://.*\.jpg',
},
+}, {
+'url': 'https://aparat.cam/n4d6dh0wvlpr',
+'only_matching': True,
}]
@staticmethod
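
Adding a (host_regex, name) row to _SITES is all a new mirror needs, because the extractor folds the host alternation into its URL pattern. Roughly how that derivation works — an illustrative sketch, the exact expression in xfileshare.py may differ:

import re

_SITES = (
    (r'aparat\.cam', 'Aparat'),
    (r'clipwatching\.com', 'ClipWatching'),
)

_VALID_URL = (r'https?://(?:www\.)?(?P<host>%s)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
              % '|'.join(host for host, _ in _SITES))

# The new test URL from the diff matches once the row is present:
assert re.match(_VALID_URL, 'https://aparat.cam/n4d6dh0wvlpr')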

File: youtube_dl/extractor/yandexdisk.py

@@ -1,23 +1,43 @@
# coding: utf-8
from __future__ import unicode_literals
+import json
+import re
from .common import InfoExtractor
-from ..compat import compat_str
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
+mimetype2ext,
try_get,
-urlencode_postdata,
+urljoin,
)
class YandexDiskIE(InfoExtractor):
-_VALID_URL = r'https?://yadi\.sk/[di]/(?P<id>[^/?#&]+)'
+_VALID_URL = r'''(?x)https?://
+(?P<domain>
+yadi\.sk|
+disk\.yandex\.
+(?:
+az|
+by|
+co(?:m(?:\.(?:am|ge|tr))?|\.il)|
+ee|
+fr|
+k[gz]|
+l[tv]|
+md|
+t[jm]|
+u[az]|
+ru
+)
+)/(?:[di]/|public.*?\bhash=)(?P<id>[^/?#&]+)'''
_TESTS = [{
'url': 'https://yadi.sk/i/VdOeDou8eZs6Y',
-'md5': '33955d7ae052f15853dc41f35f17581c',
+'md5': 'a4a8d52958c8fddcf9845935070402ae',
'info_dict': {
'id': 'VdOeDou8eZs6Y',
'ext': 'mp4',
@@ -27,92 +47,101 @@ class YandexDiskIE(InfoExtractor):
'uploader_id': '300043621',
'view_count': int,
},
+'expected_warnings': ['Unable to download JSON metadata'],
}, {
'url': 'https://yadi.sk/d/h3WAXvDS3Li3Ce',
'only_matching': True,
+}, {
+'url': 'https://yadi.sk/public?hash=5DZ296JK9GWCLp02f6jrObjnctjRxMs8L6%2B%2FuhNqk38%3D',
+'only_matching': True,
}]
def _real_extract(self, url):
-video_id = self._match_id(url)
+domain, video_id = re.match(self._VALID_URL, url).groups()
-status = self._download_webpage(
-'https://disk.yandex.com/auth/status', video_id, query={
-'urlOrigin': url,
-'source': 'public',
-'md5': 'false',
-})
-sk = self._search_regex(
-r'(["\'])sk(?:External)?\1\s*:\s*(["\'])(?P<value>(?:(?!\2).)+)\2',
-status, 'sk', group='value')
webpage = self._download_webpage(url, video_id)
+store = self._parse_json(self._search_regex(
+r'<script[^>]+id="store-prefetch"[^>]*>\s*({.+?})\s*</script>',
+webpage, 'store'), video_id)
+resource = store['resources'][store['rootResourceId']]
-models = self._parse_json(
-self._search_regex(
-r'<script[^>]+id=["\']models-client[^>]+>\s*(\[.+?\])\s*</script',
-webpage, 'video JSON'),
-video_id)
-data = next(
-model['data'] for model in models
-if model.get('model') == 'resource')
-video_hash = data['id']
-title = data['name']
+title = resource['name']
+meta = resource.get('meta') or {}
+public_url = meta.get('short_url')
+if public_url:
+video_id = self._match_id(public_url)
+source_url = (self._download_json(
+'https://cloud-api.yandex.net/v1/disk/public/resources/download',
+video_id, query={'public_key': url}, fatal=False) or {}).get('href')
+video_streams = resource.get('videoStreams') or {}
+video_hash = resource.get('hash') or url
+environment = store.get('environment') or {}
+sk = environment.get('sk')
+yandexuid = environment.get('yandexuid')
+if sk and yandexuid and not (source_url and video_streams):
+self._set_cookie(domain, 'yandexuid', yandexuid)
-models = self._download_json(
-'https://disk.yandex.com/models/', video_id,
-data=urlencode_postdata({
-'_model.0': 'videoInfo',
-'id.0': video_hash,
-'_model.1': 'do-get-resource-url',
-'id.1': video_hash,
-'version': '13.6',
-'sk': sk,
-}), query={'_m': 'videoInfo'})['models']
-videos = try_get(models, lambda x: x[0]['data']['videos'], list) or []
-source_url = try_get(
-models, lambda x: x[1]['data']['file'], compat_str)
+def call_api(action):
+return (self._download_json(
+urljoin(url, '/public/api/') + action, video_id, data=json.dumps({
+'hash': video_hash,
+'sk': sk,
+}).encode(), headers={
+'Content-Type': 'text/plain',
+}, fatal=False) or {}).get('data') or {}
+if not source_url:
+# TODO: figure out how to detect if download limit has
+# been reached and then avoid unnecessary source format
+# extraction requests
+source_url = call_api('download-url').get('url')
+if not video_streams:
+video_streams = call_api('get-video-streams')
formats = []
if source_url:
formats.append({
'url': source_url,
'format_id': 'source',
-'ext': determine_ext(title, 'mp4'),
+'ext': determine_ext(title, meta.get('ext') or mimetype2ext(meta.get('mime_type')) or 'mp4'),
'quality': 1,
+'filesize': int_or_none(meta.get('size'))
})
-for video in videos:
+for video in (video_streams.get('videos') or []):
format_url = video.get('url')
if not format_url:
continue
-if determine_ext(format_url) == 'm3u8':
+if video.get('dimension') == 'adaptive':
formats.extend(self._extract_m3u8_formats(
-format_url, video_id, 'mp4', entry_protocol='m3u8_native',
+format_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
+size = video.get('size') or {}
+height = int_or_none(size.get('height'))
+format_id = 'hls'
+if height:
+format_id += '-%dp' % height
formats.append({
+'ext': 'mp4',
+'format_id': format_id,
+'height': height,
+'protocol': 'm3u8_native',
'url': format_url,
+'width': int_or_none(size.get('width')),
})
self._sort_formats(formats)
-duration = float_or_none(try_get(
-models, lambda x: x[0]['data']['duration']), 1000)
-uploader = try_get(
-data, lambda x: x['user']['display_name'], compat_str)
-uploader_id = try_get(
-data, lambda x: x['user']['uid'], compat_str)
-view_count = int_or_none(try_get(
-data, lambda x: x['meta']['views_counter']))
+uid = resource.get('uid')
+display_name = try_get(store, lambda x: x['users'][uid]['displayName'])
return {
'id': video_id,
'title': title,
-'duration': duration,
+'duration': float_or_none(video_streams.get('duration'), 1000),
-'uploader': uploader,
+'uploader': display_name,
-'uploader_id': uploader_id,
+'uploader_id': uid,
-'view_count': view_count,
+'view_count': int_or_none(meta.get('views_counter')),
'formats': formats,
}
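
The new flow reads the store-prefetch JSON out of the page, then talks to the public API by POSTing a JSON body of {hash, sk} with a text/plain content type and unwrapping the 'data' envelope. A stdlib sketch of the call_api helper — the API path and payload shape are taken from the diff, everything else is illustrative:

import json
from urllib.request import Request, urlopen

def call_api(origin, action, video_hash, sk):
    # e.g. origin='https://yadi.sk', action='get-video-streams'
    req = Request(
        origin + '/public/api/' + action,
        data=json.dumps({'hash': video_hash, 'sk': sk}).encode(),
        headers={'Content-Type': 'text/plain'})
    try:
        return (json.load(urlopen(req)) or {}).get('data') or {}
    except OSError:
        return {}  # mirror fatal=False: treat transport errors as empty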

File: youtube_dl/extractor/yandexvideo.py

@@ -5,6 +5,7 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
+try_get,
url_or_none,
)
@@ -13,26 +14,30 @@ class YandexVideoIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
-yandex\.ru(?:/portal/(?:video|efir))?/?\?.*?stream_id=|
+yandex\.ru(?:/(?:portal/(?:video|efir)|efir))?/?\?.*?stream_id=|
frontend\.vh\.yandex\.ru/player/
)
-(?P<id>[\da-f]+)
+(?P<id>(?:[\da-f]{32}|[\w-]{12}))
'''
_TESTS = [{
-'url': 'https://yandex.ru/portal/video?stream_id=4dbb262b4fe5cf15a215de4f34eee34d',
-'md5': '33955d7ae052f15853dc41f35f17581c',
+'url': 'https://yandex.ru/portal/video?stream_id=4dbb36ec4e0526d58f9f2dc8f0ecf374',
+'md5': 'e02a05bfaf0d9615ef07ae3a10f4faf4',
'info_dict': {
-'id': '4dbb262b4fe5cf15a215de4f34eee34d',
+'id': '4dbb36ec4e0526d58f9f2dc8f0ecf374',
'ext': 'mp4',
-'title': 'В Нью-Йорке баржи и теплоход оторвались от причала и расплылись по Гудзону',
-'description': '',
-'thumbnail': r're:^https?://.*\.jpg$',
-'timestamp': 0,
-'duration': 30,
+'title': 'Русский Вудсток - главный рок-фест в истории СССР / вДудь',
+'description': 'md5:7d6b8d4bc4a3b9a56499916c1ea5b5fa',
+'thumbnail': r're:^https?://',
+'timestamp': 1549972939,
+'duration': 5575,
'age_limit': 18,
+'upload_date': '20190212',
+'view_count': int,
+'like_count': int,
+'dislike_count': int,
},
}, {
-'url': 'https://yandex.ru/portal/efir?stream_id=4dbb36ec4e0526d58f9f2dc8f0ecf374&from=morda',
+'url': 'https://yandex.ru/portal/efir?stream_id=4dbb262b4fe5cf15a215de4f34eee34d&from=morda',
'only_matching': True,
}, {
'url': 'https://yandex.ru/?stream_id=4dbb262b4fe5cf15a215de4f34eee34d',
@@ -52,53 +57,88 @@ class YandexVideoIE(InfoExtractor):
# DASH with DRM
'url': 'https://yandex.ru/portal/video?from=morda&stream_id=485a92d94518d73a9d0ff778e13505f8',
'only_matching': True,
+}, {
+'url': 'https://yandex.ru/efir?stream_active=watching&stream_id=v7a2dZ-v5mSI&from_block=efir_newtab',
+'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
-content = self._download_json(
-'https://frontend.vh.yandex.ru/v22/player/%s.json' % video_id,
-video_id, query={
-'stream_options': 'hires',
-'disable_trackings': 1,
-})['content']
+player = try_get((self._download_json(
+'https://frontend.vh.yandex.ru/graphql', video_id, data=('''{
+player(content_id: "%s") {
+computed_title
+content_url
+description
+dislikes
+duration
+likes
+program_title
+release_date
+release_date_ut
+release_year
+restriction_age
+season
+start_time
+streams
+thumbnail
+title
+views_count
+}
+}''' % video_id).encode(), fatal=False)), lambda x: x['player']['content'])
+if not player or player.get('error'):
+player = self._download_json(
+'https://frontend.vh.yandex.ru/v23/player/%s.json' % video_id,
+video_id, query={
+'stream_options': 'hires',
+'disable_trackings': 1,
+})
+content = player['content']
-content_url = url_or_none(content.get('content_url')) or url_or_none(
-content['streams'][0]['url'])
-title = content.get('title') or content.get('computed_title')
+title = content.get('title') or content['computed_title']
-ext = determine_ext(content_url)
-if ext == 'm3u8':
-formats = self._extract_m3u8_formats(
-content_url, video_id, 'mp4', entry_protocol='m3u8_native',
-m3u8_id='hls')
-elif ext == 'mpd':
-formats = self._extract_mpd_formats(
-content_url, video_id, mpd_id='dash')
-else:
-formats = [{'url': content_url}]
+formats = []
+streams = content.get('streams') or []
+streams.append({'url': content.get('content_url')})
+for stream in streams:
+content_url = url_or_none(stream.get('url'))
+if not content_url:
+continue
+ext = determine_ext(content_url)
+if ext == 'ismc':
+continue
+elif ext == 'm3u8':
+formats.extend(self._extract_m3u8_formats(
+content_url, video_id, 'mp4',
+'m3u8_native', m3u8_id='hls', fatal=False))
+elif ext == 'mpd':
+formats.extend(self._extract_mpd_formats(
+content_url, video_id, mpd_id='dash', fatal=False))
+else:
+formats.append({'url': content_url})
self._sort_formats(formats)
-description = content.get('description')
-thumbnail = content.get('thumbnail')
timestamp = (int_or_none(content.get('release_date'))
or int_or_none(content.get('release_date_ut'))
or int_or_none(content.get('start_time')))
-duration = int_or_none(content.get('duration'))
-series = content.get('program_title')
-age_limit = int_or_none(content.get('restriction_age'))
+season = content.get('season') or {}
return {
'id': video_id,
'title': title,
-'description': description,
+'description': content.get('description'),
-'thumbnail': thumbnail,
+'thumbnail': content.get('thumbnail'),
'timestamp': timestamp,
-'duration': duration,
+'duration': int_or_none(content.get('duration')),
-'series': series,
+'series': content.get('program_title'),
-'age_limit': age_limit,
+'age_limit': int_or_none(content.get('restriction_age')),
+'view_count': int_or_none(content.get('views_count')),
+'like_count': int_or_none(content.get('likes')),
+'dislike_count': int_or_none(content.get('dislikes')),
+'season_number': int_or_none(season.get('season_number')),
+'season_id': season.get('id'),
+'release_year': int_or_none(content.get('release_year')),
'formats': formats,
}
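
The extractor now asks the GraphQL endpoint for the player(content_id: ...) object first and only falls back to the versioned player JSON when the GraphQL response is missing or carries an error. The control flow, reduced to a hedged sketch — the two callables stand in for the _download_json calls above:

def get_content(video_id, fetch_graphql, fetch_player_json):
    # fetch_graphql: may return None or a dict with an 'error' key
    player = fetch_graphql(video_id)
    if not player or player.get('error'):
        # fall back to frontend.vh.yandex.ru/v23/player/<id>.json
        player = fetch_player_json(video_id)
    return player['content']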

File: youtube_dl/extractor/youtube.py

@@ -16,6 +16,7 @@ from ..jsinterp import JSInterpreter
from ..swfinterp import SWFInterpreter
from ..compat import (
compat_chr,
+compat_HTTPError,
compat_parse_qs,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
@@ -279,6 +280,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_YT_INITIAL_DATA_RE = r'(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|ytInitialData)\s*=\s*({.+?})\s*;'
_YT_INITIAL_PLAYER_RESPONSE_RE = r'ytInitialPlayerResponse\s*=\s*({.+?})\s*;'
+_YT_INITIAL_BOUNDARY_RE = r'(?:var\s+meta|</script|\n)'
def _call_api(self, ep, query, video_id):
data = self._DEFAULT_API_DATA.copy()
@@ -296,7 +298,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
def _extract_yt_initial_data(self, video_id, webpage):
return self._parse_json(
self._search_regex(
-(r'%s\s*\n' % self._YT_INITIAL_DATA_RE,
+(r'%s\s*%s' % (self._YT_INITIAL_DATA_RE, self._YT_INITIAL_BOUNDARY_RE),
self._YT_INITIAL_DATA_RE), webpage, 'yt initial data'),
video_id)
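
Why _YT_INITIAL_BOUNDARY_RE matters: with a bare ({.+?})\s*; the lazy match stops at the first '};', which can occur inside string values of the JSON blob and yields truncated data. Anchoring the terminator to what actually follows the statement (var meta, </script or a newline) avoids the premature cut. A toy demonstration with a simplified pattern:

import re

page = 'ytInitialData = {"a": "contains }; inside"};</script>'
bad = re.search(r'ytInitialData\s*=\s*({.+?})\s*;', page).group(1)
good = re.search(
    r'ytInitialData\s*=\s*({.+?})\s*;\s*(?:var\s+meta|</script|\n)',
    page).group(1)
assert bad == '{"a": "contains }'             # cut short, invalid JSON
assert good == '{"a": "contains }; inside"}'  # the full object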
@@ -321,7 +323,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Invidious instances taken from https://github.com/omarroth/invidious/wiki/Invidious-Instances
(?:(?:www|dev)\.)?invidio\.us/|
(?:(?:www|no)\.)?invidiou\.sh/|
-(?:(?:www|fi|de)\.)?invidious\.snopyta\.org/|
+(?:(?:www|fi)\.)?invidious\.snopyta\.org/|
(?:www\.)?invidious\.kabi\.tk/|
(?:www\.)?invidious\.13ad\.de/|
(?:www\.)?invidious\.mastodon\.host/|
@@ -1102,6 +1104,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'skip_download': True,
},
},
+{
+# another example of '};' in ytInitialData
+'url': 'https://www.youtube.com/watch?v=gVfgbahppCY',
+'only_matching': True,
+},
+{
+'url': 'https://www.youtube.com/watch_popup?v=63RmMXCd_bQ',
+'only_matching': True,
+},
]
def __init__(self, *args, **kwargs):
@@ -1322,17 +1333,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return self._parse_json(
uppercase_escape(config), video_id, fatal=False)
-def _get_automatic_captions(self, video_id, webpage):
+def _get_automatic_captions(self, video_id, player_response, player_config):
"""We need the webpage for getting the captions url, pass it as an
argument to speed up the process."""
self.to_screen('%s: Looking for automatic captions' % video_id)
-player_config = self._get_ytplayer_config(video_id, webpage)
err_msg = 'Couldn\'t find automatic captions for %s' % video_id
-if not player_config:
+if not (player_response or player_config):
self._downloader.report_warning(err_msg)
return {}
try:
-args = player_config['args']
+args = player_config.get('args') if player_config else {}
caption_url = args.get('ttsurl')
if caption_url:
timestamp = args['timestamp']
@@ -1391,19 +1401,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return captions
# New captions format as of 22.06.2017
-player_response = args.get('player_response')
-if player_response and isinstance(player_response, compat_str):
-player_response = self._parse_json(
-player_response, video_id, fatal=False)
if player_response:
renderer = player_response['captions']['playerCaptionsTracklistRenderer']
base_url = renderer['captionTracks'][0]['baseUrl']
sub_lang_list = []
for lang in renderer['translationLanguages']:
lang_code = lang.get('languageCode')
if lang_code:
sub_lang_list.append(lang_code)
return make_captions(base_url, sub_lang_list)
# Some videos don't provide ttsurl but rather caption_tracks and
# caption_translation_languages (e.g. 20LmZk1hakA)
@@ -1652,6 +1658,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Get video info
video_info = {}
embed_webpage = None
+ytplayer_config = None
if re.search(r'["\']status["\']\s*:\s*["\']LOGIN_REQUIRED', video_webpage) is not None:
age_gate = True
@@ -1705,7 +1712,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if not video_info and not player_response:
player_response = extract_player_response(
self._search_regex(
-(r'%s\s*(?:var\s+meta|</script|\n)' % self._YT_INITIAL_PLAYER_RESPONSE_RE,
+(r'%s\s*%s' % (self._YT_INITIAL_PLAYER_RESPONSE_RE, self._YT_INITIAL_BOUNDARY_RE),
self._YT_INITIAL_PLAYER_RESPONSE_RE), video_webpage,
'initial player response', default='{}'),
video_id)
@@ -2276,7 +2283,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# subtitles
video_subtitles = self.extract_subtitles(video_id, video_webpage)
-automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
+automatic_captions = self.extract_automatic_captions(video_id, player_response, ytplayer_config)
video_duration = try_get(
video_info, lambda x: int_or_none(x['length_seconds'][0]))
@@ -2446,7 +2453,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
)/
(?:
(?:channel|c|user|feed)/|
-(?:playlist|watch)\?.*?\blist=
+(?:playlist|watch)\?.*?\blist=|
+(?!(?:watch|embed|v|e)\b)
)
(?P<id>[^/?\#&]+)
'''
@@ -2715,13 +2723,27 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
# inline playlist with not always working continuations
'url': 'https://www.youtube.com/watch?v=UC6u0Tct-Fo&list=PL36D642111D65BE7C',
'only_matching': True,
-}
-# TODO
-# {
-# 'url': 'https://www.youtube.com/TheYoungTurks/live',
-# 'only_matching': True,
-# }
-]
+}, {
+'url': 'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8',
+'only_matching': True,
+}, {
+'url': 'https://www.youtube.com/course',
+'only_matching': True,
+}, {
+'url': 'https://www.youtube.com/zsecurity',
+'only_matching': True,
+}, {
+'url': 'http://www.youtube.com/NASAgovVideo/videos',
+'only_matching': True,
+}, {
+'url': 'https://www.youtube.com/TheYoungTurks/live',
+'only_matching': True,
+}]
+@classmethod
+def suitable(cls, url):
+return False if YoutubeIE.suitable(url) else super(
+YoutubeTabIE, cls).suitable(url)
def _extract_channel_id(self, webpage):
channel_id = self._html_search_meta(
@@ -3003,10 +3025,24 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
for page_num in itertools.count(1):
if not continuation:
break
-browse = self._download_json(
-'https://www.youtube.com/browse_ajax', None,
-'Downloading page %d' % page_num,
-headers=headers, query=continuation, fatal=False)
+count = 0
+retries = 3
+while count <= retries:
+try:
+# Downloading page may result in intermittent 5xx HTTP error
+# that is usually worked around with a retry
+browse = self._download_json(
+'https://www.youtube.com/browse_ajax', None,
+'Downloading page %d%s'
+% (page_num, ' (retry #%d)' % count if count else ''),
+headers=headers, query=continuation)
+break
+except ExtractorError as e:
+if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
+count += 1
+if count <= retries:
+continue
+raise
if not browse:
break
response = try_get(browse, lambda x: x[1]['response'], dict)
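
The pagination loop now retries browse_ajax up to three extra times, but only on HTTP 500/503 — the codes observed to be transient here; any other failure still propagates immediately. The same shape as a generic stdlib helper, with download standing in for the _download_json call:

from urllib.error import HTTPError

def download_with_retries(download, retries=3, transient=(500, 503)):
    count = 0
    while True:
        try:
            return download()
        except HTTPError as e:
            if e.code in transient and count < retries:
                count += 1
                continue  # retry the same request
            raise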

File: youtube_dl/extractor/zype.py

@@ -85,7 +85,13 @@ class ZypeIE(InfoExtractor):
else:
m3u8_url = self._search_regex(
r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1',
-body, 'm3u8 url', group='url')
+body, 'm3u8 url', group='url', default=None)
+if not m3u8_url:
+source = self._parse_json(self._search_regex(
+r'(?s)sources\s*:\s*\[\s*({.+?})\s*\]', body,
+'source'), video_id, js_to_json)
+if source.get('integration') == 'verizon-media':
+m3u8_url = 'https://content.uplynk.com/%s.m3u8' % source['id']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
text_tracks = self._search_regex(
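
When no direct .m3u8 URL appears in the player body, the fallback parses the JS 'sources: [...]' literal with js_to_json and, for the verizon-media integration, rebuilds the manifest URL from the Uplynk content ID. A toy illustration with made-up input — youtube-dl's js_to_json handles unquoted keys and more; the crude substitution below is only a stand-in:

import json
import re

body = 'var player = { sources : [ { integration: "verizon-media", id: "abc123" } ] };'
source_js = re.search(r'(?s)sources\s*:\s*\[\s*({.+?})\s*\]', body).group(1)
source = json.loads(re.sub(r'(\w+)\s*:', r'"\1":', source_js))  # crude js_to_json
if source.get('integration') == 'verizon-media':
    m3u8_url = 'https://content.uplynk.com/%s.m3u8' % source['id']
assert m3u8_url == 'https://content.uplynk.com/abc123.m3u8'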

File: youtube_dl/utils.py

@@ -3640,7 +3640,7 @@ def url_or_none(url):
if not url or not isinstance(url, compat_str):
return None
url = url.strip()
-return url if re.match(r'^(?:[a-zA-Z][\da-zA-Z.+-]*:)?//', url) else None
+return url if re.match(r'^(?:(?:https?|rt(?:m(?:pt?[es]?|fp)|sp[su]?)|mms|ftps?):)?//', url) else None
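
url_or_none thus moves from accepting any syntactically valid scheme to an explicit whitelist — http(s), the RTMP family (rtmp, rtmpt, rtmpe, rtmps, rtmpte, rtmpts, rtmfp), rtsp(s)/rtspu, mms and ftp(s) — while still allowing scheme-relative '//' URLs. A quick stdlib check of the new behaviour (str in place of compat_str for brevity):

import re

def url_or_none(url):
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    return url if re.match(
        r'^(?:(?:https?|rt(?:m(?:pt?[es]?|fp)|sp[su]?)|mms|ftps?):)?//',
        url) else None

assert url_or_none('rtmp://host/path') == 'rtmp://host/path'
assert url_or_none('//cdn.example.com/v.mp4') == '//cdn.example.com/v.mp4'
assert url_or_none('javascript://alert(1)') is None  # previously accepted
assert url_or_none('not a url') is None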
def parse_duration(s):
@@ -5706,3 +5706,20 @@ def random_birthday(year_field, month_field, day_field):
month_field: str(random_date.month),
day_field: str(random_date.day),
}
+def clean_podcast_url(url):
+return re.sub(r'''(?x)
+(?:
+(?:
+chtbl\.com/track|
+media\.blubrry\.com| # https://create.blubrry.com/resources/podcast-media-download-statistics/getting-started/
+play\.podtrac\.com
+)/[^/]+|
+(?:dts|www)\.podtrac\.com/(?:pts/)?redirect\.[0-9a-z]{3,4}| # http://analytics.podtrac.com/how-to-measure
+flex\.acast\.com|
+pd(?:
+cn\.co| # https://podcorn.com/analytics-prefix/
+st\.fm # https://podsights.com/docs/
+)/e
+)/''', '', url)
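
clean_podcast_url strips the known measurement prefixes (Chartable, Blubrry, Podtrac, Acast flex, Podcorn, Podsights) so only the direct media URL remains; because re.sub replaces every occurrence, chained prefixes fall away in a single pass. Example, consistent with the regex above:

# Podtrac wrapping Chartable: both prefixes are removed in one call.
clean_podcast_url(
    'https://www.podtrac.com/pts/redirect.mp3/'
    'chtbl.com/track/5899E/traffic.megaphone.fm/HSW7835889191.mp3')
# -> 'https://traffic.megaphone.fm/HSW7835889191.mp3'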

File: youtube_dl/version.py

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
-__version__ = '2020.12.22'
+__version__ = '2021.01.03'