Compare commits

547 Commits

Author SHA1 Message Date
Sergey M․
cfefb7d854 release 2021.02.04 2021-02-04 04:49:25 +07:00
Sergey M․
3c07d007ca [ChangeLog] Actualize
[ci skip]
2021-02-04 04:47:30 +07:00
Sergey M․
89c5a7d5aa [pornhub] Implement lazy playlist extraction 2021-02-04 04:42:14 +07:00
Sergey M․
2adc0c51cd [pornhub] Add placeholder netrc machine 2021-02-04 04:20:09 +07:00
Sergey M․
1f0910bc27 [svtplay] Fix video id extraction (closes #28058) 2021-02-04 04:17:45 +07:00
Sergey M․
e22ff4e356 [pornhub] Add support for authentication (closes #18797, closes #21416, closes #24294) 2021-02-04 04:09:11 +07:00
Sergey M․
83031d749b [pornhub:user] Add support for URLs unavailable via /videos page and improve paging (closes #27853) 2021-02-04 00:25:53 +07:00
Remita Amine
1b731ebcaa [bravotv] add support for oxygen.com(closes #13357)(closes #22500) 2021-02-03 18:13:17 +01:00
Remita Amine
ab25f3f431 [youtube] pass embed URL to get_video_info request 2021-02-03 17:15:31 +01:00
Guillem Vela
07f7aad81c [ccma] improve metadata extraction(closes #27994)
- extract age_limit, alt_title, categories, series and episode_number
- fix timestamp multiple subtitles extraction
2021-02-03 09:19:54 +01:00
Remita Amine
1e2575df87 Credit @adrianheine for #27732 2021-02-03 00:21:46 +01:00
Remita Amine
b111a64135 [egghead] fix typo 2021-02-02 19:05:37 +01:00
Viren Rajput
0e3a968479 [egghead] update API domain(closes #28038) 2021-02-02 19:00:36 +01:00
Remita Amine
c11f7cf9bd [vidzi] remove extractor(closes #12629) 2021-02-01 22:35:28 +01:00
Remita Amine
8fa7cc387d [vidio] improve metadata extraction 2021-02-01 21:35:18 +01:00
Remita Amine
65eee5a745 [youtube] improve subtitle extraction 2021-02-01 18:12:35 +01:00
Remita Amine
efef4ddf51 [youtube] fix chapter extraction fallback 2021-02-01 16:49:52 +01:00
Remita Amine
159a3d48df [youtube] keep _formats array for format sorting tests 2021-02-01 16:36:19 +01:00
Remita Amine
b46483a6ec [youtube/test_youtube_signature] fix test 2021-02-01 16:35:07 +01:00
Remita Amine
9c724601ba [youtube] remove description chapters tests
video descriptions no longer contain the yt.www.watch.player.seekTo
function
2021-02-01 16:11:07 +01:00
Remita Amine
67299f23d8 [youtube] Rewrite Extractor
- improve format sorting
- remove unused code(swf parsing, ...)
- fix series metadata extraction
- fix trailer video extraction
- improve error reporting
- extract video location
2021-02-01 14:53:01 +01:00
Adrian Heine né Lang
8bf9591a70 [AENetworks] update AENetworksShowIE test playlist id (#27851) 2021-01-30 13:44:01 +01:00
nixxo
a800838f5a [vvvvid] add support for youtube embeds (#27825) 2021-01-29 14:24:28 +01:00
Remita Amine
ba15b2fee6 [googledrive] report download page errors(closes #28005) 2021-01-29 11:53:49 +01:00
ping
56a7ee9033 [vlive] Fix error message decoding for python 2 (#28004) 2021-01-29 08:02:18 +00:00
Remita Amine
0b4f03a563 [youtube] improve DASH formats file size extraction 2021-01-28 16:24:46 +01:00
knapior
7b8fa658f8 [cda] Improve birth validation detection (closes #14022) (#27929)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-01-28 02:43:20 +07:00
Adrian Heine né Lang
fd95fc33b1 [awaan] Extract uploader id (#27963) 2021-01-28 02:06:12 +07:00
Remita Amine
c669554ef5 [medialaan] add support for DPG Media MyChannels based websites
closes #14871
closes #15597
closes #16106
closes #16489
2021-01-27 14:58:47 +01:00
Remita Amine
11b68df7a4 [abcnews] fix extraction(closes #12394)(closes #27920) 2021-01-27 12:28:22 +01:00
Adrian Heine né Lang
d18f4419a7 [AMP] Fix upload_date and timestamp extraction (#27970) 2021-01-26 22:43:11 +01:00
Remita Amine
0f7d413d5b [tv4] relax _VALID_URL(closes #27964) 2021-01-25 15:15:45 +01:00
Remita Amine
286e5d6724 [tv2] improve MTV Uutiset Article extraction 2021-01-25 14:46:04 +01:00
tpikonen
395981288b [tv2] Add support for mtvuutiset.fi (#27744) 2021-01-25 14:43:41 +01:00
Remita Amine
55bb3556c8 [adn] improve login warning reporting 2021-01-25 09:28:24 +01:00
Remita Amine
57f2488bbe [zype] fix uplynk id extraction(closes #27956) 2021-01-25 09:26:22 +01:00
Adrian Heine né Lang
ea399a53eb [ADN] Implement login (#27937)
closes #17091
closes #27841
2021-01-25 08:25:57 +00:00
Sergey M․
811a183eb6 [ChangeLog] Actualize
[ci skip]
2021-01-24 18:15:32 +07:00
Sergey M․
b63981e850 release 2021.01.24.1 2021-01-24 18:11:20 +07:00
Sergey M․
186cbaffb9 release 2021.01.24 2021-01-24 18:00:58 +07:00
Sergey M․
dbf3fa8af6 [ChangeLog] Actualize
[ci skip]
2021-01-24 17:36:40 +07:00
Sergey M․
f08c31cf33 [franceculture] Make thumbnail optional (closes #18807) 2021-01-23 01:10:35 +07:00
Aurélien Grosdidier
d8dab85419 [franceculture] Fix extraction (closes #27891) (#27903)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-01-23 01:03:45 +07:00
Sergey M․
5519bba3e1 [options] Clarify --extract-audio help string (closes #27878) 2021-01-23 00:00:53 +07:00
Sergey M․
142c584063 Introduce --output-na-placeholder (closes #27896) 2021-01-22 23:54:29 +07:00
aarubui
4542e3e555 [njpwworld] fix extraction (#27890) 2021-01-22 08:56:51 +00:00
Remita Amine
fa8f6d8580 [comedycentral] fix extraction(closes #27905) 2021-01-21 23:53:09 +01:00
Remita Amine
3bb7769c40 [wat] remove unused variable 2021-01-21 17:22:30 +01:00
Remita Amine
8d286bd5b6 [wat] fix format extraction(closes #27901) 2021-01-21 17:20:32 +01:00
Remita Amine
cff72b4cc0 [americastestkitchen] improve season extraction 2021-01-21 16:47:55 +01:00
Brian Marks
657221c81d [americastestkitchen] Add support for downloading entire seasons (#27861) 2021-01-21 15:46:29 +00:00
Remita Amine
62acf5fa2c [trovo] Add new extractor(closes #26125) 2021-01-19 22:43:59 +01:00
Remita Amine
b79977fb6b [aol] add support for yahoo videos(closes #26650) 2021-01-19 14:49:45 +01:00
Remita Amine
bc7c8f3d4e [yahoo] fix single video extraction 2021-01-19 14:47:39 +01:00
Remita Amine
015e19b350 [lbry] unescape lbry uri(closes #27872) 2021-01-19 11:54:39 +01:00
Remita Amine
54856480d7 [ninegag] improve extraction 2021-01-19 10:23:02 +01:00
DrWursterich
1dd12708c2 [9gag] Fix Extraction (#23022) 2021-01-19 10:21:37 +01:00
Brian Marks
f9201cef58 [americastestkitchen] Improve metadata extraction for ATK episodes (#27860) 2021-01-18 14:47:06 +00:00
Remita Amine
26499ba823 [aljazeera] fix extraction(closes #20911)(closes #27779) 2021-01-17 17:35:10 +01:00
Remita Amine
58f6c2112d [minds] improve extraction 2021-01-17 14:07:56 +01:00
Tatsh
de026a6acd [Minds] Add new extractor (#17934) 2021-01-17 13:05:39 +00:00
Adrian Heine né Lang
d4564afc70 [ard] Fix title and description extraction and update tests (#27761) 2021-01-17 05:41:52 +07:00
Adrian Heine né Lang
360a5e0f60 [aenetworks] Fix test (#27847) 2021-01-17 03:40:08 +07:00
Remita Amine
55a3ca16d3 [spotify] Add new extractor for Spotify Podcasts(closes #27443) 2021-01-16 18:12:05 +01:00
Sergey M․
ef50cb3fda release 2021.01.16 2021-01-16 09:51:30 +07:00
Sergey M․
8673f4344c [ChangeLog] Actualize
[ci skip]
2021-01-16 09:48:00 +07:00
Sergey M․
f1487d4fca [mixcloud:playlist:base] Fix video id extraction in flat playlist mode (refs #27787) 2021-01-16 09:42:57 +07:00
Sergey M․
0cd4c402f0 [animeondemand] Add support for lazy playlist extraction (closes #27829) 2021-01-16 09:22:33 +07:00
Sergey M․
9c9b458145 [YoutubeDL] Protect from infinite recursion due to recursively nested playlists (closes #27833) 2021-01-16 01:46:41 +07:00
Remita Amine
9d50f86232 [twitter] Add tests for more cards 2021-01-15 10:32:01 +01:00
Sergey M․
7e92f9015e [youporn] Restrict fallback download URL (refs #27822) 2021-01-15 15:12:04 +07:00
Sergey M․
aa860b8016 [youporn] Improve height and tbr extraction (refs #23659, refs #20425) 2021-01-15 14:54:23 +07:00
Sergey M․
b484097b01 [youporn] Fix extraction (closes #27822) 2021-01-15 14:43:52 +07:00
Sergey M․
ab9001dab5 [twitter] Add support for unified cards (closes #27826) 2021-01-15 12:39:21 +07:00
main()
879866a230 [twitch] Set OAuth token for GraphQL requests using auth-token cookie (#27790)
Co-authored-by: remitamine <remitamine@gmail.com>
2021-01-14 12:41:44 +00:00
Aaron Zeng
8e5477d036 [YoutubeDL] Ignore failure to create existing directory (#27811) 2021-01-14 11:46:07 +07:00
Sergey M․
1e8e5d5238 [workflows/ci.yml] Run CI on PRs 2021-01-14 02:54:38 +07:00
Sergey M․
d81a213cfb [YoutubeDL] Raise syntax error for format selection expressions with multiple + operators (closes #27803) 2021-01-14 00:38:31 +07:00
Aarni Koskela
7c2d18a13f [Mixcloud] Harmonize ID generation from lists with full ID generation (#27787)
Mixcloud IDs are generated as `username_slug` when the full ID dict has been
downloaded.  When downloading a list (e.g. uploads, favorites, ...), the temporary
ID is just the `slug`.  This made e.g. archive file usage require the download
of stream metadata before the download can be rejected as already downloaded.

This commit attempts to get the uploader username during the GraphQL query, so the
temporary IDs are generated similarly.
2021-01-13 09:22:48 +00:00
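The ID mismatch described in the commit above can be illustrated with a minimal sketch. The helper names and the plain-set archive are hypothetical simplifications for illustration, not the extractor's actual code:

```python
def temporary_id(slug, username=None):
    """Build the ID used while listing a playlist (hypothetical helper).

    With a username the result matches the full `username_slug` form;
    without one it falls back to the bare slug.
    """
    if username:
        return '%s_%s' % (username, slug)
    return slug


def already_downloaded(video_id, archive):
    """Simplified archive lookup: the archive is assumed to be a set of IDs."""
    return video_id in archive


# IDs recorded after full downloads use the username_slug form.
archive = {'someuser_some-mix'}

# Bare slug: the archive lookup misses, so metadata would be fetched first.
print(already_downloaded(temporary_id('some-mix'), archive))
# Username known at listing time: the entry can be rejected immediately.
print(already_downloaded(temporary_id('some-mix', 'someuser'), archive))
```

Under these assumptions the first lookup prints False and the second prints True, which is the behavior the commit aims for.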
Remita Amine
2408e6d26a [cspan] improve info extraction(closes #27791) 2021-01-13 09:17:43 +01:00
Remita Amine
cf862771d7 [adn] improve info extraction 2021-01-12 21:15:13 +01:00
Adrian Heine né Lang
a938f111ed [ADN] Fix extraction (#27732)
Closes #26963.
2021-01-12 20:12:44 +00:00
Remita Amine
4759543f6e [youtube:search] fix view_count and try to extract all video sections(closes #27588)(closes #27604) 2021-01-11 17:35:51 +01:00
Sergey M․
d0fc289f45 [workflows/ci.yml] Change ubuntu-latest to ubuntu-18.04
ubuntu-latest alias will be switched to ubuntu-20.04 soon (see [1]) which lacks python 3.3 and 3.4 (see [2]).

1. https://github.blog/changelog/2020-10-29-github-actions-ubuntu-latest-workflows-will-use-ubuntu-20-04/
2. https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json
2021-01-11 01:30:54 +07:00
Sergey M․
70f572585d [twitch] Improve login error extraction 2021-01-10 06:23:58 +07:00
Sergey M․
c2d06aef60 [twitch] Fix authentication (refs #27743) 2021-01-10 06:23:08 +07:00
Remita Amine
ff1e765400 [threeqsdn] Improve extraction(closes #21058) 2021-01-09 22:29:58 +01:00
0l-l0
170e1c1995 [peertube] Extract files also from streamingPlaylists (#27728)
JSON objects with an empty "files" tag seem to be a valid PeerTube API
response. In those cases the "files" arrays contained in the
"streamingPlaylists" members can be used instead.
closes #26002
closes #27586
2021-01-08 20:09:38 +00:00
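A rough sketch of the fallback described in the commit above, assuming the response is a plain dict. The function name and the `url` key are illustrative assumptions; only `files` and `streamingPlaylists` come from the commit message:

```python
def collect_files(video):
    """Gather file entries from the top-level "files" array and from
    every "streamingPlaylists" member (hypothetical helper)."""
    files = list(video.get('files') or [])
    for playlist in video.get('streamingPlaylists') or []:
        files.extend(playlist.get('files') or [])
    return files


# A response whose top-level "files" array is empty.
response = {
    'files': [],
    'streamingPlaylists': [
        {'files': [{'url': 'https://example.invalid/video-720.mp4'}]},
    ],
}

print(len(collect_files(response)))  # 1
```

Collecting from both places covers the empty "files" case without special-casing it.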
Remita Amine
61e669acff [khanacademy] fix extraction(closes #2887)(closes #26803) 2021-01-08 16:13:22 +01:00
Remita Amine
2c337f4e85 [spike] Update Paramount Network feed URL(closes #27715) 2021-01-08 09:26:12 +01:00
Sergey M․
bf6a74c620 release 2021.01.08 2021-01-08 11:03:34 +07:00
Sergey M․
38a967c98e [ChangeLog] Actualize
[ci skip]
2021-01-08 11:01:29 +07:00
nixxo
3a61e6d360 [rai] improve subtitles extraction (#27705)
closes #27698
2021-01-07 13:48:45 +00:00
Remita Amine
3d8e32dcc0 [canvas] Match only supported VRT NU URLs(#27707) 2021-01-07 12:35:04 +01:00
Remita Amine
8f29b2dd38 [extractors] add BibelTVIE import 2021-01-06 20:39:04 +01:00
Remita Amine
a29e340efa [bibeltv] Add new extractor(closes #14361) 2021-01-06 20:38:02 +01:00
Remita Amine
b13f29098f [bfmtv] Add new extractor(closes #16053)(closes #26615) 2021-01-06 18:50:02 +01:00
Remita Amine
430c4bc9d0 [sbs] Add support for ondemand play and news embed URLs(closes #17650)(closes #27629) 2021-01-06 13:41:07 +01:00
Sergey M․
4ae243fc6c [twitch] Refactor 2021-01-06 03:54:33 +07:00
Sergey M․
8f20ad36dc [twitch] Drop legacy kraken API v5 code altogether 2021-01-06 03:46:45 +07:00
Sergey M․
799c794947 [twitch:vod] Switch to GraphQL for video metadata 2021-01-06 03:34:36 +07:00
Remita Amine
1ae7ae0b96 [canvas] Fix VRT NU extraction(closes #26957)(closes #27053) 2021-01-05 21:17:39 +01:00
Sergey M․
ccc7112291 [twitch] Improve access token extraction and remove unused code (closes #27646) 2021-01-06 02:11:49 +07:00
23rd
5b24f8f505 [twitch] Switch access token to GraphQL and refactor. 2021-01-06 02:11:42 +07:00
nixxo
fcd90d2583 [rai] Detect ContentItem in iframe (closes #12652) (#27673)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-01-06 01:49:56 +07:00
Remita Amine
8f757c7353 [ketnet] fix extraction(closes #27662) 2021-01-05 18:17:04 +01:00
Remita Amine
be1a3f2d11 [dplay] Add support for Discovery+ domains(closes #27680) 2021-01-05 17:31:58 +01:00
Sergey M․
ecae54a98d [motherless] Fix review issues and improve extraction (closes #26495, closes #27450) 2021-01-05 07:40:30 +07:00
cladmi
f318882955 [motherless] Fix recent videos upload date extraction (closes #27661)
Less than a week old videos use a '20h ago' or '1d ago' format.

I kept the support for 'Ago' with uppercase start as it was already in the code.
2021-01-05 07:40:25 +07:00
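The relative date format mentioned in the commit above could be handled along these lines. This is a sketch under stated assumptions: the function name, the regular expression and the explicit `now` argument are hypothetical, not taken from the extractor:

```python
import re
from datetime import datetime, timedelta


def parse_relative_upload_date(text, now):
    """Turn strings such as '20h ago' or '1d Ago' into a YYYYMMDD date.

    Returns None when the text does not match the relative format.
    """
    match = re.search(r'(\d+)\s*([hd])\s+ago', text, re.IGNORECASE)
    if not match:
        return None
    amount = int(match.group(1))
    if match.group(2).lower() == 'h':
        delta = timedelta(hours=amount)
    else:
        delta = timedelta(days=amount)
    return (now - delta).strftime('%Y%m%d')


now = datetime(2021, 1, 5, 7, 0)
print(parse_relative_upload_date('20h ago', now))  # 20210104
print(parse_relative_upload_date('1d Ago', now))   # 20210104
```

Passing `now` explicitly keeps the function deterministic, and the case-insensitive match accepts both 'ago' and 'Ago'.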
Sergey M․
c3399cac19 [travis] Drop Travis CI (closes #7193, closes #12366) 2021-01-05 02:44:27 +07:00
Sergey M․
9237aaa77f [workflows/ci.yml] Add support for jython 2021-01-05 02:38:44 +07:00
Kevin O'Connor
766fcdd0fa [downloader/hls] Disable decryption in tests (#27660)
Tests truncate the download to 10241 bytes, which is not divisible by 16
and cannot be decrypted. Tests don't really care about the decrypted
content, just that the data they retrieved is the expected data.
Therefore, it's fine to just return the encrypted data to tests.

See: #27621 and #27620
2021-01-05 01:24:01 +07:00
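The arithmetic behind the commit above is easy to check. The sketch below assumes a 16-byte block size, as the message implies, and uses a hypothetical `maybe_decrypt` helper with an `is_test` flag. It is not the downloader's actual code:

```python
BLOCK_SIZE = 16              # block size implied by the commit message
TEST_TRUNCATED_SIZE = 10241  # truncated download size quoted in the commit


def is_block_aligned(length, block_size=BLOCK_SIZE):
    """True when the data length is a whole number of cipher blocks."""
    return length % block_size == 0


def maybe_decrypt(data, decrypt, is_test):
    """Return the data untouched in test mode, otherwise decrypt it.

    `decrypt` is any callable taking and returning bytes (an assumption
    made for this sketch).
    """
    if is_test:
        return data
    return decrypt(data)


print(TEST_TRUNCATED_SIZE % BLOCK_SIZE)       # 1
print(is_block_aligned(TEST_TRUNCATED_SIZE))  # False
```

Since 10241 = 640 * 16 + 1, the truncated data always carries one stray byte, so skipping decryption in tests avoids the failure.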
Yurii H
f6ea29e24b [iheart] Update test description value (#27037)
the description has no HTML tags now.
2021-01-04 17:16:27 +01:00
Remita Amine
8a3797a4ab [nrk] fix extraction for videos without a legalAge rating 2021-01-04 12:16:54 +01:00
Remita Amine
745db8899d [iheart] clean HTML tags from episode description 2021-01-04 09:59:20 +01:00
Remita Amine
83db801cbf [iheart] remove print statement 2021-01-04 09:42:27 +01:00
Remita Amine
964a8eb754 [googleplus] Remove Extractor(closes #4955)(closes #7400) 2021-01-04 01:14:26 +01:00
Remita Amine
ac61f2e058 [applepodcasts] Add new extractor(#25918) 2021-01-04 01:14:26 +01:00
Remita Amine
8487e8b98a [googlepodcasts] Add new extractor 2021-01-04 01:14:26 +01:00
Remita Amine
9c484c0019 [iheart] Add new extractor for iHeartRadio(#27037) 2021-01-04 01:14:26 +01:00
Remita Amine
0e96b4b5ce [acast] clean podcast URLs 2021-01-04 01:14:26 +01:00
Remita Amine
a563c97c5c [stitcher] clean podcast URLs 2021-01-04 01:14:25 +01:00
Remita Amine
e88c9ef62a [utils] add a function to clean podcast URLs 2021-01-04 01:14:25 +01:00
Sergey M․
0889eb33e0 [xfileshare] Add support for aparat.cam (closes #27651) 2021-01-04 00:17:51 +07:00
Sergey M․
0021a2b9a1 [nrktv] Add subtitles test 2021-01-03 22:25:17 +07:00
Remita Amine
19ec468635 [twitter] Add support for summary card(closes #25121) 2021-01-03 13:37:24 +01:00
Remita Amine
491ee7efe4 [twitter] try to use a Generic fallback for unknown twitter cards(closes #25982) 2021-01-03 13:29:00 +01:00
Remita Amine
8522bcd97c [stitcher] Add support for shows and show metadata extraction(closes #20510) 2021-01-03 12:12:06 +01:00
Remita Amine
ac71fd5919 [stv] improve episode id extraction(closes #23083) 2021-01-03 10:04:57 +01:00
Sergey M․
8e953dcbb1 release 2021.01.03 2021-01-03 09:12:47 +07:00
Sergey M․
f4afb9a6a8 [ChangeLog] Actualize
[ci skip]
2021-01-03 09:10:40 +07:00
Sergey M․
d5b8cf093c [nrk] Fix age limit extraction 2021-01-03 08:58:05 +07:00
Sergey M․
5c6e84c0ff [nrk] Improve series metadata extraction (closes #27473) 2021-01-03 08:31:26 +07:00
Sergey M․
1aaee908b9 [nrk] PEP 8 2021-01-03 07:57:12 +07:00
Sergey M․
b2d9fd9c9f [nrk] Improve episode and season number extraction 2021-01-03 07:53:05 +07:00
Sergey M․
bc2f83b95e [nrktv] Fix tests 2021-01-03 07:31:03 +07:00
Sergey M․
85de33b04e [nrk] Improve series metadata extraction 2021-01-03 07:30:45 +07:00
Sergey M․
7dfd966848 [nrk] Extract subtitles 2021-01-03 07:02:13 +07:00
Sergey M․
a25d03d7cb [nrk] Fix age limit extraction 2021-01-03 06:51:21 +07:00
Sergey M․
cabfd4b1f0 [nrk] Inline _extract_from_playback 2021-01-03 06:45:17 +07:00
Sergey M․
7b643d4cd0 [nrk] Improve video id extraction 2021-01-03 06:44:28 +07:00
Sergey M․
1f1d01d498 [nrk] Add more shortcut tests 2021-01-03 06:41:37 +07:00
Sergey M․
21a42e2588 [nrk] Improve extraction (closes #27634, closes #27635)
+ Add support for mp3 formats
* Generalize and delegate all item extractors to nrk, beware ie key breakages
+ Add support for podcasts
+ Generalize nrk shortcut form to support all kinds of ids
2021-01-03 06:29:52 +07:00
Sergey M․
2df93a0c4a [nrktv] Switch to playback endpoint
mediaelement endpoint is no longer in use.
2021-01-03 05:01:45 +07:00
Remita Amine
75972e200d [vvvvid] fix season metadata extraction(#18130) 2021-01-01 20:21:59 +01:00
Remita Amine
d0d838638c [stitcher] fix extraction(closes #20811)(closes #27606) 2021-01-01 18:38:46 +01:00
Remita Amine
8c17afc471 [acast] fix extraction(closes #21444)(closes #27612)(closes #27613) 2021-01-01 17:05:42 +01:00
Remita Amine
40d66e07df [arcpublishing] add missing staticmethod decorator 2020-12-31 23:42:17 +01:00
Remita Amine
ab89a8678b [arcpublishing] Add new extractor
closes #2298
closes #9340
closes #17200
2020-12-31 23:31:07 +01:00
Remita Amine
4d7d056909 [sky] add support for Sports News articles and Brightcove videos(closes #13054) 2020-12-31 12:18:30 +01:00
Remita Amine
c35bc82606 [vvvvid] skip unplayable episodes and extract akamai formats(closes #27599) 2020-12-31 12:18:30 +01:00
Remita Amine
2f56caf083 [yandexvideo] fix extraction for Python 3.4 2020-12-31 12:18:30 +01:00
Sergey M․
4066945919 release 2020.12.31 2020-12-31 05:17:55 +07:00
Sergey M․
2a84694b1e [ChangeLog] Actualize
[ci skip]
2020-12-31 05:14:33 +07:00
Sergey M․
4046ffe1e1 [redditr] Fix review issues and extract source thumbnail (closes #27503) 2020-12-31 05:07:57 +07:00
ozburo
d1d0612160 [redditr] Extract all thumbnails 2020-12-31 05:07:51 +07:00
Remita Amine
7b0f04ed1f [vvvvid] improve info extraction 2020-12-30 18:16:47 +01:00
nixxo
2e21b06ea2 [vvvvid] add playlists support (#27574)
closes #18130
2020-12-30 18:12:17 +01:00
Remita Amine
a6f75e6e89 [yandexdisk] extract info from webpage
the public API does not return metadata when download limit is reached
2020-12-30 16:45:53 +01:00
Remita Amine
bd18824c2a [yandexdisk] fix extraction(closes #17861)(closes #27131) 2020-12-30 13:43:56 +01:00
Remita Amine
bdd044e67b [yandexvideo] use old api call as fallback 2020-12-30 13:30:11 +01:00
Remita Amine
f7e95fb2a0 [yandexvideo] fix extraction(closes #25000) 2020-12-30 09:30:30 +01:00
Remita Amine
9dd674e1d2 [utils] accept only supported protocols in url_or_none 2020-12-30 09:22:30 +01:00
Remita Amine
9c1e164e0c [YoutubeDL] Allow format filtering using audio language(#16209) 2020-12-29 19:29:08 +01:00
Remita Amine
c706fbe9fe [nbc] Remove CSNNE extractor 2020-12-29 17:21:05 +01:00
Remita Amine
ebdcf70b0d [nbc] fix NBCSport VPlayer URL extraction(closes #16640) 2020-12-29 17:15:13 +01:00
Remita Amine
5966095e65 [aenetworks] fix HistoryPlayerIE tests 2020-12-29 16:59:31 +01:00
Remita Amine
9ee984fc76 [aenetworks] add support for biography.com (closes #3863) 2020-12-29 16:13:36 +01:00
Remita Amine
53528e1d23 [uktvplay] match new video URLs(closes #17909) 2020-12-29 14:11:37 +01:00
Remita Amine
c931c4b8dd [sevenplay] detect API errors 2020-12-29 14:11:37 +01:00
Remita Amine
7acd042bbb [tenplay] fix format extraction(closes #26653) 2020-12-29 14:11:37 +01:00
Remita Amine
bcfe485e01 [brightcove] raise ExtractorError for DRM protected videos(closes #23467)(closes #27568) 2020-12-29 14:11:37 +01:00
Sergey M․
479cc6d5a1 release 2020.12.29 2020-12-29 02:52:31 +07:00
Sergey M․
38286ee729 [ChangeLog] Actualize
[ci skip]
2020-12-29 02:49:53 +07:00
Sergey M․
1a95953867 [youtube] Improve yt initial data extraction (closes #27524) 2020-12-29 02:29:34 +07:00
Sergey M․
71febd1c52 [youtube:tab] Improve URL matching (closes #27559) 2020-12-29 02:19:43 +07:00
Sergey M․
f1bc56c99b [youtube:tab] Restore retry on browse requests (closes #27313, closes #27564) 2020-12-29 02:11:48 +07:00
Remita Amine
64e419bd73 [aparat] Fix extraction
closes #22285
closes #22611
closes #23348
closes #24354
closes #24591
closes #24904
closes #25418
closes #26070
closes #26350
closes #26738
closes #27563
2020-12-28 18:19:30 +01:00
Remita Amine
782ea947b4 [brightcove] remove sonyliv specific code 2020-12-28 11:12:57 +01:00
Remita Amine
f27224d57b [piksel] import format extraction 2020-12-28 10:50:29 +01:00
Remita Amine
c007188598 [zype] Add support for uplynk videos 2020-12-27 23:47:28 +01:00
Remita Amine
af93ecfd88 [toggle] add support for live.mewatch.sg (closes #27555) 2020-12-27 22:26:20 +01:00
JamKage
794771a164 [go] Added support for FXNetworks (#26826)
Co-authored-by: James Kirrage <james.kirrage@mortgagegym.com>

closes #13972
closes #22467
closes #23754
2020-12-27 17:36:21 +00:00
Sergey M․
6f2eaaf73d [teachable] Improve embed detection (closes #26923) 2020-12-27 22:57:50 +07:00
Remita Amine
4c7a4dbc4d [mitele] fix free video extraction(#24624)(closes #25827)(closes #26757) 2020-12-27 16:22:43 +01:00
Remita Amine
f86b299d0e [telecinco] fix extraction 2020-12-27 16:22:43 +01:00
Sergey M
e474996541 [youtube] Update invidious.snopyta.org (#22667)
Co-authored-by: sofutru <54445344+sofutru@users.noreply.github.com>
2020-12-27 21:15:09 +07:00
Remita Amine
aed617e311 [amcnetworks] improve auth only video detection(closes #27548) 2020-12-27 09:00:08 +01:00
Remita Amine
0fa67c1d68 [generic] Add support for VHX Embeds(#27546) 2020-12-27 09:00:07 +01:00
Sergey M․
365b3cc72d release 2020.12.26 2020-12-26 23:17:35 +07:00
Sergey M․
a272fe21a8 [ChangeLog] Actualize
[ci skip]
2020-12-26 23:13:26 +07:00
Sergey M․
cec1c2f211 [instagram] Fix test 2020-12-26 23:00:15 +07:00
Sergey M․
12053450dc [instagram] Fix comment count extraction 2020-12-26 23:00:15 +07:00
Sergey M․
46cffb0c47 [instagram] Add support for reel URLs (closes #26234, closes #26250) 2020-12-26 23:00:15 +07:00
Remita Amine
c32a059f52 [bbc] switch to media selector v6
closes #23232
closes #23933
closes #26303
closes #26432
closes #26821
closes #27538
2020-12-26 16:57:02 +01:00
Sergey M․
6911312e53 [instagram] Improve thumbnail extraction 2020-12-26 22:42:58 +07:00
Sergey M․
f22b5a6b96 [instagram] Improve extraction (closes #22880) 2020-12-26 22:37:41 +07:00
Andrew Udvare
58e55198c1 [instagram] Fix extraction when authenticated (closes #27422) 2020-12-26 22:31:55 +07:00
Sergey M․
d61ed9f2f1 [spankbang] Remove unused import 2020-12-26 22:14:31 +07:00
Sergey M․
8bc4c6350e [spankbang:playlist] Fix extraction (closes #24087) 2020-12-26 21:58:26 +07:00
Sergey M․
cfa4ffa23b [spankbang] Add support for playlist videos 2020-12-26 21:55:12 +07:00
Sergey M․
4f1dc1463d [pornhub] Improve like and dislike count extraction (closes #27356) 2020-12-26 21:24:43 +07:00
Sergey M․
17e0f41d34 [pornhub] Fix review issues (closes #27393) 2020-12-26 21:17:17 +07:00
JChris246
b57b27ff8f [pornhub] Fix lq formats extraction (closes #27386) 2020-12-26 21:17:11 +07:00
Marco Fantauzzo
bbe8cc6662 [README.md] Update reference to cookie export extension for Chrome (closes #26885) (#27433)
The cookies.txt extension doesn't exist anymore on the Chrome Web Store (see https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg), so I propose to change the link in the README.md to another similar extension called Get cookies.txt (https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) with the same functions and utility of the old one.

This PR closes #26885
2020-12-26 20:50:39 +07:00
Sergey M․
98106accb6 [bongacams] Add extractor (closes #27440) 2020-12-26 20:30:19 +07:00
Sergey M․
af1312bfc3 [youtube:tab] Extend _VALID_URL (closes #27501) 2020-12-26 19:59:57 +07:00
Remita Amine
4c7d7215cd [theweatherchannel] fix extraction (closes #25930)(closes #26051) 2020-12-26 09:37:40 +01:00
Remita Amine
0370d9eb3d [sprout] correct typo 2020-12-25 21:35:55 +01:00
Remita Amine
1434651d20 [sprout] Add support for Universal Kids (closes #22518) 2020-12-25 21:17:44 +01:00
Remita Amine
2c312ab84a [theplatform] allow passing geo bypass countries from other extractors 2020-12-25 21:16:36 +01:00
Remita Amine
0ee78d62d5 [wistia] Add support for playlists (closes #27533) 2020-12-25 17:00:30 +01:00
Remita Amine
7f3c90ab25 [ctv] Add new extractor (closes #27525) 2020-12-24 19:40:08 +01:00
Remita Amine
1d3cd29730 [9c9media] improve info extraction 2020-12-24 17:39:13 +01:00
Remita Amine
4ef1fc9707 [youtube] fix automatic captions extraction(closes #27162)(closes #27388) 2020-12-24 16:05:03 +01:00
Remita Amine
f9e6aa1dcf [sonyliv] fix title for movies 2020-12-24 13:33:12 +01:00
Remita Amine
f83db9064b [sonyliv] fix extraction(closes #25667) 2020-12-24 13:10:20 +01:00
Remita Amine
2da9a86399 [streetvoice] fix extraction(closes #27455)(closes #27492) 2020-12-24 13:10:20 +01:00
Remita Amine
ecaa535cf4 [facebook] add support for watchparty pages(closes #27507) 2020-12-24 13:10:20 +01:00
Remita Amine
79dd92b1fe [cbslocal] fix video extraction 2020-12-24 13:10:20 +01:00
Remita Amine
bd3844c9c2 [brightcove] add another method to extract policyKey 2020-12-24 13:10:20 +01:00
Sergey M․
7bf5e3a84a [mewatch] Relax _VALID_URL (closes #27506) 2020-12-22 23:27:05 +07:00
Sergey M․
08a17dae5b release 2020.12.22 2020-12-22 04:48:07 +07:00
Sergey M․
924ea66ade [ChangeLog] Actualize
[ci skip]
2020-12-22 04:45:37 +07:00
Remita Amine
5b72f5b74f [anvato] remove NFLTokenGenerator
until a better solution is introduced that:
- works with lazy_extractors
- allows for 3rd party token generators
2020-12-21 09:02:45 +01:00
Remita Amine
bfa345744d [tastytrade] Remove Extractor(closes #25716)
covered by GenericIE via BrightcoveNewIE
2020-12-20 18:07:34 +01:00
Remita Amine
f966461476 [niconico] fix playlist extraction(closes #27428) 2020-12-20 17:15:43 +01:00
Remita Amine
b8aea53682 [everyonesmixtape] Remove Extractor 2020-12-20 17:10:40 +01:00
Remita Amine
c0d9eb7043 [kanalplay] Remove Extractor 2020-12-20 12:06:17 +01:00
Remita Amine
3ba6aabd25 [arkena] fix extraction 2020-12-20 12:06:17 +01:00
Sergey M․
a8b31505ed Switch to GitHub actions for CI
Travis CI has ignored our requests and does not look to be interested in providing OSS credits for youtube-dl
2020-12-20 06:48:20 +07:00
Remita Amine
90a271e914 [nba] rewrite extractor 2020-12-19 20:14:44 +01:00
Remita Amine
172d270607 [turner] improve info extraction 2020-12-19 20:14:44 +01:00
Remita Amine
22feed08a1 [common] remove unwanted query params from unsigned akamai manifest URLs 2020-12-19 20:14:44 +01:00
Sergey M․
942b8ca3be [youtube] Improve xsrf token extraction (closes #27442) 2020-12-20 00:48:44 +07:00
Sergey M․
3729c52f9d [generic] Improve RSS age limit extraction 2020-12-19 23:24:52 +07:00
renalid
71679eaee8 [generic] Fix RSS itunes thumbnail extraction (#27405) 2020-12-19 23:18:51 +07:00
Trevor Nelson
76fe4ba3b2 [redditr] Extract duration (#27426) 2020-12-17 23:21:36 +07:00
Remita Amine
164a4a5756 [anvato] Disable NFLTokenGenerator(closes #27449) 2020-12-16 21:47:50 +01:00
Remita Amine
455951985b [zaq1] Remove extractor 2020-12-16 21:20:57 +01:00
Remita Amine
c29500e412 [asiancrush] fix extraction and add support for retrocrush.tv
closes #25577
closes #25829
2020-12-16 12:30:41 +01:00
Remita Amine
1bc1520adc [noco] Remove Extractor(closes #10864) 2020-12-15 19:07:14 +01:00
Remita Amine
022e05dc1f [nfl] fix extraction(closes #22245) 2020-12-15 16:06:10 +01:00
Remita Amine
b34c9551aa [anvato] update ANVACK table and add experimental token generator for NFL 2020-12-15 16:01:33 +01:00
Remita Amine
84f19c026f [sky] relax SkySports URL regex (closes #27435) 2020-12-14 19:01:16 +01:00
Remita Amine
6bde5492b6 [tv5unis] Add new extractor(closes #22399)(closes #24890) 2020-12-14 18:47:00 +01:00
Remita Amine
6086df4d6a [videomore] add support for more.tv (closes #27088) 2020-12-14 15:00:02 +01:00
Sergey M․
c98052c5da [yandexmusic:album] Simplify 2020-12-14 12:52:25 +07:00
Sergey M․
ab62bc5838 [yandexmusic] Add support for music.yandex.com (closes #27425) 2020-12-14 12:48:12 +07:00
Sergey M․
bc87ba8424 [yandexmusic] DRY _VALID_URL base 2020-12-14 12:45:56 +07:00
Remita Amine
b79df1b68d [nhk:program] Add support for audio programs and program clips 2020-12-13 22:04:44 +01:00
Matthew Rayermann
2797c7be45 [nhk] Add support for NHK video programs (#27230) 2020-12-13 20:57:19 +00:00
Sergey M․
755f186e21 release 2020.12.14 2020-12-14 00:57:11 +07:00
Sergey M․
2240a1dc4d [ChangeLog] Actualize
[ci skip]
2020-12-14 00:55:04 +07:00
Sergey M․
03d3af9768 [test_InfoExtractor] PEP 8 2020-12-13 23:47:13 +07:00
Sergey M․
5ce9527e16 [mdr] Bypass geo restriction 2020-12-13 23:43:16 +07:00
Sergey M․
c527f5ada0 [mdr] Improve extraction (closes #24346, closes #26873) 2020-12-13 23:42:43 +07:00
Sergey M․
ace52668f0 [PULL_REQUEST_TEMPLATE.md] Add tests requirement and make coding conventions requirement more explicit 2020-12-13 22:46:17 +07:00
Sergey M․
9c33eb027e [yandexmusic:album] Improve album title extraction (closes #27418) 2020-12-13 22:18:40 +07:00
Sergey M․
679b711395 [eporner] Fix view count extraction and make optional (closes #23306) 2020-12-13 20:27:08 +07:00
Sergey M․
1727541315 [extractor/common] Improve JSON-LD interaction statistic extraction (refs #23306) 2020-12-13 20:24:13 +07:00
Sergey M․
45b0a0d11b [eporner] Fix embed test URL 2020-12-13 19:57:11 +07:00
spvkgn
e665fcd4da [eporner] Fix hash extraction and extend _VALID_URL (#27396)
Co-authored-by: Sergey M <dstftw@gmail.com>
2020-12-13 19:56:28 +07:00
Remita Amine
aae737d4af [slideslive] use m3u8 entry protocol for m3u8 formats(closes #27400) 2020-12-13 12:43:21 +01:00
Remita Amine
92a6de861e [downloader/hls] delegate manifests with media initialization to ffmpeg 2020-12-13 12:40:10 +01:00
Remita Amine
5ff881aee6 [twitcasting] fix format extraction and improve info extraction(closes #24868) 2020-12-13 12:13:58 +01:00
Sergey M․
eae19a4473 [extractor/common] Document duration meta field for playlists 2020-12-13 16:53:23 +07:00
Sergey M․
f70c263ce5 [linuxacademy] Fix authentication and extraction (closes #21129, closes #26223, closes #27402) 2020-12-13 16:52:36 +07:00
Remita Amine
92d135921f [itv] clean description from HTML tags (closes #27399) 2020-12-13 09:51:03 +01:00
Sergey M․
d8008dee4f [vlive] Sort live formats (closes #27404) 2020-12-13 15:04:18 +07:00
Remita Amine
bb38a12157 [hotstar] fix and improve extraction
- fix format extraction (closes #26690)
- extract thumbnail URL (closes #16079, closes #20412)
- support country specific playlist URLs (closes #23496)
- select the last id in video URL (closes #26412)
2020-12-12 23:07:52 +01:00
toniz4
bcc8ef0a5a [youtube] Add some invidious instances (#27373)
Co-authored-by: Cássio <heyitscassio@cock.li>
2020-12-13 01:27:56 +07:00
Sergey M․
3cb1a5dc73 release 2020.12.12 2020-12-12 07:09:59 +07:00
Sergey M․
ed40c02c9b [ChangeLog] Actualize
[ci skip]
2020-12-12 07:04:28 +07:00
Sergey M․
b5fdceb4d1 [ruutu] Extract more metadata and detect non-free videos (closes #21154) 2020-12-12 06:49:43 +07:00
Sergey M․
1e6f7f3b45 [ruutu] Authenticate format URLs (closes #21031, closes #26782) 2020-12-12 05:57:14 +07:00
Sergey M․
469ff02f4e [ruutu] Add support for static.nelonenmedia.fi (closes #25412) 2020-12-12 05:10:54 +07:00
Sergey M․
56f1c5ad38 [ruutu] Extend _VALID_URL (closes #24839) 2020-12-12 05:10:54 +07:00
Remita Amine
3a6b8f4edf [facebook] Add support for archived live video URLs(closes #15859) 2020-12-11 23:09:42 +01:00
Sergey M․
3ae02d4a62 [wdr] Extend subtitles extraction and improve overall extraction (closes #22672, closes #22723) 2020-12-12 04:52:35 +07:00
Remita Amine
59fea63dc2 [facebook] add support for videos attached to Relay based story pages(#10795) 2020-12-11 22:44:18 +01:00
Sergey M․
22933e02d4 [wdr:page] Add support for kinder.wdr.de (closes #27350) 2020-12-12 04:27:25 +07:00
Remita Amine
98a62d7cbd [facebook] Add another regex for handleServerJS 2020-12-11 21:50:40 +01:00
Remita Amine
d6c5fdef34 [facebook] fix embed page extraction 2020-12-11 21:27:47 +01:00
compujo
4b9051cf39 [YoutubeDL] Improve thumbnails' filenames deducing (closes #26010) (#27244) 2020-12-12 03:12:31 +07:00
Remita Amine
00f5068908 [facebook] add support for Relay post pages(closes #26935) 2020-12-11 17:49:22 +01:00
Remita Amine
432c6b0f48 [facebook] proper support for watch videos(closes #22795)(#27062) 2020-12-11 16:05:47 +01:00
Remita Amine
ad5e5788ff Revert "[facebook] add support for watch videos(closes #22795)"
This reverts commit dc65041c22.
2020-12-11 15:54:24 +01:00
Remita Amine
dc65041c22 [facebook] add support for watch videos(closes #22795) 2020-12-11 13:54:22 +01:00
Remita Amine
9f88b07945 [facebook] add support for group posts with multiple videos(closes #19131) 2020-12-11 12:04:02 +01:00
Remita Amine
225646c4ca [itv] remove old extraction method and fix series metadata extraction
closes #23177
closes #26897
2020-12-10 22:28:44 +01:00
Remita Amine
5c6e9f0b6c [facebook] redirect Mobile URLs to Desktop URLs
closes #24831
closes #25624
2020-12-10 15:19:43 +01:00
Remita Amine
6c370bc149 [facebook] Add support for Relay based pages(closes #26823) 2020-12-10 14:52:35 +01:00
Remita Amine
37fd242342 [facebook] try to reduce unnecessary tahoe requests 2020-12-10 12:10:33 +01:00
Remita Amine
aee1f87168 [facebook] remove hardcoded chrome user-agent
closes #18974
closes #25411
closes #26958
closes #27329
2020-12-09 23:54:15 +01:00
Andrey Smirnoff
b69bb1ed11 [smotri] Remove extractor (#27358) 2020-12-10 02:01:23 +07:00
Sergey M․
772cefef8c [ChangeLog] Actualize
[ci skip]
2020-12-10 00:12:59 +07:00
谭九鼎
842654b6d0 [README.md] Update travis CI tld (#27354) 2020-12-09 21:50:42 +07:00
Remita Amine
df5e50954b [beampro] Remove Extractor
closes #17290
closes #22871
closes #23020
closes #23061
closes #26099
2020-12-09 10:41:08 +01:00
Sergey M․
a4a2fa8754 release 2020.12.09 2020-12-09 04:25:24 +07:00
Sergey M․
9da0504a09 [ChangeLog] Actualize
[ci skip]
2020-12-09 04:24:31 +07:00
Sergey M․
470cf496f5 [youtube:tab] Improve identity token extraction (closes #27197) 2020-12-09 04:17:47 +07:00
Sergey M․
e029da9add [youtube:tab] Make click tracking params on continuation optional 2020-12-09 04:06:44 +07:00
Sergey M․
e00b8f60d4 [youtube:tab] Delegate inline playlists to tab-based playlists (closes #27298) 2020-12-09 03:40:02 +07:00
EntranceJew
644c3ef886 [tubitv] Extract release year (#27317) 2020-12-09 01:14:46 +07:00
Soneé John
9d8d0f8b4a [youtube] Remove unused code (#27343) 2020-12-09 00:41:47 +07:00
Sergey M․
5a1fbbf8b7 [extractor/common] Fix inline HTML5 media tags processing and add test (closes #27345) 2020-12-09 00:05:21 +07:00
Remita Amine
e2bdf8bf4f [amcnetworks] Fix free content extraction(closes #20354) 2020-12-08 17:11:28 +01:00
Remita Amine
c368dc98e0 [lbry] add support for channel extraction(closes #25584) 2020-12-08 14:53:22 +01:00
Remita Amine
e7eff914cd [lbry] add support for short and embed URLs and fix channel metadata extraction 2020-12-08 10:52:52 +01:00
Remita Amine
07333d0062 [telequebec] Fix Extraction and Add Support for video.telequebec.tv
closes #25733
closes #26883
closes #27339
2020-12-08 00:35:47 +01:00
Sergey M․
5bd7ad2e81 [youtube:tab] Capture and output alerts (closes #27340) 2020-12-08 01:12:00 +07:00
Remita Amine
3ded751985 [generic] comment a test covered now by AmericasTestKitchenIE 2020-12-07 15:27:13 +01:00
Remita Amine
6956db3606 [tvplay:home] Fix extraction(closes #21153) 2020-12-07 15:12:54 +01:00
Remita Amine
17b01228f8 [americastestkitchen] Fix Extraction and add support for Cook's Country and Cook's Illustrated
closes #17234
closes #27322
2020-12-06 23:41:18 +01:00
Sergey M․
4f1ecca58d [slideslive] Add support for yoda service videos and extract subtitles (closes #27323) 2020-12-07 03:27:21 +07:00
Sergey M․
2717036489 release 2020.12.07 2020-12-07 02:03:34 +07:00
Sergey M․
d9482c0083 [ChangeLog] Actualize
[ci skip]
2020-12-07 02:01:53 +07:00
Sergey M․
791b743765 [extractor/generic] Remove unused import 2020-12-07 02:01:47 +07:00
Remita Amine
fa604d9083 [aenetworks] Fix extraction
- Fix Fastly format extraction
- Add support for play and watch subdomains
- Extract series metadata

closes #23363
closes #23390
closes #26795
closes #26985
2020-12-06 19:50:05 +01:00
Sergey M․
2bf0634d16 [youtube] Improve youtu.be extraction in non-existing playlists (closes #27324) 2020-12-07 01:48:10 +07:00
Sergey M․
dccf4932e1 [extractor/common] Extract timestamp from Last-Modified header 2020-12-07 01:15:30 +07:00
Sergey M․
91dd25fe1e [extractor/common] Add support for dl8-* media tags (closes #27283) 2020-12-07 01:08:22 +07:00
Sergey M․
06bf2ac20f [extractor/common] Eliminate media tag name regex duplication 2020-12-07 00:56:29 +07:00
Sergey M․
6ad0d8781e [extractor/common] Fix media type extraction for HTML5 media tags in start/end form 2020-12-07 00:45:16 +07:00
Sergey M․
f2c704e112 [generic] Extract RSS video itunes metadata 2020-12-06 23:08:03 +07:00
Sergey M․
5e822c2526 [generic] Extract RSS video timestamp 2020-12-06 22:30:30 +07:00
renalid
cc017e07ca [generic] Extract RSS video description (#27177) 2020-12-06 22:12:25 +07:00
Remita Amine
082da36416 [nrk] reduce requests for Radio series 2020-12-05 16:24:49 +01:00
Remita Amine
6bf95b15ee [nrk] reduce the number of instalments requests 2020-12-05 15:35:29 +01:00
Remita Amine
4c93b2fd15 [nrk] improve format extraction 2020-12-05 09:13:42 +01:00
Remita Amine
1b26bfd425 [nrk] improve extraction
- improve format extraction for old akamai formats
- update some of the tests
- add is_live value to entry info dict
- request instalments only when they're available
- fix skole extraction
2020-12-05 00:40:03 +01:00
Sergey M․
13ec444a98 [peertube] Extract fps 2020-12-05 04:45:30 +07:00
Sergey M․
51579d87e4 [peertube] Recognize audio-only formats (closes #27295) 2020-12-05 04:41:33 +07:00
Sergey M․
e147619669 release 2020.12.05 2020-12-05 03:40:00 +07:00
Sergey M․
e7f93fbd85 [ChangeLog] Actualize
[ci skip]
2020-12-05 03:31:07 +07:00
Sergey M․
58f7ada235 [teachable:course] Improve extraction (closes #24507, closes #27286) 2020-12-05 03:24:56 +07:00
Sergey M․
c67b33888f [nrk] Improve error extraction 2020-12-05 03:00:10 +07:00
Sergey M․
e8c0af04b7 [nrktv] Relax _VALID_URL 2020-12-05 02:46:28 +07:00
Sergey M․
5d769860c3 [nrktv:series] Improve extraction (closes #21926) 2020-12-05 02:43:33 +07:00
Sergey M․
02b04785ee [nrktv:series] Improve extraction 2020-12-05 02:21:07 +07:00
Sergey M․
41c92b8d02 [nrktv:season] Improve extraction 2020-12-05 01:20:14 +07:00
Remita Amine
fe0c28f956 [nrk] fix call to moved method 2020-12-04 18:08:08 +01:00
Remita Amine
957c65b9ea [nrk] fix typo 2020-12-04 18:05:27 +01:00
Remita Amine
5e95e18ce9 [nrk] improve format extraction and geo-restriction detection (closes #24221) 2020-12-04 18:04:38 +01:00
Sergey M․
e91df0c550 [pornhub] Handle HTTP errors gracefully (closes #26414) 2020-12-04 23:56:50 +07:00
Sergey M․
c5636e9bca [nrktv] Relax _VALID_URL (closes #27299, closes #26185) 2020-12-04 23:38:42 +07:00
Remita Amine
2e47264235 [zdf] extract webm formats(closes #26659) 2020-12-04 14:16:32 +01:00
Sergey M․
1c78cb118c [travis] Disable download jobs
Until youtube-dl OSS approval by Travis or moving to GitHub Actions
[ci skip]
2020-12-03 23:25:36 +07:00
Matthew Rayermann
beab2f88c9 [nhk] Add audio clip test to NHK extractor (#27269) 2020-12-03 07:47:56 +00:00
Remita Amine
0025447369 [gamespot] Extract DASH and HTTP formats 2020-12-03 00:35:47 +01:00
Remita Amine
da4304609d [extractor/common] improve Akamai HTTP formats extraction 2020-12-03 00:33:55 +01:00
Remita Amine
ea89680aea [tver] correct episode_number key 2020-12-02 22:49:51 +01:00
Remita Amine
664dd8ba85 [extractor/common] improve Akamai HTTP format extraction
- Allow m3u8 manifest without an additional audio format
- Fix extraction for qualities starting with a number
Solution provided by @nixxo based on: https://stackoverflow.com/a/5984688
2020-12-02 21:49:09 +01:00
Remita Amine
64554c12e1 [tver] Add new extractor (closes #26662)(closes #27284) 2020-12-02 21:49:09 +01:00
opusforlife2
4ded9c0f00 [pornhub] Add support for pornhub.org (#27276)
Most ISPs block the other two TLDs through deep packet inspection
2020-12-03 01:30:08 +07:00
Remita Amine
c0820dd52a [extractors] Add QubIE import 2020-12-02 00:41:21 +01:00
Sergey M․
2bb70750a9 release 2020.12.02 2020-12-02 01:37:40 +07:00
Sergey M․
09d923f2eb [ChangeLog] Actualize
[ci skip]
2020-12-02 01:22:43 +07:00
Remita Amine
37d979ad33 [tva] Add support for qub.ca (closes #27235) 2020-12-01 12:25:02 +01:00
Remita Amine
95ac4de229 [toggle] Detect DRM protected videos (closes #16479)(closes #20805) 2020-12-01 10:38:53 +01:00
Remita Amine
d3e142b3fa [toggle] Add support for new MeWatch URLs (closes #27256) 2020-12-01 10:13:06 +01:00
Sergey M․
132aece1ed [youtube:tab] Extract channels only from channels tab (closes #27266) 2020-12-01 04:44:10 +07:00
Sergey M․
3e4e338133 [cspan] Extract info from jwplayer data (closes #3672, closes #3734, closes #10638, closes #13030, closes #18806, closes #23148, closes #24461, closes #26171, closes #26800, closes #27263) 2020-12-01 01:53:12 +07:00
Roman Beránek
be19ae11fd [cspan] Pass Referer header with format's video URL (#26032) (closes #25729) 2020-12-01 01:14:29 +07:00
Sergey M․
59d63d8d4a [youtube] Improve age-gated videos extraction (closes #27259) 2020-12-01 00:49:26 +07:00
Remita Amine
cfeba5d17f [mediaset] add support for movie URLs(closes #27240) 2020-11-30 09:46:15 +01:00
Sergey M․
6da0e5e7a2 [yandexmusic] Refactor and add support for artist's tracks and albums (closes #11887, closes #22284) 2020-11-30 00:25:06 +07:00
Sergey M․
d6ce649f15 [yandexmusic:track] Fix extraction (closes #26449, closes #26669, closes #26747, closes #26748, closes #26762) 2020-11-29 22:15:51 +07:00
Sergey M․
b449b73dcc release 2020.11.29 2020-11-29 13:53:01 +07:00
Sergey M․
16c822e91e [ChangeLog] Actualize
[ci skip]
2020-11-29 13:49:12 +07:00
Michael Munch
4318170779 [drtv] Extend _VALID_URL (#27243) 2020-11-29 13:44:36 +07:00
Sergey M․
fb626c0586 [tiktok] Fix extraction (closes #20809, closes #22838, closes #22850, closes #25987, closes #26281, closes #26411, closes #26639, closes #26776, closes #27237) 2020-11-29 08:09:20 +07:00
bopol
717d1d2d5a [ina] Add support for mobile URLs (#27229) 2020-11-29 04:15:53 +07:00
Sergey M․
9585b376db [YoutubeDL] Write static debug to stderr and respect quiet for dynamic debug (closes #14579, closes #22593)
TODO: logging and verbosity needs major refactoring (refs #10894)
2020-11-29 04:04:06 +07:00
JChris246
f04cfe24e0 [pornhub] Fix like and dislike count extraction (closes #27227) (#27234) 2020-11-29 02:32:13 +07:00
Sergey M․
20c50c6556 [youtube] Improve yt initial player response extraction (closes #27216) 2020-11-28 15:02:31 +07:00
Remita Amine
f9f9699f2f [videa] improve extraction 2020-11-26 12:56:49 +01:00
Adrian Heine né Lang
a3cf22e590 [videa] Adapt to updates (#26301)
closes #25973, closes #25650.
2020-11-26 11:55:06 +00:00
Remita Amine
99de2f38d3 [spreaker] fix SpreakerShowIE test URL 2020-11-25 21:39:17 +01:00
Sergey M․
9fe50837c3 release 2020.11.26 2020-11-26 03:05:51 +07:00
Sergey M․
4dc545553f [ChangeLog] Actualize
[ci skip]
2020-11-26 03:03:51 +07:00
Sergey M․
686e898fde [spreaker] Add extractor (closes #13480, closes #13877) 2020-11-26 02:58:48 +07:00
Remita Amine
3a78198a96 [vlive] improve extraction for geo-restricted videos 2020-11-25 11:40:37 +01:00
Remita Amine
836c810716 [vlive] Add support for post URLs(closes #27122)(closes #27123) 2020-11-25 11:26:26 +01:00
Remita Amine
97c5be383c [viki] fix video API request(closes #27184) 2020-11-24 23:25:03 +01:00
Remita Amine
a7ea88537a [bbc] fix BBC Three clip extraction 2020-11-24 22:54:08 +01:00
Remita Amine
6c35de4c6b [bbc] fix BBC News videos extraction 2020-11-24 22:49:04 +01:00
Remita Amine
579d43951d [medaltv] improve extraction 2020-11-24 18:29:46 +01:00
Joshua Lochner
e24ebeafd1 [medaltv] Add new extractor (#27149) 2020-11-24 18:27:33 +01:00
Remita Amine
01c92973dd [youtube] improve music metadata and license extraction(closes #26013) 2020-11-23 22:44:59 +01:00
Sergey M․
f4415faa46 [downloader/fragment] Set final file's mtime according to last fragment's Last-Modified header (closes #11718, closes #18384, closes #27138) 2020-11-24 04:17:24 +07:00
Sergey M․
a86ce9d7a1 [nrk] Fix extraction 2020-11-24 02:59:49 +07:00
Sergey M․
37258c644f [cda] Fix extraction (closes #17803, closes #24458, closes #24518, closes #26381) 2020-11-24 02:18:40 +07:00
Sergey M․
d0512ac4c5 release 2020.11.24 2020-11-24 00:23:14 +07:00
Sergey M․
da4eaa15a2 [ChangeLog] Actualize
[ci skip]
2020-11-24 00:20:42 +07:00
Sergey M․
191286265d [youtube:tab] Fix feeds extraction (closes #25695, closes #26452) 2020-11-24 00:10:25 +07:00
Sergey M․
3234272818 [youtube:tab] PEP 8 2020-11-23 22:34:27 +07:00
Sergey M․
9d2c90354f [youtube:favorites] Restore extractor 2020-11-23 22:33:21 +07:00
Sergey M․
316b10855a [youtube:tab] Fix some weird typo (closes #27157) 2020-11-23 22:19:25 +07:00
Remita Amine
484fe78737 [pinterest] Add support for large collections(more than 25 pins) 2020-11-23 14:16:38 +01:00
Remita Amine
2cd43a00d1 [franceinter] flake8 2020-11-22 19:38:45 +01:00
renalid
dd0f524c69 [franceinter] add thumbnail url (#27153)
Co-authored-by: remitamine <remitamine@gmail.com>
2020-11-22 19:35:53 +01:00
Remita Amine
c84f9475b8 [box] Add new extractor(#5949) 2020-11-22 17:39:41 +01:00
Jia Rong Yee
15f2734791 [nytimes] Add new cooking.nytimes.com extractor (#27143)
* [nytimes] support cooking.nytimes.com, resolves #27112

Co-authored-by: remitamine <remitamine@gmail.com>
2020-11-22 14:12:47 +01:00
Remita Amine
cb6e24f946 [lbry] relax _VALID_URL regex(closes #27144) 2020-11-22 13:16:03 +01:00
Remita Amine
9d531aa291 [rumble] add support for embed pages(#10785) 2020-11-22 13:07:04 +01:00
Remita Amine
e9cbb98a0f [skyit] add support for multiple Sky Italia websites(closes #26629) 2020-11-22 13:01:56 +01:00
Remita Amine
193422e12a [extractor/common] add generic support for akamai http format extraction 2020-11-22 12:54:55 +01:00
Sergey M․
c4cabf040e [pinterest] Add extractor (closes #25747) 2020-11-22 05:04:01 +07:00
Sergey M․
f23eceebbf release 2020.11.21.1 2020-11-21 23:59:11 +07:00
Sergey M․
650bd8f623 [ChangeLog] Actualize
[ci skip]
2020-11-21 23:56:32 +07:00
Sergey M․
5d8cb4367d release 2020.11.21 2020-11-21 23:54:33 +07:00
Sergey M․
82abc13aed [youtube:tab] Comment out test 2020-11-21 23:41:49 +07:00
Sergey M․
1e72660c9b [svtplay] Fix test title 2020-11-21 23:36:25 +07:00
Sergey M․
049f224248 [svtplay] Add support for svt.se/barnkanalen (closes #24817) 2020-11-21 23:35:32 +07:00
Mattias Wadman
0ada1b90b8 [svt] Extract timestamp and thumbnail in more cases (#27130)
Add timestamp, set to "valid from", which I think can be seen as the publish time.
Add thumbnail in more cases; it seems this was only done in the embedded data case for some reason.
Switch svtplay test URL to an existing video and also one with no expiry date.
Also add an additional thumbnail URL test regex.
2020-11-21 23:24:37 +07:00
Josh Soref
71ddc222ad Fix typos (#27084)
* spelling: authorization
* spelling: brightcove
* spelling: creation
* spelling: exceeded
* spelling: exception
* spelling: extension
* spelling: extracting
* spelling: extraction
* spelling: frontline
* spelling: improve
* spelling: length
* spelling: listsubtitles
* spelling: multimedia
* spelling: obfuscated
* spelling: partitioning
* spelling: playlist
* spelling: playlists
* spelling: restriction
* spelling: services
* spelling: split
* spelling: srmediathek
* spelling: support
* spelling: thumbnail
* spelling: verification
* spelling: whitespaces

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-11-21 22:00:05 +07:00
Daniel Peukert
21292c0649 [youtube] Fix error reason extraction (#27081) 2020-11-21 21:52:20 +07:00
Sergey M․
46a265a2da [youtube] Fix like and dislike count extraction (closes #25977) 2020-11-21 20:49:56 +07:00
Sergey M․
e2096776b9 [youtube:tab] Add support for current video and fix lives extraction (closes #27126) 2020-11-21 20:48:13 +07:00
Remita Amine
a1c88c4819 [infoq] fix format extraction(closes #25984) 2020-11-20 23:23:55 +01:00
renalid
51ca93d751 [francetv] Update to fix thumbnail URL issue (#27120)
Fix the thumbnail URL. The issue was there for many years and was never fixed. It's done! :-)

Example: https://www.france.tv/france-2/de-gaulle-l-eclat-et-le-secret/de-gaulle-l-eclat-et-le-secret-saison-1/2035247-solitude.html

Broken thumbnail URL previously generated: http://pluzz.francetv.fr/staticftv/ref_emissions/2020-11-02/EMI_1104da66f533cc7dc5d0d07a181a18c2e2fe1d81_20201014122553940.jpg

Corrected thumbnail URL: https://sivideo.webservices.francetelevisions.fr/staticftv/ref_emissions/2020-11-02/EMI_1104da66f533cc7dc5d0d07a181a18c2e2fe1d81_20201014122553940.jpg
2020-11-20 22:44:08 +01:00
Sergey M․
a7e0531999 [downloader/http] Fix crash during urlopen caused by missing reason of URLError 2020-11-21 04:22:21 +07:00
Sergey M․
ab0eda99e1 [YoutubeDL] Fix --ignore-errors for playlists with generator-based entries of url_transparent (closes #27064) 2020-11-21 04:00:08 +07:00
Sergey M․
ec99f47108 [youtube:tab] Replace some test URLs with RIAA-friendly ones 2020-11-20 23:34:46 +07:00
Sergey M․
b31b5f4434 [youtube] Improve yt initial data extraction (closes #27093) 2020-11-20 23:21:52 +07:00
Remita Amine
86f2fa1590 [discoverynetworks] add support for new TLC/DMAX URLs(closes #27100) 2020-11-20 10:47:52 +01:00
Remita Amine
7bc7fbce23 [rai] fix protocol relative relinker URLs(closes #22766) 2020-11-20 10:26:55 +01:00
Remita Amine
a78e530c14 [rai] fix unavailable video format detection 2020-11-20 10:10:57 +01:00
Remita Amine
af7bb684c0 [rai] improve extraction 2020-11-20 10:01:56 +01:00
Leonardo Taccari
dd9e0f58f3 [rai] Fix extraction for recent raiplay.it updates (#27077)
- Remove first test of RaiPlayIE: it is no longer available
- Make RaiPlayIE extension-agnostic (passing possible `.json' URLs is now
  supported too)
- Adjust RaiPlayLiveIE to recent raiplay.it updates.  Passing it as
  `url_transparent' is no longer supported (there is no longer an accessible
  ContentItem)
- Adjust RaiPlayPlaylistIE to recent raiplay.it updates and instruct it about
  ContentSet-s.
- Update a RaiIE test and remove two tests that are no longer available

Thanks to @remitamine for the review!
2020-11-20 10:00:05 +01:00
Remita Amine
59e583f7e8 [viki] improve format extraction 2020-11-19 22:49:28 +01:00
beefchop
daa25d4142 [viki] fix stream extraction from mpd (#27092)
Co-authored-by: beefchop <beefchop@users.noreply.github.com>
2020-11-19 21:38:09 +01:00
Remita Amine
25a35cb38a [googledrive] fix format extraction(closes #26979) 2020-11-19 20:01:24 +01:00
Remita Amine
2cf8003638 [amara] improve extraction 2020-11-19 17:29:30 +01:00
Joost Verdoorn
cf1a8668e8 [Amara] Add new extractor (#20618)
* [Amara] Add new extractor
2020-11-19 17:26:53 +01:00
Remita Amine
2dbb45ae82 [vimeo:album] fix extraction(closes #27079) 2020-11-19 13:13:33 +01:00
Remita Amine
4fe190df70 [mtv] fix mgid extraction(closes #26841) 2020-11-19 11:57:09 +01:00
Sergey M․
039e715b30 release 2020.11.19 2020-11-19 05:22:27 +07:00
Sergey M․
32152bab7a [ChangeLog] Actualize
[ci skip]
2020-11-19 05:21:09 +07:00
Sergey M․
b1347a5881 [youporn] Fix upload date extraction and make comment count optional (closes #26986) 2020-11-19 05:16:25 +07:00
Sergey M․
91e954587f [arte] Rework extractors
* Reimplement embed and playlist extractors to delegate to the single entrypoint artetv extractor
  Beware reluctant download archive extractor keys breakage.
* Improve embeds detection (closes #27057)
- Remove obsolete code
2020-11-19 05:02:04 +07:00
Sergey M․
9b505185da [arte] Extract m3u8 formats (closes #27061) 2020-11-19 03:26:49 +07:00
Remita Amine
444a68e0ec [mgtv] fix format extraction(closes #26415) 2020-11-18 21:07:00 +01:00
Sergey M․
c7178f0f7a [extractor/common] Output error for invalid URLs in _is_valid_url (refs #21400, refs #24151, refs #25617, refs #25618, refs #25586, refs #26068, refs #27072) 2020-11-18 23:31:35 +07:00
Remita Amine
5c3f7014ef [lbry] add support for odysee.com domain(closes #26806) 2020-11-18 00:41:07 +01:00
Remita Amine
d65628ef03 [francetv] improve info extraction 2020-11-17 23:16:04 +01:00
gdzx
8a6c5b0806 [francetv] Add fallback video url extraction (#27047)
Fallback on another API endpoint when no video formats are found.

Closes ytdl-org#22561
2020-11-17 23:06:19 +01:00
Sergey M․
9360936f26 release 2020.11.18 2020-11-18 04:15:30 +07:00
Sergey M․
bb2b89e077 [ChangeLog] Actualize
[ci skip]
2020-11-18 04:11:58 +07:00
Sergey M․
2d7a29081c [spiegel] Fix extraction (closes #24206, closes #24767)
Code picked from PR #24767 since original repo is not available due to takedown.
2020-11-18 04:09:02 +07:00
Sergey M․
2864179293 [youtube] Improve extraction
+ Add support for --no-playlist (closes #27009)
* Improve playlist and mix extraction (closes #26390, closes #26509, closes #26534, closes #27011)
+ Extract playlist uploader data
* Update tests
2020-11-18 03:34:08 +07:00
Remita Amine
284f8306df [youtube:tab] fix view_count extraction(closes #27051) 2020-11-17 20:32:50 +01:00
Remita Amine
aa613ef7e1 [malltv] fix extraction(closes #27035) 2020-11-17 19:13:38 +01:00
Remita Amine
cb2b9a22a5 [bandcamp] extract playlist_description(closes #22684) 2020-11-17 14:46:02 +01:00
Remita Amine
5b867c15a8 [urplay] fix extraction(closes #26828) 2020-11-17 13:12:20 +01:00
Sergey M․
f8c749f12c [youtube:tab] Fix playlist title extraction (closes #27015) 2020-11-17 07:01:41 +07:00
Sergey M․
7d509c613b [youtube] Fix chapters extraction (closes #26005) 2020-11-17 04:28:35 +07:00
Sergey M․
b92e95aa01 release 2020.11.17 2020-11-17 03:59:54 +07:00
Sergey M․
e3cad6bd99 [ChangeLog] Actualize
[ci skip]
2020-11-17 03:50:12 +07:00
Sergey M․
6699b6ce41 [youtube:tab] Fix extraction with cookies provided (closes #27005) 2020-11-17 03:40:57 +07:00
Remita Amine
91dcde8a38 [lrt] fix extraction with empty tags(closes #20264) 2020-11-16 21:28:12 +01:00
Sergey M․
11f3471c4b [ndr:embed:base] Extract subtitles (closes #25447, closes #26106) 2020-11-17 03:24:23 +07:00
Sergey M․
f4093b34f6 [servus] Add support for pm-wissen.com (closes #25869) 2020-11-17 03:02:07 +07:00
Sergey M․
a80b23c373 [servus] Fix extraction (closes #26872, closes #26967, closes #26983, closes #27000) 2020-11-17 02:57:26 +07:00
Sergey M․
2e7fa18bb9 [xtube] Fix extraction (closes #26996) 2020-11-17 01:32:48 +07:00
Sergey M․
fe07e788bf [utils] Skip ! prefixed code in js_to_json 2020-11-17 01:30:43 +07:00
Remita Amine
6d3bdcf217 [lrt] fix extraction 2020-11-16 19:17:10 +01:00
Remita Amine
ec2a2ab441 [lbry] Add new extractor 2020-11-16 19:04:55 +01:00
Remita Amine
9448a20312 [condenast] fix extraction and extract subtitles 2020-11-16 18:57:56 +01:00
Remita Amine
3f1748b944 [bandcamp] fix extraction 2020-11-16 17:26:09 +01:00
Remita Amine
fe13087cd1 [rai] fix RaiPlay extraction 2020-11-16 17:17:52 +01:00
Remita Amine
f22fa82d7f [extractors] Remove VLivePlaylistIE import 2020-11-16 16:36:56 +01:00
Remita Amine
059fa9aa81 [vlive] fix extraction 2020-11-16 16:35:08 +01:00
Remita Amine
650aec4a98 [usanetwork] fix extraction 2020-11-16 16:21:48 +01:00
Remita Amine
2ea9c97432 [nbc] fix NBCNews/Today/MSNBC extraction 2020-11-16 16:18:37 +01:00
Edward Betts
efc589b865 [devscripts/make_lazy_extractors] Correct a spelling mistake (#26991) 2020-11-16 22:08:20 +07:00
Remita Amine
1737ea69b9 [cnbc] fix extraction 2020-11-16 15:55:07 +01:00
Sergey M․
1fb034d029 [youtube] Remove RIAA copyrighted media from tests as per [1]
1. Github dmca and 1201 notice re youtube-dl 9-21-20
2020-11-16 21:08:16 +07:00
Sergey M․
28f9568a84 release 2020.11.12 2020-11-12 06:23:46 +07:00
Sergey M․
5bbdadd5f8 [ChangeLog] Actualize
[ci skip]
2020-11-12 06:18:16 +07:00
Sergey M․
2de2ca6659 [youtube] Rework extractors
WIP
2020-11-12 06:16:37 +07:00
Sergey M․
051071203c release 2020.11.01.1 2020-11-01 08:58:40 +07:00
Sergey M․
34299510bb release 2020.11.01 2020-11-01 08:52:27 +07:00
Sergey M․
b9bceba37c [ChangeLog] Actualize
[ci skip]
2020-11-01 07:34:20 +07:00
Sergey M․
6d4733ce7b [youtube] Fix JS player URL extraction 2020-11-01 06:52:00 +07:00
Sergey M․
416da574ec [ytsearch] Fix extraction (closes #26920) 2020-10-23 21:31:37 +07:00
Toan Nguyen
48c5663c5f [afreecatv] Fix typo (#26970) 2020-10-22 19:15:05 +07:00
Hannu Hartikainen
7d740e7dc7 [23video] Relax _VALID_URL (#26870) 2020-10-20 00:56:23 +07:00
Kevin O'Connor
4eda10499e [utils] Don't attempt to coerce JS strings to numbers in js_to_json (#26851)
The current logic in `js_to_json` tries to rewrite octal/hex numbers to
decimal. However, when the logic actually happens the `"` or `'` have
already been trimmed off. This causes what were originally strings, that
happen to look like octal/hex numbers, to get rewritten to decimal and
returned as a number rather than a string.

In practice, something like:

```js
{
  "0x40": "foo",
  "040": "bar",
}
```

would get rewritten as:

```json
{
  64: "foo",
  32: "bar"
}
```

This is problematic since this isn't valid JSON as you cannot have
non-string keys.
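
A minimal Python sketch of the idea described above, not the actual `js_to_json` code: quoted strings are matched first and passed through unchanged, so only bare hex/octal literals are rewritten to decimal. The names `_TOKEN_RE` and `rewrite_int_literals` are hypothetical, and only double-quoted strings are handled for brevity.

```python
import json
import re

# Hypothetical token pattern: a double-quoted string, or a bare hex/octal literal.
# The string alternative comes first so quoted text is consumed as a whole.
_TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|\b0[xX][0-9a-fA-F]+\b|\b0[0-7]+\b')


def rewrite_int_literals(js_text):
    """Rewrite bare hex/octal integers to decimal, leaving strings untouched."""
    def fix(match):
        token = match.group(0)
        if token.startswith('"'):
            # Still quoted: this is a string, so do not coerce it to a number.
            return token
        base = 16 if token[1] in 'xX' else 8
        return str(int(token, base))
    return _TOKEN_RE.sub(fix, js_text)


converted = rewrite_int_literals('{"0x40": "foo", "040": "bar", "n": 0x10}')
print(converted)              # {"0x40": "foo", "040": "bar", "n": 16}
print(json.loads(converted))  # keys remain strings, so the result is valid JSON
```

Deciding while the quotes are still attached is the point of the sketch: once they are stripped, `"0x40"` and `0x40` can no longer be told apart.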
2020-10-18 00:10:41 +07:00
Sergio Livi
605535776a [ustream] Add support for video.ibm.com (#26894) 2020-10-17 23:14:46 +07:00
Felix Yan
1050e0d09f [iqiyi] Fix typo (#26884) 2020-10-17 23:02:17 +07:00
Sergey M․
d65d89183f [expressen] Add support for di.se (closes #26670) 2020-09-24 07:37:10 +07:00
Surkal
0c92f1e96b [iprima] Improve video id extraction (#26507) (closes #26494) 2020-09-24 06:46:58 +07:00
Sergey M․
adae9e844b [README.md] Fix autonumber sequence description (refs #26686) 2020-09-24 06:36:07 +07:00
Sergey M․
c5764b3f89 [downloader/http] Properly handle missing message in SSLError (closes #26646) 2020-09-22 07:01:59 +07:00
Sergey M․
0837992a22 [downloader/http] Fix access to not yet opened stream in retry 2020-09-22 06:44:14 +07:00
Sergey M․
b55715934b release 2020.09.20 2020-09-20 12:30:45 +07:00
Sergey M․
bbc3b5b4bb [ChangeLog] Actualize
[ci skip]
2020-09-20 12:24:32 +07:00
nixxo
1ca5f821c8 [redtube] Extend _VALID_URL (#26506) 2020-09-20 11:39:42 +07:00
Sergey M․
defc820b70 [twitch] Switch streams to GraphQL and refactor (closes #26535) 2020-09-20 10:05:00 +07:00
Sergey M․
82ef02e936 [telequebec] Fix issues (closes #26368) 2020-09-19 07:56:00 +07:00
Patrick Dessalle
b856b3997c [telequebec] Add support for brightcove videos (closes #25833) 2020-09-19 07:52:57 +07:00
Sergey M․
cd85a1bb8b [pornhub] Extract metadata from JSON-LD (closes #26614) 2020-09-19 06:34:34 +07:00
Sergey M․
ce5b904050 [extractor/common] Relax interaction count extraction in _json_ld 2020-09-19 06:33:17 +07:00
Sergey M․
ad06b99dd4 [extractor/common] Extract author as uploader for VideoObject in _json_ld 2020-09-19 06:13:42 +07:00
JChris246
540b9f5164 [pornhub] Fix view count extraction (#26621) (refs #26614) 2020-09-19 05:59:19 +07:00
Stefan Pöschel
6e65a2a67e [downloader/hls] Fix incorrect end byte in Range HTTP header for media segments with EXT-X-BYTERANGE (#24512) (closes #14748)
The end of the byte range is the first byte that is NOT part of the
range to be downloaded. So don't include it in the requested HTTP
download range, as this additional byte leads to a broken TS packet and
subsequently to e.g. visible video corruption.

Fixes #14748.
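
The off-by-one can be illustrated with a small Python sketch (not the downloader's actual code; `byterange_to_http_range` is a hypothetical helper). It assumes an `EXT-X-BYTERANGE` value of the form `length@offset` and relies on HTTP `Range` headers being inclusive at both ends.

```python
def byterange_to_http_range(length, offset):
    """Map an EXT-X-BYTERANGE (length@offset) to an HTTP Range header value."""
    start = offset
    end_exclusive = offset + length  # first byte that is NOT part of the segment
    # HTTP byte ranges include the last position, so step back by one byte.
    return 'bytes=%d-%d' % (start, end_exclusive - 1)


print(byterange_to_http_range(1000, 0))    # bytes=0-999
print(byterange_to_http_range(500, 1000))  # bytes=1000-1499
```

Requesting `bytes=0-1000` for the first segment would fetch 1001 bytes, which is the extra byte the commit removes.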
2020-09-18 05:26:56 +07:00
Sergey M․
f8c7bed133 [extractor/common] Handle ssl.CertificateError in _request_webpage (closes #26601)
ssl.CertificateError is raised on some Python versions <= 3.7.x
2020-09-18 03:41:16 +07:00
Sergey M․
cdc55e666f [downloader/http] Improve timeout detection when reading block of data (refs #10935) 2020-09-18 03:32:54 +07:00
Ori Avtalion
86b7c00adc [downloader/http] Retry download when urlopen times out (#26603) (refs #10935) 2020-09-18 03:15:44 +07:00
Sergey M․
e8c5d40bc8 release 2020.09.14 2020-09-14 03:37:36 +07:00
Sergey M․
ca7ebc4e5e [ChangeLog] Actualize
[ci skip]
2020-09-14 03:35:18 +07:00
Sergey M․
bff857a8af [postprocessor/embedthumbnail] Fix issues (closes #25717)
* Fix WebP with wrong extension processing
* Fix embedding of thumbnails with % character in path
2020-09-14 03:28:31 +07:00
Alex Merkel
a31a022efd [postprocessor/embedthumbnail] Add support for non jpeg/png thumbnails (closes #25687) 2020-09-14 03:10:01 +07:00
Sergey M․
45f6362464 [rtlnl] Extend _VALID_URL for new embed URL schema 2020-09-13 21:42:06 +07:00
Derek Land
97f34a48d7 [rtlnl] Extend _VALID_URL (#26549) (closes #25821) 2020-09-13 21:38:16 +07:00
Daniel Peukert
ea74e00b3a [youtube] Fix empty description extraction (#26575) (closes #26006) 2020-09-13 21:23:21 +07:00
Sergey M․
06cd4cdb25 [srgssr] Extend _VALID_URL (closes #26555, closes #26556, closes #26578) 2020-09-13 21:07:25 +07:00
Sergey M․
da2069fb22 [googledrive] Use redirect URLs for source format (closes #18877, closes #23919, closes #24689, closes #26565) 2020-09-13 20:49:32 +07:00
Sergey M․
95c9810015 [svtplay] Fix id extraction (closes #26576) 2020-09-13 18:59:37 +07:00
Remita Amine
b03eebdb6a [redbulltv] improve support for redbull.com TV localized URLs(#22063) 2020-09-13 11:26:11 +01:00
Remita Amine
1f7675451c [redbulltv] Add support for new redbull.com TV URLs(closes #22037)(closes #22063) 2020-09-12 19:27:58 +01:00
tfvlrue
aa27253556 [soundcloud] Reduce pagination limit to fix 502 Bad Gateway errors when listing a user's tracks. (#26557)
Per the documentation here https://developers.soundcloud.com/blog/offset-pagination-deprecated the maximum limit is 200, so let's respect that (even if a higher value sometimes works).

Co-authored-by: tfvlrue <tfvlrue>
2020-09-12 09:35:11 +00:00
Sergey M․
d51e23d9fc release 2020.09.06 2020-09-06 13:00:41 +07:00
Sergey M․
6cd452acff [ChangeLog] Actualize
[ci skip]
2020-09-06 12:57:56 +07:00
Sergey M․
50e9fcc1fd [nrktv:episode] Improve video id extraction (closes #25594, closes #26369, closes #26409) 2020-09-06 12:43:50 +07:00
random-nick
16ee69c1b7 [youtube] Fix age gate content detection (#26100) (closes #26152, closes #26311, closes #26384) 2020-09-06 11:44:53 +07:00
Sergey M․
67171ed7e9 [youtube:user] Extend _VALID_URL (closes #26443) 2020-09-06 11:31:28 +07:00
Sergey M․
1d9bf655e6 [utils] Recognize wav mimetype (closes #26463) 2020-09-06 11:19:53 +07:00
TheRealDude2
62ae19ff76 [xhamster] Improve initials regex (#26526) (closes #26353) 2020-09-06 11:10:27 +07:00
Sergey M․
5ed05f26ad [svtplay] Fix svt id extraction (closes #26425, closes #26428, closes #26438) 2020-09-06 10:45:57 +07:00
Sergey M․
841b683804 [twitch] Rework extractors (closes #12297, closes #20414, closes #20604, closes #21811, closes #21812, closes #22979, closes #24263, closes #25010, closes #25553, closes #25606)
* Switch to GraphQL.
+ Add support for collections.
+ Add support for clips and collections playlists.
2020-09-06 10:45:34 +07:00
Remita Amine
f5863a3ea0 [biqle] improve video_ext extraction 2020-08-27 19:20:41 +01:00
Sergey M․
10709fc7c6 [xhamster] Extend _VALID_URL (closes #25927) 2020-08-12 21:51:50 +07:00
TheRealDude2
a7e348556a [xhamster] Fix extraction (closes #26157) (#26254) 2020-08-12 21:42:17 +07:00
JChris246
6cb30ea5ed [xhamster] Extend _VALID_URL (closes #25789) (#25804) 2020-08-12 21:37:22 +07:00
Sergey M․
a4ed50bb84 release 2020.07.28 2020-07-28 05:13:03 +07:00
Sergey M․
570611955f [ChangeLog] Actualize
[ci skip]
2020-07-28 05:07:54 +07:00
Sergey M․
e450f6cb63 [youtube] Fix sigfunc name extraction (closes #26134, closes #26135, closes #26136, closes #26137) 2020-07-28 05:05:38 +07:00
MRWITEK
a115e07594 [youtube] Improve description extraction (closes #25937) (#25980) 2020-07-14 12:01:15 +01:00
Sergey M․
718393c632 [wistia] Restrict embed regex (closes #25969) 2020-07-11 18:27:19 +07:00
Glenn Slayden
07af16b92e [youtube] Prevent excess HTTP 301 (#25786) 2020-07-01 02:56:16 +07:00
Sergey M․
e942cfd1a7 [youtube:playlists] Extend _VALID_URL (closes #25810) 2020-06-28 10:30:03 +07:00
Remita Amine
9a7e5cb88a [bellmedia] add support for cp24.com clip URLs(closes #25764) 2020-06-23 15:09:13 +01:00
Sergey M․
2391941f28 [brightcove] Improve embed detection (closes #25674) 2020-06-16 17:38:25 +07:00
224 changed files with 15199 additions and 10379 deletions


@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2020.06.16.1
+[debug] youtube-dl version 2021.02.04
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}


@@ -19,7 +19,7 @@ labels: 'site-support-request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -18,13 +18,13 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -18,7 +18,7 @@ title: ''
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2020.06.16.1
+[debug] youtube-dl version 2021.02.04
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}


@@ -19,13 +19,13 @@ labels: 'request'
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.04. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
+- [ ] I've verified that I'm running youtube-dl version **2021.02.04**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones


@@ -7,8 +7,10 @@
 ---
 ### Before submitting a *pull request* make sure you have:
-- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/ytdl-org/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/ytdl-org/youtube-dl#youtube-dl-coding-conventions) sections
 - [ ] [Searched](https://github.com/ytdl-org/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
+- [ ] Read [adding new extractor tutorial](https://github.com/ytdl-org/youtube-dl#adding-support-for-a-new-site)
+- [ ] Read [youtube-dl coding conventions](https://github.com/ytdl-org/youtube-dl#youtube-dl-coding-conventions) and adjusted the code to meet them
+- [ ] Covered the code with tests (note that PRs without tests will be REJECTED)
 - [ ] Checked the code with [flake8](https://pypi.python.org/pypi/flake8)
 ### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:

74
.github/workflows/ci.yml vendored Normal file

@@ -0,0 +1,74 @@
name: CI
on: [push, pull_request]
jobs:
  tests:
    name: Tests
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: true
      matrix:
        os: [ubuntu-18.04]
        # TODO: python 2.6
        python-version: [2.7, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
        python-impl: [cpython]
        ytdl-test-set: [core, download]
        run-tests-ext: [sh]
        include:
        # python 3.2 is only available on windows via setup-python
        - os: windows-latest
          python-version: 3.2
          python-impl: cpython
          ytdl-test-set: core
          run-tests-ext: bat
        - os: windows-latest
          python-version: 3.2
          python-impl: cpython
          ytdl-test-set: download
          run-tests-ext: bat
        # jython
        - os: ubuntu-18.04
          python-impl: jython
          ytdl-test-set: core
          run-tests-ext: sh
        - os: ubuntu-18.04
          python-impl: jython
          ytdl-test-set: download
          run-tests-ext: sh
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      if: ${{ matrix.python-impl == 'cpython' }}
      with:
        python-version: ${{ matrix.python-version }}
    - name: Set up Java 8
      if: ${{ matrix.python-impl == 'jython' }}
      uses: actions/setup-java@v1
      with:
        java-version: 8
    - name: Install Jython
      if: ${{ matrix.python-impl == 'jython' }}
      run: |
        wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
        java -jar jython-installer.jar -s -d "$HOME/jython"
        echo "$HOME/jython/bin" >> $GITHUB_PATH
    - name: Install nose
      run: pip install nose
    - name: Run tests
      continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
      env:
        YTDL_TEST_SET: ${{ matrix.ytdl-test-set }}
      run: ./devscripts/run_tests.${{ matrix.run-tests-ext }}
  flake8:
    name: Linter
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Install flake8
      run: pip install flake8
    - name: Run flake8
      run: flake8 .
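
To see how many test jobs the workflow above produces, the matrix can be expanded by hand. The sketch below is a simplified model, not GitHub's implementation: it assumes each `include` entry becomes an extra job because it changes a base matrix value (`os` or `python-impl`), and the helper names `expand_matrix` and `may_fail` are made up for illustration.

```python
from itertools import product

# Values copied from the workflow's matrix section.
PYTHON_VERSIONS = ["2.7", "3.3", "3.4", "3.5", "3.6", "3.7", "3.8", "3.9",
                   "pypy-2.7", "pypy-3.6", "pypy-3.7"]
TEST_SETS = ["core", "download"]


def expand_matrix():
    """Return one dict per test job (simplified model of matrix + include)."""
    jobs = [
        {"os": "ubuntu-18.04", "python-version": version,
         "python-impl": "cpython", "ytdl-test-set": test_set,
         "run-tests-ext": "sh"}
        for version, test_set in product(PYTHON_VERSIONS, TEST_SETS)
    ]
    for test_set in TEST_SETS:
        # Extra jobs listed under `include`: Python 3.2 on Windows, and Jython.
        jobs.append({"os": "windows-latest", "python-version": "3.2",
                     "python-impl": "cpython", "ytdl-test-set": test_set,
                     "run-tests-ext": "bat"})
        jobs.append({"os": "ubuntu-18.04", "python-impl": "jython",
                     "ytdl-test-set": test_set, "run-tests-ext": "sh"})
    return jobs


def may_fail(job):
    """Mirror the continue-on-error expression of the 'Run tests' step."""
    return job["ytdl-test-set"] == "download" or job["python-impl"] == "jython"


jobs = expand_matrix()
blocking = [job for job in jobs if not may_fail(job)]
print(len(jobs), len(blocking))  # 26 12
```

Under these assumptions there are 26 test jobs, of which only the 12 CPython `core` jobs run the test step without `continue-on-error`.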


@@ -1,50 +0,0 @@
language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"
  - "3.3"
  - "3.4"
  - "3.5"
  - "3.6"
  - "pypy"
  - "pypy3"
dist: trusty
env:
  - YTDL_TEST_SET=core
  - YTDL_TEST_SET=download
jobs:
  include:
    - python: 3.7
      dist: xenial
      env: YTDL_TEST_SET=core
    - python: 3.7
      dist: xenial
      env: YTDL_TEST_SET=download
    - python: 3.8
      dist: xenial
      env: YTDL_TEST_SET=core
    - python: 3.8
      dist: xenial
      env: YTDL_TEST_SET=download
    - python: 3.8-dev
      dist: xenial
      env: YTDL_TEST_SET=core
    - python: 3.8-dev
      dist: xenial
      env: YTDL_TEST_SET=download
    - env: JYTHON=true; YTDL_TEST_SET=core
    - env: JYTHON=true; YTDL_TEST_SET=download
    - name: flake8
      python: 3.8
      dist: xenial
      install: pip install flake8
      script: flake8 .
  fast_finish: true
  allow_failures:
    - env: YTDL_TEST_SET=download
    - env: JYTHON=true; YTDL_TEST_SET=core
    - env: JYTHON=true; YTDL_TEST_SET=download
before_install:
  - if [ "$JYTHON" == "true" ]; then ./devscripts/install_jython.sh; export PATH="$HOME/jython/bin:$PATH"; fi
script: ./devscripts/run_tests.sh


@@ -246,3 +246,4 @@ Enes Solak
 Nathan Rossi
 Thomas van der Berg
 Luca Cherubin
+Adrian Heine

682
ChangeLog

@@ -1,3 +1,651 @@
version 2021.02.04
Extractors
* [pornhub] Implement lazy playlist extraction
* [svtplay] Fix video id extraction (#28058)
+ [pornhub] Add support for authentication (#18797, #21416, #24294)
* [pornhub:user] Improve paging
+ [pornhub:user] Add support for URLs unavailable via /videos page (#27853)
+ [bravotv] Add support for oxygen.com (#13357, #22500)
+ [youtube] Pass embed URL to get_video_info request
* [ccma] Improve metadata extraction (#27994)
    + Extract age limit, alt title, categories, series and episode number
    * Fix timestamp multiple subtitles extraction
* [egghead] Update API domain (#28038)
- [vidzi] Remove extractor (#12629)
* [vidio] Improve metadata extraction
* [youtube] Improve subtitles extraction
* [youtube] Fix chapter extraction fallback
* [youtube] Rewrite extractor
    * Improve format sorting
    * Remove unused code
    * Fix series metadata extraction
    * Fix trailer video extraction
    * Improve error reporting
    + Extract video location
+ [vvvvid] Add support for youtube embeds (#27825)
* [googledrive] Report download page errors (#28005)
* [vlive] Fix error message decoding for python 2 (#28004)
* [youtube] Improve DASH formats file size extraction
* [cda] Improve birth validation detection (#14022, #27929)
+ [awaan] Extract uploader id (#27963)
+ [medialaan] Add support for DPG Media MyChannels based websites (#14871,
  #15597, #16106, #16489)
* [abcnews] Fix extraction (#12394, #27920)
* [AMP] Fix upload date and timestamp extraction (#27970)
* [tv4] Relax URL regular expression (#27964)
+ [tv2] Add support for mtvuutiset.fi (#27744)
* [adn] Improve login warning reporting
* [zype] Fix uplynk id extraction (#27956)
+ [adn] Add support for authentication (#17091, #27841, #27937)
version 2021.01.24.1
Core
* Introduce --output-na-placeholder (#27896)
Extractors
* [franceculture] Make thumbnail optional (#18807)
* [franceculture] Fix extraction (#27891, #27903)
* [njpwworld] Fix extraction (#27890)
* [comedycentral] Fix extraction (#27905)
* [wat] Fix format extraction (#27901)
+ [americastestkitchen:season] Add support for seasons (#27861)
+ [trovo] Add support for trovo.live (#26125)
+ [aol] Add support for yahoo videos (#26650)
* [yahoo] Fix single video extraction
* [lbry] Unescape lbry URI (#27872)
* [9gag] Fix and improve extraction (#23022)
* [americastestkitchen] Improve metadata extraction for ATK episodes (#27860)
* [aljazeera] Fix extraction (#20911, #27779)
+ [minds] Add support for minds.com (#17934)
* [ard] Fix title and description extraction (#27761)
+ [spotify] Add support for Spotify Podcasts (#27443)
version 2021.01.16
Core
* [YoutubeDL] Protect from infinite recursion due to recursively nested
  playlists (#27833)
* [YoutubeDL] Ignore failure to create existing directory (#27811)
* [YoutubeDL] Raise syntax error for format selection expressions with multiple + operators (#27803)
Extractors
+ [animeondemand] Add support for lazy playlist extraction (#27829)
* [youporn] Restrict fallback download URL (#27822)
* [youporn] Improve height and tbr extraction (#20425, #23659)
* [youporn] Fix extraction (#27822)
+ [twitter] Add support for unified cards (#27826)
+ [twitch] Add Authorization header with OAuth token for GraphQL requests
  (#27790)
* [mixcloud:playlist:base] Extract video id in flat playlist mode (#27787)
* [cspan] Improve info extraction (#27791)
* [adn] Improve info extraction
* [adn] Fix extraction (#26963, #27732)
* [youtube:search] Extract from all sections (#27604)
* [youtube:search] Fix view count and try to extract all video sections (#27604)
* [twitch] Improve login error extraction
* [twitch] Fix authentication (#27743)
* [3qsdn] Improve extraction (#21058)
* [peertube] Extract formats from streamingPlaylists (#26002, #27586, #27728)
* [khanacademy] Fix extraction (#2887, #26803)
* [spike] Update Paramount Network feed URL (#27715)
version 2021.01.08
Core
* [downloader/hls] Disable decryption in tests (#27660)
+ [utils] Add a function to clean podcast URLs
Extractors
* [rai] Improve subtitles extraction (#27698, #27705)
* [canvas] Match only supported VRT NU URLs (#27707)
+ [bibeltv] Add support for bibeltv.de (#14361)
+ [bfmtv] Add support for bfmtv.com (#16053, #26615)
+ [sbs] Add support for ondemand play and news embed URLs (#17650, #27629)
* [twitch] Drop legacy kraken API v5 code altogether and refactor
* [twitch:vod] Switch to GraphQL for video metadata
* [canvas] Fix VRT NU extraction (#26957, #27053)
* [twitch] Switch access token to GraphQL and refactor (#27646)
+ [rai] Detect ContentItem in iframe (#12652, #27673)
* [ketnet] Fix extraction (#27662)
+ [dplay] Add support for Discovery+ domains (#27680)
* [motherless] Improve extraction (#26495, #27450)
* [motherless] Fix recent videos upload date extraction (#27661)
* [nrk] Fix extraction for videos without a legalAge rating
- [googleplus] Remove extractor (#4955, #7400)
+ [applepodcasts] Add support for podcasts.apple.com (#25918)
+ [googlepodcasts] Add support for podcasts.google.com
+ [iheart] Add support for iheart.com (#27037)
* [acast] Clean podcast URLs
* [stitcher] Clean podcast URLs
+ [xfileshare] Add support for aparat.cam (#27651)
+ [twitter] Add support for summary card (#25121)
* [twitter] Try to use a Generic fallback for unknown twitter cards (#25982)
+ [stitcher] Add support for shows and show metadata extraction (#20510)
* [stv] Improve episode id extraction (#23083)
version 2021.01.03
Extractors
* [nrk] Improve series metadata extraction (#27473)
+ [nrk] Extract subtitles
* [nrk] Fix age limit extraction
* [nrk] Improve video id extraction
+ [nrk] Add support for podcasts (#27634, #27635)
* [nrk] Generalize and delegate all item extractors to nrk
+ [nrk] Add support for mp3 formats
* [nrktv] Switch to playback endpoint
* [vvvvid] Fix season metadata extraction (#18130)
* [stitcher] Fix extraction (#20811, #27606)
* [acast] Fix extraction (#21444, #27612, #27613)
+ [arcpublishing] Add support for arcpublishing.com (#2298, #9340, #17200)
+ [sky] Add support for Sports News articles and Brightcove videos (#13054)
+ [vvvvid] Extract akamai formats
* [vvvvid] Skip unplayable episodes (#27599)
* [yandexvideo] Fix extraction for Python 3.4
version 2020.12.31
Core
* [utils] Accept only supported protocols in url_or_none
* [YoutubeDL] Allow format filtering using audio language (#16209)
Extractors
+ [redditr] Extract all thumbnails (#27503)
* [vvvvid] Improve info extraction
+ [vvvvid] Add support for playlists (#18130, #27574)
+ [yandexdisk] Extract info from webpage
* [yandexdisk] Fix extraction (#17861, #27131)
* [yandexvideo] Use old API call as fallback
* [yandexvideo] Fix extraction (#25000)
- [nbc] Remove CSNNE extractor
* [nbc] Fix NBCSport VPlayer URL extraction (#16640)
+ [aenetworks] Add support for biography.com (#3863)
* [uktvplay] Match new video URLs (#17909)
* [sevenplay] Detect API errors
* [tenplay] Fix format extraction (#26653)
* [brightcove] Raise error for DRM protected videos (#23467, #27568)
version 2020.12.29
Extractors
* [youtube] Improve yt initial data extraction (#27524)
* [youtube:tab] Improve URL matching (#27559)
* [youtube:tab] Restore retry on browse requests (#27313, #27564)
* [aparat] Fix extraction (#22285, #22611, #23348, #24354, #24591, #24904,
  #25418, #26070, #26350, #26738, #27563)
- [brightcove] Remove sonyliv specific code
* [piksel] Improve format extraction
+ [zype] Add support for uplynk videos
+ [toggle] Add support for live.mewatch.sg (#27555)
+ [go] Add support for fxnow.fxnetworks.com (#13972, #22467, #23754, #26826)
* [teachable] Improve embed detection (#26923)
* [mitele] Fix free video extraction (#24624, #25827, #26757)
* [telecinco] Fix extraction
* [youtube] Update invidious.snopyta.org (#22667)
* [amcnetworks] Improve auth only video detection (#27548)
+ [generic] Add support for VHX Embeds (#27546)
version 2020.12.26
Extractors
* [instagram] Fix comment count extraction
+ [instagram] Add support for reel URLs (#26234, #26250)
* [bbc] Switch to media selector v6 (#23232, #23933, #26303, #26432, #26821,
  #27538)
* [instagram] Improve thumbnail extraction
* [instagram] Fix extraction when authenticated (#22880, #26377, #26981,
  #27422)
* [spankbang:playlist] Fix extraction (#24087)
+ [spankbang] Add support for playlist videos
* [pornhub] Improve like and dislike count extraction (#27356)
* [pornhub] Fix lq formats extraction (#27386, #27393)
+ [bongacams] Add support for bongacams.com (#27440)
* [youtube:tab] Extend URL regular expression (#27501)
* [theweatherchannel] Fix extraction (#25930, #26051)
+ [sprout] Add support for Universal Kids (#22518)
* [theplatform] Allow passing geo bypass countries from other extractors
+ [wistia] Add support for playlists (#27533)
+ [ctv] Add support for ctv.ca (#27525)
* [9c9media] Improve info extraction
* [youtube] Fix automatic captions extraction (#27162, #27388)
* [sonyliv] Fix title for movies
* [sonyliv] Fix extraction (#25667)
* [streetvoice] Fix extraction (#27455, #27492)
+ [facebook] Add support for watchparty pages (#27507)
* [cbslocal] Fix video extraction
+ [brightcove] Add another method to extract policyKey
* [mewatch] Relax URL regular expression (#27506)
version 2020.12.22
Core
* [common] Remove unwanted query params from unsigned akamai manifest URLs
Extractors
- [tastytrade] Remove extractor (#25716)
* [niconico] Fix playlist extraction (#27428)
- [everyonesmixtape] Remove extractor
- [kanalplay] Remove extractor
* [arkena] Fix extraction
* [nba] Rewrite extractor
* [turner] Improve info extraction
* [youtube] Improve xsrf token extraction (#27442)
* [generic] Improve RSS age limit extraction
* [generic] Fix RSS itunes thumbnail extraction (#27405)
+ [redditr] Extract duration (#27426)
- [zaq1] Remove extractor
+ [asiancrush] Add support for retrocrush.tv
* [asiancrush] Fix extraction
- [noco] Remove extractor (#10864)
* [nfl] Fix extraction (#22245)
* [skysports] Relax URL regular expression (#27435)
+ [tv5unis] Add support for tv5unis.ca (#22399, #24890)
+ [videomore] Add support for more.tv (#27088)
+ [yandexmusic] Add support for music.yandex.com (#27425)
+ [nhk:program] Add support for audio programs and program clips
+ [nhk] Add support for NHK video programs (#27230)
version 2020.12.14
Core
* [extractor/common] Improve JSON-LD interaction statistic extraction (#23306)
* [downloader/hls] Delegate manifests with media initialization to ffmpeg
+ [extractor/common] Document duration meta field for playlists
Extractors
* [mdr] Bypass geo restriction
* [mdr] Improve extraction (#24346, #26873)
* [yandexmusic:album] Improve album title extraction (#27418)
* [eporner] Fix view count extraction and make optional (#23306)
+ [eporner] Extend URL regular expression
* [eporner] Fix hash extraction and extend _VALID_URL (#27396)
* [slideslive] Use m3u8 entry protocol for m3u8 formats (#27400)
* [twitcasting] Fix format extraction and improve info extraction (#24868)
* [linuxacademy] Fix authentication and extraction (#21129, #26223, #27402)
* [itv] Clean description from HTML tags (#27399)
* [vlive] Sort live formats (#27404)
* [hotstar] Fix and improve extraction
    * Fix format extraction (#26690)
    + Extract thumbnail URL (#16079, #20412)
    + Add support for country specific playlist URLs (#23496)
    * Select the last id in video URL (#26412)
+ [youtube] Add some invidious instances (#27373)
version 2020.12.12
Core
* [YoutubeDL] Improve thumbnail filename deducing (#26010, #27244)
Extractors
+ [ruutu] Extract more metadata
+ [ruutu] Detect non-free videos (#21154)
* [ruutu] Authenticate format URLs (#21031, #26782)
+ [ruutu] Add support for static.nelonenmedia.fi (#25412)
+ [ruutu] Extend URL regular expression (#24839)
+ [facebook] Add support for archived live video URLs (#15859)
* [wdr] Improve overall extraction
+ [wdr] Extend subtitles extraction (#22672, #22723)
+ [facebook] Add support for videos attached to Relay based story pages
  (#10795)
+ [wdr:page] Add support for kinder.wdr.de (#27350)
+ [facebook] Add another regular expression for handleServerJS
* [facebook] Fix embed page extraction
+ [facebook] Add support for Relay post pages (#26935)
+ [facebook] Add support for watch videos (#22795, #27062)
+ [facebook] Add support for group posts with multiple videos (#19131)
* [itv] Fix series metadata extraction (#26897)
- [itv] Remove old extraction method (#23177)
* [facebook] Redirect mobile URLs to desktop URLs (#24831, #25624)
+ [facebook] Add support for Relay based pages (#26823)
* [facebook] Try to reduce unnecessary tahoe requests
- [facebook] Remove hardcoded Chrome User-Agent (#18974, #25411, #26958,
  #27329)
- [smotri] Remove extractor (#27358)
- [beampro] Remove extractor (#17290, #22871, #23020, #23061, #26099)
version 2020.12.09
Core
* [extractor/common] Fix inline HTML5 media tags processing (#27345)
Extractors
* [youtube:tab] Improve identity token extraction (#27197)
* [youtube:tab] Make click tracking params on continuation optional
* [youtube:tab] Delegate inline playlists to tab-based playlists (#27298)
+ [tubitv] Extract release year (#27317)
* [amcnetworks] Fix free content extraction (#20354)
+ [lbry:channel] Add support for channels (#25584)
+ [lbry] Add support for short and embed URLs
* [lbry] Fix channel metadata extraction
+ [telequebec] Add support for video.telequebec.tv (#27339)
* [telequebec] Fix extraction (#25733, #26883)
+ [youtube:tab] Capture and output alerts (#27340)
* [tvplay:home] Fix extraction (#21153)
* [americastestkitchen] Fix extraction and add support
  for Cook's Country and Cook's Illustrated (#17234, #27322)
+ [slideslive] Add support for yoda service videos and extract subtitles
  (#27323)
version 2020.12.07
Core
* [extractor/common] Extract timestamp from Last-Modified header
+ [extractor/common] Add support for dl8-* media tags (#27283)
* [extractor/common] Fix media type extraction for HTML5 media tags
  in start/end form
Extractors
* [aenetworks] Fix extraction (#23363, #23390, #26795, #26985)
    * Fix Fastly format extraction
    + Add support for play and watch subdomains
    + Extract series metadata
* [youtube] Improve youtu.be extraction in non-existing playlists (#27324)
+ [generic] Extract RSS video description, timestamp and itunes metadata
(#27177)
* [nrk] Reduce the number of instalments and episodes requests
* [nrk] Improve extraction
* Improve format extraction for old akamai formats
+ Add is_live value to entry info dict
* Request instalments only when available
* Fix skole extraction
+ [peertube] Extract fps
+ [peertube] Recognize audio-only formats (#27295)
version 2020.12.05
Core
* [extractor/common] Improve Akamai HTTP format extraction
* Allow m3u8 manifest without an additional audio format
* Fix extraction for qualities starting with a number
Extractors
* [teachable:course] Improve extraction (#24507, #27286)
* [nrk] Improve error extraction
* [nrktv:series] Improve extraction (#21926)
* [nrktv:season] Improve extraction
* [nrk] Improve format extraction and geo-restriction detection (#24221)
* [pornhub] Handle HTTP errors gracefully (#26414)
* [nrktv] Relax URL regular expression (#27299, #26185)
+ [zdf] Extract webm formats (#26659)
+ [gamespot] Extract DASH and HTTP formats
+ [tver] Add support for tver.jp (#26662, #27284)
+ [pornhub] Add support for pornhub.org (#27276)
version 2020.12.02
Extractors
+ [tva] Add support for qub.ca (#27235)
+ [toggle] Detect DRM protected videos (#16479, #20805)
+ [toggle] Add support for new MeWatch URLs (#27256)
* [youtube:tab] Extract channels only from channels tab (#27266)
+ [cspan] Extract info from jwplayer data (#3672, #3734, #10638, #13030,
#18806, #23148, #24461, #26171, #26800, #27263)
* [cspan] Pass Referer header with format's video URL (#26032, #25729)
* [youtube] Improve age-gated videos extraction (#27259)
+ [mediaset] Add support for movie URLs (#27240)
* [yandexmusic] Refactor
+ [yandexmusic] Add support for artist's tracks and albums (#11887, #22284)
* [yandexmusic:track] Fix extraction (#26449, #26669, #26747, #26748, #26762)
version 2020.11.29
Core
* [YoutubeDL] Write static debug to stderr and respect quiet for dynamic debug
(#14579, #22593)
Extractors
* [drtv] Extend URL regular expression (#27243)
* [tiktok] Fix extraction (#20809, #22838, #22850, #25987, #26281, #26411,
#26639, #26776, #27237)
+ [ina] Add support for mobile URLs (#27229)
* [pornhub] Fix like and dislike count extraction (#27227, #27234)
* [youtube] Improve yt initial player response extraction (#27216)
* [videa] Fix extraction (#25650, #25973, #26301)
version 2020.11.26
Core
* [downloader/fragment] Set final file's mtime according to last fragment's
Last-Modified header (#11718, #18384, #27138)
Extractors
+ [spreaker] Add support for spreaker.com (#13480, #13877)
* [vlive] Improve extraction for geo-restricted videos
+ [vlive] Add support for post URLs (#27122, #27123)
* [viki] Fix video API request (#27184)
* [bbc] Fix BBC Three clip extraction
* [bbc] Fix BBC News videos extraction
+ [medaltv] Add support for medal.tv (#27149)
* [youtube] Improve music metadata and license extraction (#26013)
* [nrk] Fix extraction
* [cda] Fix extraction (#17803, #24458, #24518, #26381)
version 2020.11.24
Core
+ [extractor/common] Add generic support for akamai HTTP format extraction
Extractors
* [youtube:tab] Fix feeds extraction (#25695, #26452)
* [youtube:favorites] Restore extractor
* [youtube:tab] Fix some weird typo (#27157)
+ [pinterest] Add support for large collections (more than 25 pins)
+ [franceinter] Extract thumbnail (#27153)
+ [box] Add support for box.com (#5949)
+ [nytimes] Add support for cooking.nytimes.com (#27112, #27143)
* [lbry] Relax URL regular expression (#27144)
+ [rumble] Add support for embed pages (#10785)
+ [skyit] Add support for multiple Sky Italia websites (#26629)
+ [pinterest] Add support for pinterest.com (#25747)
version 2020.11.21.1
Core
* [downloader/http] Fix crash during urlopen caused by missing reason
of URLError
* [YoutubeDL] Fix --ignore-errors for playlists with generator-based entries
of url_transparent (#27064)
Extractors
+ [svtplay] Add support for svt.se/barnkanalen (#24817)
+ [svt] Extract timestamp (#27130)
* [svtplay] Improve thumbnail extraction (#27130)
* [youtube] Fix error reason extraction (#27081)
* [youtube] Fix like and dislike count extraction (#25977)
+ [youtube:tab] Add support for current video and fix lives extraction (#27126)
* [infoq] Fix format extraction (#25984)
* [francetv] Update to fix thumbnail URL issue (#27120)
* [youtube] Improve yt initial data extraction (#27093)
+ [discoverynetworks] Add support for new TLC/DMAX URLs (#27100)
* [rai] Fix protocol relative relinker URLs (#22766)
* [rai] Fix unavailable video format detection
* [rai] Improve extraction
* [rai] Fix extraction (#27077)
* [viki] Improve format extraction
* [viki] Fix stream extraction from MPD (#27092)
* [googledrive] Fix format extraction (#26979)
+ [amara] Add support for amara.org (#20618)
* [vimeo:album] Fix extraction (#27079)
* [mtv] Fix mgid extraction (#26841)
version 2020.11.19
Core
* [extractor/common] Output error for invalid URLs in _is_valid_url (#21400,
#24151, #25617, #25618, #25586, #26068, #27072)
Extractors
* [youporn] Fix upload date extraction
* [youporn] Make comment count optional (#26986)
* [arte] Rework extractors
* Reimplement embed and playlist extractors to delegate to the single
entrypoint artetv extractor
* Improve embeds detection (#27057)
+ [arte] Extract m3u8 formats (#27061)
* [mgtv] Fix format extraction (#26415)
+ [lbry] Add support for odysee.com (#26806)
* [francetv] Improve info extraction
+ [francetv] Add fallback video URL extraction (#27047)
version 2020.11.18
Extractors
* [spiegel] Fix extraction (#24206, #24767)
* [youtube] Improve extraction
+ Add support for --no-playlist (#27009)
* Improve playlist and mix extraction (#26390, #26509, #26534, #27011)
+ Extract playlist uploader data
* [youtube:tab] Fix view count extraction (#27051)
* [malltv] Fix extraction (#27035)
+ [bandcamp] Extract playlist description (#22684)
* [urplay] Fix extraction (#26828)
* [youtube:tab] Fix playlist title extraction (#27015)
* [youtube] Fix chapters extraction (#26005)
version 2020.11.17
Core
* [utils] Skip ! prefixed code in js_to_json
Extractors
* [youtube:tab] Fix extraction with cookies provided (#27005)
* [lrt] Fix extraction with empty tags (#20264)
+ [ndr:embed:base] Extract subtitles (#25447, #26106)
+ [servus] Add support for pm-wissen.com (#25869)
* [servus] Fix extraction (#26872, #26967, #26983, #27000)
* [xtube] Fix extraction (#26996)
* [lrt] Fix extraction
+ [lbry] Add support for lbry.tv
+ [condenast] Extract subtitles
* [condenast] Fix extraction
* [bandcamp] Fix extraction (#26681, #26684)
* [rai] Fix RaiPlay extraction (#26064, #26096)
* [vlive] Fix extraction
* [usanetwork] Fix extraction
* [nbc] Fix NBCNews/Today/MSNBC extraction
* [cnbc] Fix extraction
version 2020.11.12
Extractors
* [youtube] Rework extractors
version 2020.11.01
Core
* [utils] Don't attempt to coerce JS strings to numbers in js_to_json (#26851)
* [downloader/http] Properly handle missing message in SSLError (#26646)
* [downloader/http] Fix access to not yet opened stream in retry
Extractors
* [youtube] Fix JS player URL extraction
* [ytsearch] Fix extraction (#26920)
* [afreecatv] Fix typo (#26970)
* [23video] Relax URL regular expression (#26870)
+ [ustream] Add support for video.ibm.com (#26894)
* [iqiyi] Fix typo (#26884)
+ [expressen] Add support for di.se (#26670)
* [iprima] Improve video id extraction (#26507, #26494)
version 2020.09.20
Core
* [extractor/common] Relax interaction count extraction in _json_ld
+ [extractor/common] Extract author as uploader for VideoObject in _json_ld
* [downloader/hls] Fix incorrect end byte in Range HTTP header for
media segments with EXT-X-BYTERANGE (#14748, #24512)
* [extractor/common] Handle ssl.CertificateError in _request_webpage (#26601)
* [downloader/http] Improve timeout detection when reading block of data
(#10935)
* [downloader/http] Retry download when urlopen times out (#10935, #26603)
Extractors
* [redtube] Extend URL regular expression (#26506)
* [twitch] Refactor
* [twitch:stream] Switch to GraphQL and fix reruns (#26535)
+ [telequebec] Add support for brightcove videos (#25833)
* [pornhub] Extract metadata from JSON-LD (#26614)
* [pornhub] Fix view count extraction (#26621, #26614)
version 2020.09.14
Core
+ [postprocessor/embedthumbnail] Add support for non jpg/png thumbnails
(#25687, #25717)
Extractors
* [rtlnl] Extend URL regular expression (#26549, #25821)
* [youtube] Fix empty description extraction (#26575, #26006)
* [srgssr] Extend URL regular expression (#26555, #26556, #26578)
* [googledrive] Use redirect URLs for source format (#18877, #23919, #24689,
#26565)
* [svtplay] Fix id extraction (#26576)
* [redbulltv] Improve support for redbull.com TV localized URLs (#22063)
+ [redbulltv] Add support for new redbull.com TV URLs (#22037, #22063)
* [soundcloud:pagedplaylist] Reduce pagination limit (#26557)
version 2020.09.06
Core
+ [utils] Recognize wav mimetype (#26463)
Extractors
* [nrktv:episode] Improve video id extraction (#25594, #26369, #26409)
* [youtube] Fix age gate content detection (#26100, #26152, #26311, #26384)
* [youtube:user] Extend URL regular expression (#26443)
* [xhamster] Improve initials regular expression (#26526, #26353)
* [svtplay] Fix video id extraction (#26425, #26428, #26438)
* [twitch] Rework extractors (#12297, #20414, #20604, #21811, #21812, #22979,
#24263, #25010, #25553, #25606)
* Switch to GraphQL
+ Add support for collections
+ Add support for clips and collections playlists
* [biqle] Improve video ext extraction
* [xhamster] Fix extraction (#26157, #26254)
* [xhamster] Extend URL regular expression (#25789, #25804, #25927)
version 2020.07.28
Extractors
* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
* [youtube] Improve description extraction (#25937, #25980)
* [wistia] Restrict embed regular expression (#25969)
* [youtube] Prevent excess HTTP 301 (#25786)
+ [youtube:playlists] Extend URL regular expression (#25810)
+ [bellmedia] Add support for cp24.com clip URLs (#25764)
* [brightcove] Improve embed detection (#25674)
version 2020.06.16.1 version 2020.06.16.1
Extractors Extractors
@@ -211,7 +859,7 @@ Extractors
+ Add support for more domains + Add support for more domains
* [svt] Fix series extraction (#22297) * [svt] Fix series extraction (#22297)
* [svt] Fix article extraction (#22897, #22919) * [svt] Fix article extraction (#22897, #22919)
* [soundcloud] Imporve private playlist/set tracks extraction (#3707) * [soundcloud] Improve private playlist/set tracks extraction (#3707)
version 2020.01.24 version 2020.01.24
@@ -337,7 +985,7 @@ Extractors
* [abcotvs] Relax URL regular expression and improve metadata extraction * [abcotvs] Relax URL regular expression and improve metadata extraction
(#18014) (#18014)
* [channel9] Reduce response size * [channel9] Reduce response size
* [adobetv] Improve extaction * [adobetv] Improve extraction
* Use OnDemandPagedList for list extractors * Use OnDemandPagedList for list extractors
* Reduce show extraction requests * Reduce show extraction requests
* Extract original video format and subtitles * Extract original video format and subtitles
@@ -362,7 +1010,7 @@ Extractors
* [dailymotion] Improve extraction * [dailymotion] Improve extraction
* Extract http formats included in m3u8 manifest * Extract http formats included in m3u8 manifest
* Fix user extraction (#3553, #21415) * Fix user extraction (#3553, #21415)
+ Add suport for User Authentication (#11491) + Add support for User Authentication (#11491)
* Fix password protected videos extraction (#23176) * Fix password protected videos extraction (#23176)
* Respect age limit option and family filter cookie value (#18437) * Respect age limit option and family filter cookie value (#18437)
* Handle video url playlist query param * Handle video url playlist query param
@@ -447,7 +1095,7 @@ Extractors
- [go90] Remove extractor - [go90] Remove extractor
* [kakao] Remove raw request * [kakao] Remove raw request
+ [kakao] Extract format total bitrate + [kakao] Extract format total bitrate
* [daum] Fix VOD and Clip extracton (#15015) * [daum] Fix VOD and Clip extraction (#15015)
* [kakao] Improve extraction * [kakao] Improve extraction
+ Add support for embed URLs + Add support for embed URLs
+ Add support for Kakao Legacy vid based embed URLs + Add support for Kakao Legacy vid based embed URLs
@@ -491,7 +1139,7 @@ Extractors
* Improve format extraction (#22123) * Improve format extraction (#22123)
+ Extract uploader_id and uploader_url (#21916) + Extract uploader_id and uploader_url (#21916)
+ Extract all known thumbnails (#19071, #20659) + Extract all known thumbnails (#19071, #20659)
* Fix extration for private playlists (#20976) * Fix extraction for private playlists (#20976)
+ Add support for playlist embeds (#20976) + Add support for playlist embeds (#20976)
* Skip preview formats (#22806) * Skip preview formats (#22806)
* [dplay] Improve extraction * [dplay] Improve extraction
@@ -966,7 +1614,7 @@ Extractors
* [hbo] Fix extraction and extract subtitles (#14629, #13709) * [hbo] Fix extraction and extract subtitles (#14629, #13709)
* [youtube] Extract srv[1-3] subtitle formats (#20566) * [youtube] Extract srv[1-3] subtitle formats (#20566)
* [adultswim] Fix extraction (#18025) * [adultswim] Fix extraction (#18025)
* [teamcoco] Fix extraction and add suport for subdomains (#17099, #20339) * [teamcoco] Fix extraction and add support for subdomains (#17099, #20339)
* [adn] Fix subtitle compatibility with ffmpeg * [adn] Fix subtitle compatibility with ffmpeg
* [adn] Fix extraction and add support for positioning styles (#20549) * [adn] Fix extraction and add support for positioning styles (#20549)
* [vk] Use unique video id (#17848) * [vk] Use unique video id (#17848)
@@ -1378,7 +2026,7 @@ version 2018.11.18
Extractors Extractors
+ [wwe] Extract subtitles + [wwe] Extract subtitles
+ [wwe] Add support for playlistst (#14781) + [wwe] Add support for playlists (#14781)
+ [wwe] Add support for wwe.com (#14781, #17450) + [wwe] Add support for wwe.com (#14781, #17450)
* [vk] Detect geo restriction (#17767) * [vk] Detect geo restriction (#17767)
* [openload] Use original host during extraction (#18211) * [openload] Use original host during extraction (#18211)
@@ -2411,7 +3059,7 @@ Extractors
* [youku] Update ccode (#14872) * [youku] Update ccode (#14872)
* [mnet] Fix format extraction (#14883) * [mnet] Fix format extraction (#14883)
+ [xiami] Add Referer header to API request + [xiami] Add Referer header to API request
* [mtv] Correct scc extention in extracted subtitles (#13730) * [mtv] Correct scc extension in extracted subtitles (#13730)
* [vvvvid] Fix extraction for kenc videos (#13406) * [vvvvid] Fix extraction for kenc videos (#13406)
+ [br] Add support for BR Mediathek videos (#14560, #14788) + [br] Add support for BR Mediathek videos (#14560, #14788)
+ [daisuki] Add support for motto.daisuki.com (#14681) + [daisuki] Add support for motto.daisuki.com (#14681)
@@ -2432,7 +3080,7 @@ Extractors
* [nexx] Extract more formats * [nexx] Extract more formats
+ [openload] Add support for openload.link (#14763) + [openload] Add support for openload.link (#14763)
* [empflix] Relax URL regular expression * [empflix] Relax URL regular expression
* [empflix] Fix extractrion * [empflix] Fix extraction
* [tnaflix] Don't modify download URLs (#14811) * [tnaflix] Don't modify download URLs (#14811)
- [gamersyde] Remove extractor - [gamersyde] Remove extractor
* [francetv:generationwhat] Fix extraction * [francetv:generationwhat] Fix extraction
@@ -2627,7 +3275,7 @@ Extractors
* [yahoo] Bypass geo restriction for brightcove (#14210) * [yahoo] Bypass geo restriction for brightcove (#14210)
* [yahoo] Use extracted brightcove account id (#14210) * [yahoo] Use extracted brightcove account id (#14210)
* [rtve:alacarta] Fix extraction (#14290) * [rtve:alacarta] Fix extraction (#14290)
+ [yahoo] Add support for custom brigthcove embeds (#14210) + [yahoo] Add support for custom brightcove embeds (#14210)
+ [generic] Add support for Video.js embeds + [generic] Add support for Video.js embeds
+ [gfycat] Add support for /gifs/detail URLs (#14322) + [gfycat] Add support for /gifs/detail URLs (#14322)
* [generic] Fix infinite recursion for twitter:player URLs (#14339) * [generic] Fix infinite recursion for twitter:player URLs (#14339)
@@ -2872,7 +3520,7 @@ Extractors
* [amcnetworks] Make rating optional (#12453) * [amcnetworks] Make rating optional (#12453)
* [cloudy] Fix extraction (#13737) * [cloudy] Fix extraction (#13737)
+ [nickru] Add support for nickelodeon.ru + [nickru] Add support for nickelodeon.ru
* [mtv] Improve thumbnal extraction * [mtv] Improve thumbnail extraction
* [nick] Automate geo-restriction bypass (#13711) * [nick] Automate geo-restriction bypass (#13711)
* [niconico] Improve error reporting (#13696) * [niconico] Improve error reporting (#13696)
@@ -3236,7 +3884,7 @@ Extractors
+ [cda] Support birthday verification (#12789) + [cda] Support birthday verification (#12789)
* [leeco] Fix extraction (#12974) * [leeco] Fix extraction (#12974)
+ [pbs] Extract chapters + [pbs] Extract chapters
* [amp] Imporove thumbnail and subtitles extraction * [amp] Improve thumbnail and subtitles extraction
* [foxsports] Fix extraction (#12945) * [foxsports] Fix extraction (#12945)
- [coub] Remove comment count extraction (#12941) - [coub] Remove comment count extraction (#12941)
@@ -3406,7 +4054,7 @@ Extractors
+ [rbmaradio] Add support for redbullradio.com URLs (#12687) + [rbmaradio] Add support for redbullradio.com URLs (#12687)
+ [npo:live] Add support for default URL (#12555) + [npo:live] Add support for default URL (#12555)
* [mixcloud:playlist] Fix title, description and view count extraction (#12582) * [mixcloud:playlist] Fix title, description and view count extraction (#12582)
+ [thesun] Add suport for thesun.co.uk (#11298, #12674) + [thesun] Add support for thesun.co.uk (#11298, #12674)
+ [ceskateleveize:porady] Add support for porady (#7411, #12645) + [ceskateleveize:porady] Add support for porady (#7411, #12645)
* [ceskateleveize] Improve extraction and remove URL replacement hacks * [ceskateleveize] Improve extraction and remove URL replacement hacks
+ [kaltura] Add support for iframe embeds (#12679) + [kaltura] Add support for iframe embeds (#12679)
@@ -3445,7 +4093,7 @@ Extractors
* [funimation] Fix extraction (#10696, #11773) * [funimation] Fix extraction (#10696, #11773)
+ [xfileshare] Add support for vidabc.com (#12589) + [xfileshare] Add support for vidabc.com (#12589)
+ [xfileshare] Improve extraction and extract hls formats + [xfileshare] Improve extraction and extract hls formats
+ [crunchyroll] Pass geo verifcation proxy + [crunchyroll] Pass geo verification proxy
+ [cwtv] Extract ISM formats + [cwtv] Extract ISM formats
+ [tvplay] Bypass geo restriction + [tvplay] Bypass geo restriction
+ [vrv] Add support for vrv.co + [vrv] Add support for vrv.co
@@ -3509,7 +4157,7 @@ Extractors
+ [bostonglobe] Add extractor for bostonglobe.com (#12099) + [bostonglobe] Add extractor for bostonglobe.com (#12099)
+ [toongoggles] Add support for toongoggles.com (#12171) + [toongoggles] Add support for toongoggles.com (#12171)
+ [medialaan] Add support for Medialaan sites (#9974, #11912) + [medialaan] Add support for Medialaan sites (#9974, #11912)
+ [discoverynetworks] Add support for more domains and bypass geo restiction + [discoverynetworks] Add support for more domains and bypass geo restriction
* [openload] Fix extraction (#10408) * [openload] Fix extraction (#10408)
@@ -5099,7 +5747,7 @@ version 2016.07.09.1
Fixed/improved extractors Fixed/improved extractors
- youtube - youtube
- ard - ard
- srmediatek (#9373) - srmediathek (#9373)
version 2016.07.09 version 2016.07.09
@@ -5163,7 +5811,7 @@ Fixed/improved extractors
- kaltura (#5557) - kaltura (#5557)
- la7 - la7
- Changed features - Changed features
- Rename --cn-verfication-proxy to --geo-verification-proxy - Rename --cn-verification-proxy to --geo-verification-proxy
Miscellaneous Miscellaneous
- Add script for displaying downloads statistics - Add script for displaying downloads statistics

README.md

@@ -1,4 +1,5 @@
[![Build Status](https://travis-ci.org/ytdl-org/youtube-dl.svg?branch=master)](https://travis-ci.org/ytdl-org/youtube-dl) [![Build Status](https://github.com/ytdl-org/youtube-dl/workflows/CI/badge.svg)](https://github.com/ytdl-org/youtube-dl/actions?query=workflow%3ACI)
youtube-dl - download videos from youtube.com or other video platforms youtube-dl - download videos from youtube.com or other video platforms
@@ -51,394 +52,431 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
youtube-dl [OPTIONS] URL [URL...] youtube-dl [OPTIONS] URL [URL...]
# OPTIONS # OPTIONS
-h, --help Print this help text and exit -h, --help Print this help text and exit
--version Print program version and exit --version Print program version and exit
-U, --update Update this program to latest version. Make -U, --update Update this program to latest version.
sure that you have sufficient permissions Make sure that you have sufficient
(run with sudo if needed) permissions (run with sudo if needed)
-i, --ignore-errors Continue on download errors, for example to -i, --ignore-errors Continue on download errors, for
skip unavailable videos in a playlist example to skip unavailable videos in a
--abort-on-error Abort downloading of further videos (in the playlist
playlist or the command line) if an error --abort-on-error Abort downloading of further videos (in
occurs the playlist or the command line) if an
--dump-user-agent Display the current browser identification error occurs
--list-extractors List all supported extractors --dump-user-agent Display the current browser
--extractor-descriptions Output descriptions of all supported identification
extractors --list-extractors List all supported extractors
--force-generic-extractor Force extraction to use the generic --extractor-descriptions Output descriptions of all supported
extractor extractors
--default-search PREFIX Use this prefix for unqualified URLs. For --force-generic-extractor Force extraction to use the generic
example "gvsearch2:" downloads two videos extractor
from google videos for youtube-dl "large --default-search PREFIX Use this prefix for unqualified URLs.
apple". Use the value "auto" to let For example "gvsearch2:" downloads two
youtube-dl guess ("auto_warning" to emit a videos from google videos for youtube-
warning when guessing). "error" just throws dl "large apple". Use the value "auto"
an error. The default value "fixup_error" to let youtube-dl guess ("auto_warning"
repairs broken URLs, but emits an error if to emit a warning when guessing).
this is not possible instead of searching. "error" just throws an error. The
--ignore-config Do not read configuration files. When given default value "fixup_error" repairs
in the global configuration file broken URLs, but emits an error if this
/etc/youtube-dl.conf: Do not read the user is not possible instead of searching.
configuration in ~/.config/youtube- --ignore-config Do not read configuration files. When
dl/config (%APPDATA%/youtube-dl/config.txt given in the global configuration file
on Windows) /etc/youtube-dl.conf: Do not read the
--config-location PATH Location of the configuration file; either user configuration in
the path to the config or its containing ~/.config/youtube-dl/config
directory. (%APPDATA%/youtube-dl/config.txt on
--flat-playlist Do not extract the videos of a playlist, Windows)
only list them. --config-location PATH Location of the configuration file;
--mark-watched Mark videos watched (YouTube only) either the path to the config or its
--no-mark-watched Do not mark videos watched (YouTube only) containing directory.
--no-color Do not emit color codes in output --flat-playlist Do not extract the videos of a
playlist, only list them.
--mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (YouTube
only)
--no-color Do not emit color codes in output
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. --proxy URL Use the specified HTTP/HTTPS/SOCKS
To enable SOCKS proxy, specify a proper proxy. To enable SOCKS proxy, specify a
scheme. For example proper scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty socks5://127.0.0.1:1080/. Pass in an
string (--proxy "") for direct connection empty string (--proxy "") for direct
--socket-timeout SECONDS Time to wait before giving up, in seconds connection
--source-address IP Client-side IP address to bind to --socket-timeout SECONDS Time to wait before giving up, in
-4, --force-ipv4 Make all connections via IPv4 seconds
-6, --force-ipv6 Make all connections via IPv6 --source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6
## Geo Restriction: ## Geo Restriction:
--geo-verification-proxy URL Use this proxy to verify the IP address for --geo-verification-proxy URL Use this proxy to verify the IP address
some geo-restricted sites. The default for some geo-restricted sites. The
proxy specified by --proxy (or none, if the default proxy specified by --proxy (or
option is not present) is used for the none, if the option is not present) is
actual downloading. used for the actual downloading.
--geo-bypass Bypass geographic restriction via faking --geo-bypass Bypass geographic restriction via
X-Forwarded-For HTTP header faking X-Forwarded-For HTTP header
--no-geo-bypass Do not bypass geographic restriction via --no-geo-bypass Do not bypass geographic restriction
faking X-Forwarded-For HTTP header via faking X-Forwarded-For HTTP header
--geo-bypass-country CODE Force bypass geographic restriction with --geo-bypass-country CODE Force bypass geographic restriction
explicitly provided two-letter ISO 3166-2 with explicitly provided two-letter ISO
country code 3166-2 country code
--geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction with --geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction
explicitly provided IP block in CIDR with explicitly provided IP block in
notation CIDR notation
## Video Selection: ## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1) --playlist-start NUMBER Playlist video to start at (default is
--playlist-end NUMBER Playlist video to end at (default is last) 1)
--playlist-items ITEM_SPEC Playlist video items to download. Specify --playlist-end NUMBER Playlist video to end at (default is
indices of the videos in the playlist last)
separated by commas like: "--playlist-items --playlist-items ITEM_SPEC Playlist video items to download.
1,2,5,8" if you want to download videos Specify indices of the videos in the
indexed 1, 2, 5, 8 in the playlist. You can playlist separated by commas like: "--
specify range: "--playlist-items playlist-items 1,2,5,8" if you want to
1-3,7,10-13", it will download the videos download videos indexed 1, 2, 5, 8 in
at index 1, 2, 3, 7, 10, 11, 12 and 13. the playlist. You can specify range: "
--match-title REGEX Download only matching titles (regex or --playlist-items 1-3,7,10-13", it will
caseless sub-string) download the videos at index 1, 2, 3,
--reject-title REGEX Skip download for matching titles (regex or 7, 10, 11, 12 and 13.
caseless sub-string) --match-title REGEX Download only matching titles (regex or
--max-downloads NUMBER Abort after downloading NUMBER files caseless sub-string)
--min-filesize SIZE Do not download any videos smaller than --reject-title REGEX Skip download for matching titles
SIZE (e.g. 50k or 44.6m) (regex or caseless sub-string)
--max-filesize SIZE Do not download any videos larger than SIZE --max-downloads NUMBER Abort after downloading NUMBER files
(e.g. 50k or 44.6m) --min-filesize SIZE Do not download any videos smaller than
--date DATE Download only videos uploaded in this date SIZE (e.g. 50k or 44.6m)
--datebefore DATE Download only videos uploaded on or before --max-filesize SIZE Do not download any videos larger than
this date (i.e. inclusive) SIZE (e.g. 50k or 44.6m)
--dateafter DATE Download only videos uploaded on or after --date DATE Download only videos uploaded in this
this date (i.e. inclusive) date
--min-views COUNT Do not download any videos with less than --datebefore DATE Download only videos uploaded on or
COUNT views before this date (i.e. inclusive)
--max-views COUNT Do not download any videos with more than --dateafter DATE Download only videos uploaded on or
COUNT views after this date (i.e. inclusive)
--match-filter FILTER Generic video filter. Specify any key (see --min-views COUNT Do not download any videos with less
the "OUTPUT TEMPLATE" for a list of than COUNT views
available keys) to match if the key is --max-views COUNT Do not download any videos with more
present, !key to check if the key is not than COUNT views
present, key > NUMBER (like "comment_count --match-filter FILTER Generic video filter. Specify any key
> 12", also works with >=, <, <=, !=, =) to (see the "OUTPUT TEMPLATE" for a list
compare against a number, key = 'LITERAL' of available keys) to match if the key
(like "uploader = 'Mike Smith'", also works is present, !key to check if the key is
with !=) to match against a string literal not present, key > NUMBER (like
and & to require multiple matches. Values "comment_count > 12", also works with
which are not known are excluded unless you >=, <, <=, !=, =) to compare against a
put a question mark (?) after the operator. number, key = 'LITERAL' (like "uploader
For example, to only match videos that have = 'Mike Smith'", also works with !=) to
been liked more than 100 times and disliked match against a string literal and & to
less than 50 times (or the dislike require multiple matches. Values which
functionality is not available at the given are not known are excluded unless you
service), but who also have a description, put a question mark (?) after the
use --match-filter "like_count > 100 & operator. For example, to only match
dislike_count <? 50 & description" . videos that have been liked more than
--no-playlist Download only the video, if the URL refers 100 times and disliked less than 50
to a video and a playlist. times (or the dislike functionality is
--yes-playlist Download the playlist, if the URL refers to not available at the given service),
a video and a playlist. but who also have a description, use
--age-limit YEARS Download only videos suitable for the given --match-filter "like_count > 100 &
age dislike_count <? 50 & description" .
--download-archive FILE Download only videos not listed in the --no-playlist Download only the video, if the URL
archive file. Record the IDs of all refers to a video and a playlist.
downloaded videos in it. --yes-playlist Download the playlist, if the URL
--include-ads Download advertisements as well refers to a video and a playlist.
(experimental) --age-limit YEARS Download only videos suitable for the
given age
--download-archive FILE Download only videos not listed in the
archive file. Record the IDs of all
downloaded videos in it.
--include-ads Download advertisements as well
(experimental)
## Download Options: ## Download Options:
-r, --limit-rate RATE Maximum download rate in bytes per second -r, --limit-rate RATE Maximum download rate in bytes per
(e.g. 50K or 4.2M) second (e.g. 50K or 4.2M)
-R, --retries RETRIES Number of retries (default is 10), or -R, --retries RETRIES Number of retries (default is 10), or
"infinite". "infinite".
--fragment-retries RETRIES Number of retries for a fragment (default --fragment-retries RETRIES Number of retries for a fragment
is 10), or "infinite" (DASH, hlsnative and (default is 10), or "infinite" (DASH,
ISM) hlsnative and ISM)
--skip-unavailable-fragments Skip unavailable fragments (DASH, hlsnative --skip-unavailable-fragments Skip unavailable fragments (DASH,
and ISM) hlsnative and ISM)
--abort-on-unavailable-fragment Abort downloading when some fragment is not --abort-on-unavailable-fragment Abort downloading when some fragment is
available not available
--keep-fragments Keep downloaded fragments on disk after --keep-fragments Keep downloaded fragments on disk after
downloading is finished; fragments are downloading is finished; fragments are
erased by default erased by default
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or
(default is 1024) 16K) (default is 1024)
--no-resize-buffer Do not automatically adjust the buffer --no-resize-buffer Do not automatically adjust the buffer
size. By default, the buffer size is size. By default, the buffer size is
automatically resized from an initial value automatically resized from an initial
of SIZE. value of SIZE.
--http-chunk-size SIZE Size of a chunk for chunk-based HTTP --http-chunk-size SIZE Size of a chunk for chunk-based HTTP
downloading (e.g. 10485760 or 10M) (default downloading (e.g. 10485760 or 10M)
is disabled). May be useful for bypassing (default is disabled). May be useful
bandwidth throttling imposed by a webserver for bypassing bandwidth throttling
(experimental) imposed by a webserver (experimental)
--playlist-reverse Download playlist videos in reverse order --playlist-reverse Download playlist videos in reverse
--playlist-random Download playlist videos in random order order
--xattr-set-filesize Set file xattribute ytdl.filesize with --playlist-random Download playlist videos in random
expected file size order
--hls-prefer-native Use the native HLS downloader instead of --xattr-set-filesize Set file xattribute ytdl.filesize with
ffmpeg expected file size
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS --hls-prefer-native Use the native HLS downloader instead
downloader of ffmpeg
--hls-use-mpegts Use the mpegts container for HLS videos, --hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
allowing to play the video while downloader
downloading (some players may not be able --hls-use-mpegts Use the mpegts container for HLS
to play it) videos, allowing to play the video
--external-downloader COMMAND Use the specified external downloader. while downloading (some players may not
Currently supports be able to play it)
aria2c,avconv,axel,curl,ffmpeg,httpie,wget --external-downloader COMMAND Use the specified external downloader.
--external-downloader-args ARGS Give these arguments to the external Currently supports aria2c,avconv,axel,c
downloader url,ffmpeg,httpie,wget
--external-downloader-args ARGS Give these arguments to the external
downloader
## Filesystem Options: ## Filesystem Options:
-a, --batch-file FILE File containing URLs to download ('-' for -a, --batch-file FILE File containing URLs to download ('-'
stdin), one URL per line. Lines starting for stdin), one URL per line. Lines
with '#', ';' or ']' are considered as starting with '#', ';' or ']' are
comments and ignored. considered as comments and ignored.
--id Use only video ID in file name --id Use only video ID in file name
-o, --output TEMPLATE Output filename template, see the "OUTPUT -o, --output TEMPLATE Output filename template, see the
TEMPLATE" for all the info "OUTPUT TEMPLATE" for all the info
--autonumber-start NUMBER Specify the start value for %(autonumber)s --output-na-placeholder PLACEHOLDER Placeholder value for unavailable meta
(default is 1) fields in output filename template
--restrict-filenames Restrict filenames to only ASCII (default is "NA")
characters, and avoid "&" and spaces in --autonumber-start NUMBER Specify the start value for
filenames %(autonumber)s (default is 1)
-w, --no-overwrites Do not overwrite files --restrict-filenames Restrict filenames to only ASCII
-c, --continue Force resume of partially downloaded files. characters, and avoid "&" and spaces in
By default, youtube-dl will resume filenames
downloads if possible. -w, --no-overwrites Do not overwrite files
--no-continue Do not resume partially downloaded files -c, --continue Force resume of partially downloaded
(restart from beginning) files. By default, youtube-dl will
--no-part Do not use .part files - write directly resume downloads if possible.
into output file --no-continue Do not resume partially downloaded
--no-mtime Do not use the Last-modified header to set files (restart from beginning)
the file modification time --no-part Do not use .part files - write directly
--write-description Write video description to a .description into output file
file --no-mtime Do not use the Last-modified header to
--write-info-json Write video metadata to a .info.json file set the file modification time
--write-annotations Write video annotations to a --write-description Write video description to a
.annotations.xml file .description file
--load-info-json FILE JSON file containing the video information --write-info-json Write video metadata to a .info.json
(created with the "--write-info-json" file
option) --write-annotations Write video annotations to a
--cookies FILE File to read cookies from and dump cookie .annotations.xml file
jar in --load-info-json FILE JSON file containing the video
--cache-dir DIR Location in the filesystem where youtube-dl information (created with the "--write-
can store some downloaded information info-json" option)
permanently. By default --cookies FILE File to read cookies from and dump
$XDG_CACHE_HOME/youtube-dl or cookie jar in
~/.cache/youtube-dl . At the moment, only --cache-dir DIR Location in the filesystem where
YouTube player files (for videos with youtube-dl can store some downloaded
obfuscated signatures) are cached, but that information permanently. By default
may change. $XDG_CACHE_HOME/youtube-dl or
--no-cache-dir Disable filesystem caching ~/.cache/youtube-dl . At the moment,
--rm-cache-dir Delete all filesystem cache files only YouTube player files (for videos
with obfuscated signatures) are cached,
but that may change.
--no-cache-dir Disable filesystem caching
--rm-cache-dir Delete all filesystem cache files
## Thumbnail images: ## Thumbnail images:
--write-thumbnail Write thumbnail image to disk --write-thumbnail Write thumbnail image to disk
--write-all-thumbnails Write all thumbnail image formats to disk --write-all-thumbnails Write all thumbnail image formats to
--list-thumbnails Simulate and list all available thumbnail disk
formats --list-thumbnails Simulate and list all available
thumbnail formats
## Verbosity / Simulation Options: ## Verbosity / Simulation Options:
-q, --quiet Activate quiet mode -q, --quiet Activate quiet mode
--no-warnings Ignore warnings --no-warnings Ignore warnings
-s, --simulate Do not download the video and do not write -s, --simulate Do not download the video and do not
anything to disk write anything to disk
--skip-download Do not download the video --skip-download Do not download the video
-g, --get-url Simulate, quiet but print URL -g, --get-url Simulate, quiet but print URL
-e, --get-title Simulate, quiet but print title -e, --get-title Simulate, quiet but print title
--get-id Simulate, quiet but print id --get-id Simulate, quiet but print id
--get-thumbnail Simulate, quiet but print thumbnail URL --get-thumbnail Simulate, quiet but print thumbnail URL
--get-description Simulate, quiet but print video description --get-description Simulate, quiet but print video
--get-duration Simulate, quiet but print video length description
--get-filename Simulate, quiet but print output filename --get-duration Simulate, quiet but print video length
--get-format Simulate, quiet but print output format --get-filename Simulate, quiet but print output
-j, --dump-json Simulate, quiet but print JSON information. filename
See the "OUTPUT TEMPLATE" for a description --get-format Simulate, quiet but print output format
of available keys. -j, --dump-json Simulate, quiet but print JSON
-J, --dump-single-json Simulate, quiet but print JSON information information. See the "OUTPUT TEMPLATE"
for each command-line argument. If the URL for a description of available keys.
refers to a playlist, dump the whole -J, --dump-single-json Simulate, quiet but print JSON
playlist information in a single line. information for each command-line
--print-json Be quiet and print the video information as argument. If the URL refers to a
JSON (video is still being downloaded). playlist, dump the whole playlist
--newline Output progress bar as new lines information in a single line.
--no-progress Do not print progress bar --print-json Be quiet and print the video
--console-title Display progress in console titlebar information as JSON (video is still
-v, --verbose Print various debugging information being downloaded).
--dump-pages Print downloaded pages encoded using base64 --newline Output progress bar as new lines
to debug problems (very verbose) --no-progress Do not print progress bar
--write-pages Write downloaded intermediary pages to --console-title Display progress in console titlebar
files in the current directory to debug -v, --verbose Print various debugging information
problems --dump-pages Print downloaded pages encoded using
--print-traffic Display sent and read HTTP traffic base64 to debug problems (very verbose)
-C, --call-home Contact the youtube-dl server for debugging --write-pages Write downloaded intermediary pages to
--no-call-home Do NOT contact the youtube-dl server for files in the current directory to debug
debugging problems
--print-traffic Display sent and read HTTP traffic
-C, --call-home Contact the youtube-dl server for
debugging
--no-call-home Do NOT contact the youtube-dl server
for debugging
## Workarounds: ## Workarounds:
--encoding ENCODING Force the specified encoding (experimental) --encoding ENCODING Force the specified encoding
--no-check-certificate Suppress HTTPS certificate validation (experimental)
--prefer-insecure Use an unencrypted connection to retrieve --no-check-certificate Suppress HTTPS certificate validation
information about the video. (Currently --prefer-insecure Use an unencrypted connection to
supported only for YouTube) retrieve information about the video.
--user-agent UA Specify a custom user agent (Currently supported only for YouTube)
--referer URL Specify a custom referer, use if the video --user-agent UA Specify a custom user agent
access is restricted to one domain --referer URL Specify a custom referer, use if the
--add-header FIELD:VALUE Specify a custom HTTP header and its value, video access is restricted to one
separated by a colon ':'. You can use this domain
option multiple times --add-header FIELD:VALUE Specify a custom HTTP header and its
--bidi-workaround Work around terminals that lack value, separated by a colon ':'. You
bidirectional text support. Requires bidiv can use this option multiple times
or fribidi executable in PATH --bidi-workaround Work around terminals that lack
--sleep-interval SECONDS Number of seconds to sleep before each bidirectional text support. Requires
download when used alone or a lower bound bidiv or fribidi executable in PATH
of a range for randomized sleep before each --sleep-interval SECONDS Number of seconds to sleep before each
download (minimum possible number of download when used alone or a lower
seconds to sleep) when used along with bound of a range for randomized sleep
--max-sleep-interval. before each download (minimum possible
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep number of seconds to sleep) when used
before each download (maximum possible along with --max-sleep-interval.
number of seconds to sleep). Must only be --max-sleep-interval SECONDS Upper bound of a range for randomized
used along with --min-sleep-interval. sleep before each download (maximum
possible number of seconds to sleep).
Must only be used along with --min-
sleep-interval.
## Video Format Options: ## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT -f, --format FORMAT Video format code, see the "FORMAT
SELECTION" for all the info SELECTION" for all the info
--all-formats Download all available video formats --all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific --prefer-free-formats Prefer free video formats unless a
one is requested specific one is requested
-F, --list-formats List all available formats of requested -F, --list-formats List all available formats of requested
videos videos
--youtube-skip-dash-manifest Do not download the DASH manifests and --youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g. --merge-output-format FORMAT If a merge is required (e.g.
bestvideo+bestaudio), output to given bestvideo+bestaudio), output to given
container format. One of mkv, mp4, ogg, container format. One of mkv, mp4, ogg,
webm, flv. Ignored if no merge is required webm, flv. Ignored if no merge is
required
## Subtitle Options: ## Subtitle Options:
--write-sub Write subtitle file --write-sub Write subtitle file
--write-auto-sub Write automatically generated subtitle file --write-auto-sub Write automatically generated subtitle
(YouTube only) file (YouTube only)
--all-subs Download all the available subtitles of the --all-subs Download all the available subtitles of
video the video
--list-subs List all available subtitles for the video --list-subs List all available subtitles for the
--sub-format FORMAT Subtitle format, accepts formats video
preference, for example: "srt" or --sub-format FORMAT Subtitle format, accepts formats
"ass/srt/best" preference, for example: "srt" or
--sub-lang LANGS Languages of the subtitles to download "ass/srt/best"
(optional) separated by commas, use --list- --sub-lang LANGS Languages of the subtitles to download
subs for available language tags (optional) separated by commas, use
--list-subs for available language tags
## Authentication Options: ## Authentication Options:
-u, --username USERNAME Login with this account ID -u, --username USERNAME Login with this account ID
-p, --password PASSWORD Account password. If this option is left -p, --password PASSWORD Account password. If this option is
out, youtube-dl will ask interactively. left out, youtube-dl will ask
-2, --twofactor TWOFACTOR Two-factor authentication code interactively.
-n, --netrc Use .netrc authentication data -2, --twofactor TWOFACTOR Two-factor authentication code
--video-password PASSWORD Video password (vimeo, smotri, youku) -n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, youku)
## Adobe Pass Options: ## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV --ap-mso MSO Adobe Pass multiple-system operator (TV
provider) identifier, use --ap-list-mso for provider) identifier, use --ap-list-mso
a list of available MSOs for a list of available MSOs
--ap-username USERNAME Multiple-system operator account login --ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password. --ap-password PASSWORD Multiple-system operator account
If this option is left out, youtube-dl will password. If this option is left out,
ask interactively. youtube-dl will ask interactively.
--ap-list-mso List all supported multiple-system --ap-list-mso List all supported multiple-system
operators operators
## Post-processing Options: ## Post-processing Options:
-x, --extract-audio Convert video files to audio-only files -x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or (requires ffmpeg/avconv and
avprobe) ffprobe/avprobe)
--audio-format FORMAT Specify audio format: "best", "aac", --audio-format FORMAT Specify audio format: "best", "aac",
"flac", "mp3", "m4a", "opus", "vorbis", or "flac", "mp3", "m4a", "opus", "vorbis",
"wav"; "best" by default; No effect without or "wav"; "best" by default; No effect
-x without -x
--audio-quality QUALITY Specify ffmpeg/avconv audio quality, insert --audio-quality QUALITY Specify ffmpeg/avconv audio quality,
a value between 0 (better) and 9 (worse) insert a value between 0 (better) and 9
for VBR or a specific bitrate like 128K (worse) for VBR or a specific bitrate
(default 5) like 128K (default 5)
--recode-video FORMAT Encode the video to another format if --recode-video FORMAT Encode the video to another format if
necessary (currently supported: necessary (currently supported:
mp4|flv|ogg|webm|mkv|avi) mp4|flv|ogg|webm|mkv|avi)
--postprocessor-args ARGS Give these arguments to the postprocessor --postprocessor-args ARGS Give these arguments to the
-k, --keep-video Keep the video file on disk after the post- postprocessor
processing; the video is erased by default -k, --keep-video Keep the video file on disk after the
--no-post-overwrites Do not overwrite post-processed files; the post-processing; the video is erased by
post-processed files are overwritten by default
default --no-post-overwrites Do not overwrite post-processed files;
--embed-subs Embed subtitles in the video (only for mp4, the post-processed files are
webm and mkv videos) overwritten by default
--embed-thumbnail Embed thumbnail in the audio as cover art --embed-subs Embed subtitles in the video (only for
--add-metadata Write metadata to the video file mp4, webm and mkv videos)
--metadata-from-title FORMAT Parse additional metadata like song title / --embed-thumbnail Embed thumbnail in the audio as cover
artist from the video title. The format art
syntax is the same as --output. Regular --add-metadata Write metadata to the video file
expression with named capture groups may --metadata-from-title FORMAT Parse additional metadata like song
also be used. The parsed parameters replace title / artist from the video title.
existing values. Example: --metadata-from- The format syntax is the same as
title "%(artist)s - %(title)s" matches a --output. Regular expression with named
title like "Coldplay - Paradise". Example capture groups may also be used. The
(regex): --metadata-from-title parsed parameters replace existing
"(?P<artist>.+?) - (?P<title>.+)" values. Example: --metadata-from-title
--xattrs Write metadata to the video file's xattrs "%(artist)s - %(title)s" matches a
(using dublin core and xdg standards) title like "Coldplay - Paradise".
--fixup POLICY Automatically correct known faults of the Example (regex): --metadata-from-title
file. One of never (do nothing), warn (only "(?P<artist>.+?) - (?P<title>.+)"
emit a warning), detect_or_warn (the --xattrs Write metadata to the video file's
default; fix file if we can, warn xattrs (using dublin core and xdg
otherwise) standards)
--prefer-avconv Prefer avconv over ffmpeg for running the --fixup POLICY Automatically correct known faults of
postprocessors the file. One of never (do nothing),
--prefer-ffmpeg Prefer ffmpeg over avconv for running the warn (only emit a warning),
postprocessors (default) detect_or_warn (the default; fix file
--ffmpeg-location PATH Location of the ffmpeg/avconv binary; if we can, warn otherwise)
either the path to the binary or its --prefer-avconv Prefer avconv over ffmpeg for running
containing directory. the postprocessors
--exec CMD Execute a command on the file after --prefer-ffmpeg Prefer ffmpeg over avconv for running
downloading and post-processing, similar to the postprocessors (default)
find's -exec syntax. Example: --exec 'adb --ffmpeg-location PATH Location of the ffmpeg/avconv binary;
push {} /sdcard/Music/ && rm {}' either the path to the binary or its
--convert-subs FORMAT Convert the subtitles to other format containing directory.
(currently supported: srt|ass|vtt|lrc) --exec CMD Execute a command on the file after
downloading and post-processing,
similar to find's -exec syntax.
Example: --exec 'adb push {}
/sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc)
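
The regex form of `--metadata-from-title` quoted in the option text above can be tried in isolation. This is a minimal sketch, not youtube-dl's own implementation: `parse_title` is a hypothetical helper, and the only things taken from the text are the pattern and the sample title.

```python
import re

# Pattern quoted in the --metadata-from-title help text.
TITLE_PATTERN = r"(?P<artist>.+?) - (?P<title>.+)"


def parse_title(video_title, pattern=TITLE_PATTERN):
    """Return the named groups captured from a video title (hypothetical helper)."""
    match = re.match(pattern, video_title)
    # An empty dict means the title did not fit the pattern.
    return match.groupdict() if match else {}


print(parse_title("Coldplay - Paradise"))
print(parse_title("A title without the separator"))
```

The non-greedy `.+?` in the `artist` group stops at the first ` - ` separator, so any later occurrences stay in `title`.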
# CONFIGURATION # CONFIGURATION
@@ -545,7 +583,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `extractor` (string): Name of the extractor - `extractor` (string): Name of the extractor
- `extractor_key` (string): Key name of the extractor - `extractor_key` (string): Key name of the extractor
- `epoch` (numeric): Unix epoch when creating the file - `epoch` (numeric): Unix epoch when creating the file
- `autonumber` (numeric): Five-digit number that will be increased with each download, starting at zero - `autonumber` (numeric): Number that will be increased with each download, starting at `--autonumber-start`
- `playlist` (string): Name or id of the playlist that contains the video - `playlist` (string): Name or id of the playlist that contains the video
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist - `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id` (string): Playlist identifier - `playlist_id` (string): Playlist identifier
@@ -582,7 +620,7 @@ Available for the media that is a track or a part of a music album:
- `disc_number` (numeric): Number of the disc or other physical medium the track belongs to - `disc_number` (numeric): Number of the disc or other physical medium the track belongs to
- `release_year` (numeric): Year (YYYY) when the album was released - `release_year` (numeric): Year (YYYY) when the album was released
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`. Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory. For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
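
The substitution described above, including the placeholder for missing fields, can be sketched with Python's `%`-style formatting. This is an illustration under stated assumptions rather than youtube-dl's actual code: `render_template` and its `na_placeholder` parameter are hypothetical names that mirror `--output-na-placeholder`.

```python
import collections


def render_template(template, info, na_placeholder="NA"):
    """Fill a %-style template; fields missing from info become the placeholder.

    Hypothetical helper for illustration only.
    """
    # defaultdict returns the placeholder for any key that info lacks.
    fields = collections.defaultdict(lambda: na_placeholder, info)
    return template % fields


info = {"title": "youtube-dl test video", "id": "BaW_jenozKcj", "ext": "mp4"}

print(render_template("%(title)s-%(id)s.%(ext)s", info))
print(render_template("%(uploader)s-%(id)s.%(ext)s", info))
print(render_template("%(uploader)s-%(id)s.%(ext)s", info, na_placeholder="unknown"))
```

The first call reproduces the file name from the example above; the other two show the default `NA` and a custom placeholder standing in for the absent `uploader` field.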
@@ -677,6 +715,7 @@ Also filtering work for comparisons `=` (equals), `^=` (starts with), `$=` (ends
- `container`: Name of the container format - `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`) - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- `format_id`: A short description of the format - `format_id`: A short description of the format
- `language`: Language code
Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain). Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
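
The string comparisons and their negated forms can be modelled in a few lines. This is a sketch of the semantics as described here, not youtube-dl's format-selection parser: `compare` and `STR_OPERATORS` are hypothetical names, and `*=` is assumed to mean "contains", as implied by `!*=` meaning "does not contain".

```python
import operator

# Base string operators; a leading "!" negates any of them.
STR_OPERATORS = {
    "=": operator.eq,
    "^=": lambda attr, value: attr.startswith(value),
    "$=": lambda attr, value: attr.endswith(value),
    "*=": lambda attr, value: value in attr,
}


def compare(attr, op, value):
    """Evaluate a string comparison such as '^=' or '!*=' (hypothetical helper)."""
    negate = op.startswith("!")
    base = STR_OPERATORS[op[1:] if negate else op]
    result = base(attr, value)
    return not result if negate else result


print(compare("m3u8_native", "^=", "m3u8"))
print(compare("mp4", "!*=", "webm"))
```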
@@ -879,7 +918,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox). In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
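
Both requirements above (the header line and the newline convention) are easy to check before passing a file to `--cookies`. The following is a minimal sketch; `has_valid_header` and `normalize_newlines` are hypothetical helpers, not part of youtube-dl.

```python
# Header lines named in the text above.
VALID_HEADERS = ("# HTTP Cookie File", "# Netscape HTTP Cookie File")


def has_valid_header(text):
    """Check that the first line of the cookies file is an accepted header."""
    lines = text.splitlines()
    return bool(lines) and lines[0] in VALID_HEADERS


def normalize_newlines(text, newline="\n"):
    """Convert CRLF, CR and LF line endings to the requested newline."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified if newline == "\n" else unified.replace("\n", newline)


sample = "# Netscape HTTP Cookie File\r\n# example comment line\r\n"
print(has_valid_header(sample))
print(repr(normalize_newlines(sample)))
```

When reading and writing a real file with these helpers, open it with `newline=""` so that Python does not translate line endings on its own.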


@@ -1,5 +0,0 @@
#!/bin/bash
wget http://central.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar
java -jar jython-installer-2.7.1.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

View File

@@ -61,7 +61,7 @@ def build_lazy_ie(ie, name):
return s return s
# find the correct sorting and add the required base classes so that sublcasses # find the correct sorting and add the required base classes so that subclasses
# can be correctly created # can be correctly created
classes = _ALL_CLASSES[:-1] classes = _ALL_CLASSES[:-1]
ordered_cls = [] ordered_cls = []

devscripts/run_tests.bat Normal file

@@ -0,0 +1,17 @@
@echo off
rem Keep this list in sync with the `offlinetest` target in Makefile
set DOWNLOAD_TESTS="age_restriction^|download^|iqiyi_sdk_interpreter^|socks^|subtitles^|write_annotations^|youtube_lists^|youtube_signature"
if "%YTDL_TEST_SET%" == "core" (
set test_set="-I test_("%DOWNLOAD_TESTS%")\.py"
set multiprocess_args=""
) else if "%YTDL_TEST_SET%" == "download" (
set test_set="-I test_(?!"%DOWNLOAD_TESTS%").+\.py"
set multiprocess_args="--processes=4 --process-timeout=540"
) else (
echo YTDL_TEST_SET is not set or invalid
exit /b 1
)
nosetests test --verbose %test_set:"=% %multiprocess_args:"=%
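
The two `-I` patterns built by the script can be checked against sample file names. This is a sketch under assumptions: it treats `-I` as an ignore-files regex that is searched against each file name, which should be confirmed against the nose documentation, and `is_ignored` is a hypothetical helper.

```python
import re

# Same alternation as DOWNLOAD_TESTS in run_tests.bat, with the ^ escapes removed.
DOWNLOAD_TESTS = (
    "age_restriction|download|iqiyi_sdk_interpreter|socks|subtitles|"
    "write_annotations|youtube_lists|youtube_signature"
)

IGNORE_PATTERNS = {
    # "core" ignores the download tests.
    "core": re.compile(r"test_(%s)\.py" % DOWNLOAD_TESTS),
    # "download" ignores every test file that is not a download test.
    "download": re.compile(r"test_(?!%s).+\.py" % DOWNLOAD_TESTS),
}


def is_ignored(test_set, filename):
    """Return True if filename would be skipped for the given test set."""
    return IGNORE_PATTERNS[test_set].search(filename) is not None


for name in ("test_utils.py", "test_download.py", "test_subtitles.py"):
    print(name, is_ignored("core", name), is_ignored("download", name))
```

The negative lookahead `(?!...)` is what inverts the selection: the `download` set skips any `test_*.py` file whose name does not start with one of the listed download-test names.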

View File

@@ -35,32 +35,39 @@
- **adobetv:video** - **adobetv:video**
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **aenetworks:collection**
- **aenetworks:show**
- **afreecatv**: afreecatv.com - **afreecatv**: afreecatv.com
- **AirMozilla** - **AirMozilla**
- **AliExpressLive** - **AliExpressLive**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
- **AlphaPorno** - **AlphaPorno**
- **Amara**
- **AMCNetworks** - **AMCNetworks**
- **AmericasTestKitchen** - **AmericasTestKitchen**
- **AmericasTestKitchenSeason**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeOnDemand** - **AnimeOnDemand**
- **Anvato** - **Anvato**
- **aol.com** - **aol.com**: Yahoo screen and movies
- **APA** - **APA**
- **Aparat** - **Aparat**
- **AppleConnect** - **AppleConnect**
- **AppleDaily**: 臺灣蘋果日報 - **AppleDaily**: 臺灣蘋果日報
- **ApplePodcasts**
- **appletrailers** - **appletrailers**
- **appletrailers:section** - **appletrailers:section**
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ArcPublishing**
- **ARD** - **ARD**
- **ARD:mediathek** - **ARD:mediathek**
- **ARDBetaMediathek** - **ARDBetaMediathek**
- **Arkena** - **Arkena**
- **arte.tv:+7** - **arte.sky.it**
- **arte.tv:embed** - **ArteTV**
- **arte.tv:playlist** - **ArteTVEmbed**
- **ArteTVPlaylist**
- **AsianCrush** - **AsianCrush**
- **AsianCrushPlaylist** - **AsianCrushPlaylist**
- **AtresPlayer** - **AtresPlayer**
@@ -93,6 +100,10 @@
- **BellMedia** - **BellMedia**
- **Bet** - **Bet**
- **bfi:player** - **bfi:player**
- **bfmtv**
- **bfmtv:article**
- **bfmtv:live**
- **BibelTV**
- **Bigflix** - **Bigflix**
- **Bild**: Bild.de - **Bild**: Bild.de
- **BiliBili** - **BiliBili**
@@ -100,6 +111,7 @@
- **BilibiliAudioAlbum** - **BilibiliAudioAlbum**
- **BiliBiliPlayer** - **BiliBiliPlayer**
- **BioBioChileTV** - **BioBioChileTV**
- **Biography**
- **BIQLE** - **BIQLE**
- **BitChute** - **BitChute**
- **BitChuteChannel** - **BitChuteChannel**
@@ -108,7 +120,9 @@
- **blinkx** - **blinkx**
- **Bloomberg** - **Bloomberg**
- **BokeCC** - **BokeCC**
- **BongaCams**
- **BostonGlobe** - **BostonGlobe**
- **Box**
- **Bpb**: Bundeszentrale für politische Bildung - **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk - **BR**: Bayerischer Rundfunk
- **BravoTV** - **BravoTV**
@@ -141,6 +155,7 @@
- **CBS** - **CBS**
- **CBSInteractive** - **CBSInteractive**
- **CBSLocal** - **CBSLocal**
- **CBSLocalArticle**
- **cbsnews**: CBS News - **cbsnews**: CBS News
- **cbsnews:embed** - **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos - **cbsnews:livevideo**: CBS News Live Videos
@@ -156,6 +171,7 @@
- **Chilloutzone** - **Chilloutzone**
- **chirbit** - **chirbit**
- **chirbit:profile** - **chirbit:profile**
- **cielotv.it**
- **Cinchcast** - **Cinchcast**
- **Cinemax** - **Cinemax**
- **CiscoLiveSearch** - **CiscoLiveSearch**
@@ -177,8 +193,6 @@
- **CNNArticle** - **CNNArticle**
- **CNNBlogs** - **CNNBlogs**
- **ComedyCentral** - **ComedyCentral**
- **ComedyCentralFullEpisodes**
- **ComedyCentralShortname**
- **ComedyCentralTV** - **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **CONtv** - **CONtv**
@@ -189,9 +203,9 @@
- **CrooksAndLiars** - **CrooksAndLiars**
- **crunchyroll** - **crunchyroll**
- **crunchyroll:playlist** - **crunchyroll:playlist**
- **CSNNE**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
- **CTV**
- **CTVNews** - **CTVNews**
- **cu.ntv.co.jp**: Nippon Television Network - **cu.ntv.co.jp**: Nippon Television Network
- **Culturebox** - **Culturebox**
@@ -262,7 +276,6 @@
- **ESPNArticle** - **ESPNArticle**
- **EsriVideo** - **EsriVideo**
- **Europa** - **Europa**
- **EveryonesMixtape**
- **EWETV** - **EWETV**
- **ExpoTV** - **ExpoTV**
- **Expressen** - **Expressen**
@@ -304,11 +317,11 @@
- **FrontendMasters** - **FrontendMasters**
- **FrontendMastersCourse** - **FrontendMastersCourse**
- **FrontendMastersLesson** - **FrontendMastersLesson**
- **FujiTVFODPlus7**
- **Funimation** - **Funimation**
- **Funk** - **Funk**
- **Fusion** - **Fusion**
- **Fux** - **Fux**
- **FXNetworks**
- **Gaia** - **Gaia**
- **GameInformer** - **GameInformer**
- **GameSpot** - **GameSpot**
@@ -327,6 +340,8 @@
- **Go** - **Go**
- **GodTube** - **GodTube**
- **Golem** - **Golem**
- **google:podcasts**
- **google:podcasts:feed**
- **GoogleDrive** - **GoogleDrive**
- **Goshgay** - **Goshgay**
- **GPUTechConf** - **GPUTechConf**
@@ -341,6 +356,7 @@
- **hgtv.com:show** - **hgtv.com:show**
- **HiDive** - **HiDive**
- **HistoricFilms** - **HistoricFilms**
- **history:player**
- **history:topic**: History.com Topic - **history:topic**: History.com Topic
- **hitbox** - **hitbox**
- **hitbox:live** - **hitbox:live**
@@ -360,6 +376,8 @@
- **HungamaSong** - **HungamaSong**
- **Hypem** - **Hypem**
- **ign.com** - **ign.com**
- **IHeartRadio**
- **iheartradio:podcast**
- **imdb**: Internet Movie Database trailers - **imdb**: Internet Movie Database trailers
- **imdb:list**: Internet Movie Database lists - **imdb:list**: Internet Movie Database lists
- **Imgur** - **Imgur**
@@ -393,14 +411,14 @@
- **JWPlatform** - **JWPlatform**
- **Kakao** - **Kakao**
- **Kaltura** - **Kaltura**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan** - **Kankan**
- **Karaoketv** - **Karaoketv**
- **KarriereVideos** - **KarriereVideos**
- **Katsomo** - **Katsomo**
- **KeezMovies** - **KeezMovies**
- **Ketnet** - **Ketnet**
- **KhanAcademy** - **khanacademy**
- **khanacademy:unit**
- **KickStarter** - **KickStarter**
- **KinjaEmbed** - **KinjaEmbed**
- **KinoPoisk** - **KinoPoisk**
@@ -417,6 +435,8 @@
- **la7.it** - **la7.it**
- **laola1tv** - **laola1tv**
- **laola1tv:embed** - **laola1tv:embed**
- **lbry**
- **lbry:channel**
- **LCI** - **LCI**
- **Lcp** - **Lcp**
- **LcpPlay** - **LcpPlay**
@@ -466,6 +486,7 @@
- **massengeschmack.tv** - **massengeschmack.tv**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
- **MedalTV**
- **media.ccc.de** - **media.ccc.de**
- **media.ccc.de:lists** - **media.ccc.de:lists**
- **Medialaan** - **Medialaan**
@@ -480,9 +501,13 @@
- **META** - **META**
- **metacafe** - **metacafe**
- **Metacritic** - **Metacritic**
- **mewatch**
- **Mgoon** - **Mgoon**
- **MGTV**: 芒果TV - **MGTV**: 芒果TV
- **MiaoPai** - **MiaoPai**
- **minds**
- **minds:channel**
- **minds:group**
- **MinistryGrid** - **MinistryGrid**
- **Minoto** - **Minoto**
- **miomio.tv** - **miomio.tv**
@@ -490,8 +515,6 @@
- **mixcloud** - **mixcloud**
- **mixcloud:playlist** - **mixcloud:playlist**
- **mixcloud:user** - **mixcloud:user**
- **Mixer:live**
- **Mixer:vod**
- **MLB** - **MLB**
- **Mnet** - **Mnet**
- **MNetTV** - **MNetTV**
@@ -514,6 +537,7 @@
- **mtv:video** - **mtv:video**
- **mtvjapan** - **mtvjapan**
- **mtvservices:embedded** - **mtvservices:embedded**
- **MTVUutisetArticle**
- **MuenchenTV**: münchen.tv - **MuenchenTV**: münchen.tv
- **mva**: Microsoft Virtual Academy videos - **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses - **mva:course**: Microsoft Virtual Academy courses
@@ -532,6 +556,11 @@
- **NationalGeographicTV** - **NationalGeographicTV**
- **Naver** - **Naver**
- **NBA** - **NBA**
- **nba:watch**
- **nba:watch:collection**
- **NBAChannel**
- **NBAEmbed**
- **NBAWatchEmbed**
- **NBC** - **NBC**
- **NBCNews** - **NBCNews**
- **nbcolympics** - **nbcolympics**
@@ -561,8 +590,10 @@
- **NextTV**: 壹電視 - **NextTV**: 壹電視
- **Nexx** - **Nexx**
- **NexxEmbed** - **NexxEmbed**
- **nfl.com** - **nfl.com** (Currently broken)
- **nfl.com:article** (Currently broken)
- **NhkVod** - **NhkVod**
- **NhkVodProgram**
- **nhl.com** - **nhl.com**
- **nick.com** - **nick.com**
- **nick.de** - **nick.de**
@@ -576,7 +607,6 @@
- **njoy:embed** - **njoy:embed**
- **NJPWWorld**: 新日本プロレスワールド - **NJPWWorld**: 新日本プロレスワールド
- **NobelPrize** - **NobelPrize**
- **Noco**
- **NonkTube** - **NonkTube**
- **Noovo** - **Noovo**
- **Normalboots** - **Normalboots**
@@ -594,6 +624,7 @@
- **Npr** - **Npr**
- **NRK** - **NRK**
- **NRKPlaylist** - **NRKPlaylist**
- **NRKRadioPodkast**
- **NRKSkole**: NRK Skole - **NRKSkole**: NRK Skole
- **NRKTV**: NRK TV and NRK Radio - **NRKTV**: NRK TV and NRK Radio
- **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
@@ -606,6 +637,7 @@
- **Nuvid** - **Nuvid**
- **NYTimes** - **NYTimes**
- **NYTimesArticle** - **NYTimesArticle**
- **NYTimesCooking**
- **NZZ** - **NZZ**
- **ocw.mit.edu** - **ocw.mit.edu**
- **OdaTV** - **OdaTV**
@@ -658,10 +690,13 @@
- **PicartoVod** - **PicartoVod**
- **Piksel** - **Piksel**
- **Pinkbike** - **Pinkbike**
- **Pinterest**
- **PinterestCollection**
- **Pladform** - **Pladform**
- **Platzi** - **Platzi**
- **PlatziCourse** - **PlatziCourse**
- **play.fm** - **play.fm**
- **player.sky.it**
- **PlayPlusTV** - **PlayPlusTV**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
@@ -669,7 +704,6 @@
- **Playwire** - **Playwire**
- **pluralsight** - **pluralsight**
- **pluralsight:course** - **pluralsight:course**
- **plus.google**: Google Plus
- **podomatic** - **podomatic**
- **Pokemon** - **Pokemon**
- **PolskieRadio** - **PolskieRadio**
@@ -699,6 +733,7 @@
- **qqmusic:singer**: QQ音乐 - 歌手 - **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜 - **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuantumTV** - **QuantumTV**
- **Qub**
- **Quickline** - **Quickline**
- **QuicklineLive** - **QuicklineLive**
- **R7** - **R7**
@@ -717,6 +752,8 @@
- **RayWenderlichCourse** - **RayWenderlichCourse**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBull**
- **RedBullEmbed**
- **RedBullTV** - **RedBullTV**
- **RedBullTVRrnContent** - **RedBullTVRrnContent**
- **Reddit** - **Reddit**
@@ -751,6 +788,7 @@
- **RTVNH** - **RTVNH**
- **RTVS** - **RTVS**
- **RUHD** - **RUHD**
- **RumbleEmbed**
- **rutube**: Rutube videos - **rutube**: Rutube videos
- **rutube:channel**: Rutube channels - **rutube:channel**: Rutube channels
- **rutube:embed**: Rutube embedded videos - **rutube:embed**: Rutube embedded videos
@@ -788,18 +826,17 @@
- **Shared**: shared.sx - **Shared**: shared.sx
- **ShowRoomLive** - **ShowRoomLive**
- **Sina** - **Sina**
- **sky.it**
- **sky:news**
- **sky:sports**
- **sky:sports:news**
- **skyacademy.it**
- **SkylineWebcams** - **SkylineWebcams**
- **SkyNews**
- **skynewsarabia:article** - **skynewsarabia:article**
- **skynewsarabia:video** - **skynewsarabia:video**
- **SkySports**
- **Slideshare** - **Slideshare**
- **SlidesLive** - **SlidesLive**
- **Slutload** - **Slutload**
- **smotri**: Smotri.com
- **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **SonyLIV** - **SonyLIV**
@@ -821,12 +858,16 @@
- **SpankBangPlaylist** - **SpankBangPlaylist**
- **Spankwire** - **Spankwire**
- **Spiegel** - **Spiegel**
- **Spiegel:Article**: Articles on spiegel.de
- **Spiegeltv**
- **sport.francetvinfo.fr** - **sport.francetvinfo.fr**
- **Sport5** - **Sport5**
- **SportBox** - **SportBox**
- **SportDeutschland** - **SportDeutschland**
- **spotify**
- **spotify:show**
- **Spreaker**
- **SpreakerPage**
- **SpreakerShow**
- **SpreakerShowPage**
- **SpringboardPlatform** - **SpringboardPlatform**
- **Sprout** - **Sprout**
- **sr:mediathek**: Saarländischer Rundfunk - **sr:mediathek**: Saarländischer Rundfunk
@@ -835,6 +876,7 @@
- **stanfordoc**: Stanford Open ClassRoom - **stanfordoc**: Stanford Open ClassRoom
- **Steam** - **Steam**
- **Stitcher** - **Stitcher**
- **StitcherShow**
- **Streamable** - **Streamable**
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
@@ -855,7 +897,6 @@
- **Tagesschau** - **Tagesschau**
- **tagesschau:player** - **tagesschau:player**
- **Tass** - **Tass**
- **TastyTrade**
- **TBS** - **TBS**
- **TDSLifeway** - **TDSLifeway**
- **Teachable** - **Teachable**
@@ -878,6 +919,7 @@
- **TeleQuebecEmission** - **TeleQuebecEmission**
- **TeleQuebecLive** - **TeleQuebecLive**
- **TeleQuebecSquat** - **TeleQuebecSquat**
- **TeleQuebecVideo**
- **TeleTask** - **TeleTask**
- **Telewebion** - **Telewebion**
- **TennisTV** - **TennisTV**
@@ -895,7 +937,7 @@
- **ThisAV** - **ThisAV**
- **ThisOldHouse** - **ThisOldHouse**
- **TikTok** - **TikTok**
- **TikTokUser** - **TikTokUser** (Currently broken)
- **tinypic**: tinypic.com videos - **tinypic**: tinypic.com videos
- **TMZ** - **TMZ**
- **TMZArticle** - **TMZArticle**
@@ -903,12 +945,13 @@
- **TNAFlixNetworkEmbed** - **TNAFlixNetworkEmbed**
- **toggle** - **toggle**
- **ToonGoggles** - **ToonGoggles**
- **Tosh**: Tosh.0
- **tou.tv** - **tou.tv**
- **Toypics**: Toypics video - **Toypics**: Toypics video
- **ToypicsUser**: Toypics user profile - **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken) - **TrailerAddict** (Currently broken)
- **Trilulilu** - **Trilulilu**
- **Trovo**
- **TrovoVod**
- **TruNews** - **TruNews**
- **TruTV** - **TruTV**
- **Tube8** - **Tube8**
@@ -928,11 +971,15 @@
- **TV2DKBornholmPlay** - **TV2DKBornholmPlay**
- **TV4**: tv4.se and tv4play.se - **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+ - **TV5MondePlus**: TV5MONDE+
- **tv5unis**
- **tv5unis:video**
- **tv8.it**
- **TVA** - **TVA**
- **TVANouvelles** - **TVANouvelles**
- **TVANouvellesArticle** - **TVANouvellesArticle**
- **TVC** - **TVC**
- **TVCArticle** - **TVCArticle**
- **TVer**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVN24** - **TVN24**
@@ -950,16 +997,13 @@
- **TVPlayHome** - **TVPlayHome**
- **Tweakers** - **Tweakers**
- **TwitCasting** - **TwitCasting**
- **twitch:chapter**
- **twitch:clips** - **twitch:clips**
- **twitch:profile**
- **twitch:stream** - **twitch:stream**
- **twitch:video**
- **twitch:videos:all**
- **twitch:videos:highlights**
- **twitch:videos:past-broadcasts**
- **twitch:videos:uploads**
- **twitch:vod** - **twitch:vod**
- **TwitchCollection**
- **TwitchVideos**
- **TwitchVideosClips**
- **TwitchVideosCollections**
- **twitter** - **twitter**
- **twitter:amplify** - **twitter:amplify**
- **twitter:broadcast** - **twitter:broadcast**
@@ -1002,6 +1046,8 @@
- **Viddler** - **Viddler**
- **Videa** - **Videa**
- **video.google:search**: Google Video search - **video.google:search**: Google Video search
- **video.sky.it**
- **video.sky.it:live**
- **VideoDetective** - **VideoDetective**
- **videofy.me** - **videofy.me**
- **videomore** - **videomore**
@@ -1013,7 +1059,6 @@
- **vidme** - **vidme**
- **vidme:user** - **vidme:user**
- **vidme:user:likes** - **vidme:user:likes**
- **Vidzi**
- **vier**: vier.be and vijf.be - **vier**: vier.be and vijf.be
- **vier:videos** - **vier:videos**
- **viewlift** - **viewlift**
@@ -1043,7 +1088,7 @@
- **vk:wallpost** - **vk:wallpost**
- **vlive** - **vlive**
- **vlive:channel** - **vlive:channel**
- **vlive:playlist** - **vlive:post**
- **Vodlocker** - **Vodlocker**
- **VODPl** - **VODPl**
- **VODPlatform** - **VODPlatform**
@@ -1058,10 +1103,12 @@
- **vrv** - **vrv**
- **vrv:series** - **vrv:series**
- **VShare** - **VShare**
- **VTM**
- **VTXTV** - **VTXTV**
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VVVVID** - **VVVVID**
- **VVVVIDShow**
- **VyboryMos** - **VyboryMos**
- **Vzaar** - **Vzaar**
- **Wakanim** - **Wakanim**
@@ -1084,6 +1131,7 @@
- **WeiboMobile** - **WeiboMobile**
- **WeiqiTV**: WQTV - **WeiqiTV**: WQTV
- **Wistia** - **Wistia**
- **WistiaPlaylist**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop** - **WorldStarHipHop**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
@@ -1091,7 +1139,7 @@
- **WWE** - **WWE**
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing - **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **XHamsterUser** - **XHamsterUser**
@@ -1115,6 +1163,8 @@
- **yahoo:japannews**: Yahoo! Japan News - **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk** - **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом - **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:artist:albums**: Яндекс.Музыка - Артист - Альбомы
- **yandexmusic:artist:tracks**: Яндекс.Музыка - Артист - Треки
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист - **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек - **yandexmusic:track**: Яндекс.Музыка - Трек
- **YandexVideo** - **YandexVideo**
@@ -1132,22 +1182,18 @@
- **YourPorn** - **YourPorn**
- **YourUpload** - **YourUpload**
- **youtube**: YouTube.com - **youtube**: YouTube.com
- **youtube:channel**: YouTube.com channels
- **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication) - **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication)
- **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication) - **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
- **youtube:live**: YouTube.com live streams
- **youtube:playlist**: YouTube.com playlists - **youtube:playlist**: YouTube.com playlists
- **youtube:playlists**: YouTube.com user/channel playlists
- **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication) - **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)
- **youtube:search**: YouTube.com searches - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first - **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs
- **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:tab**: YouTube.com tab
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication) - **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **YoutubeYtBe**
- **YoutubeYtUser**
- **Zapiks** - **Zapiks**
- **Zaq1**
- **Zattoo** - **Zattoo**
- **ZattooLive** - **ZattooLive**
- **ZDF** - **ZDF**


@@ -37,7 +37,7 @@
"writeinfojson": true, "writeinfojson": true,
"writesubtitles": false, "writesubtitles": false,
"allsubtitles": false, "allsubtitles": false,
"listssubtitles": false, "listsubtitles": false,
"socket_timeout": 20, "socket_timeout": 20,
"fixup": "never" "fixup": "never"
} }


@@ -98,6 +98,55 @@ class TestInfoExtractor(unittest.TestCase):
self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True) self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True) self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
def test_search_json_ld_realworld(self):
# https://github.com/ytdl-org/youtube-dl/issues/23306
expect_dict(
self,
self.ie._search_json_ld(r'''<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "VideoObject",
"name": "1 On 1 With Kleio",
"url": "https://www.eporner.com/hd-porn/xN49A1cT3eB/1-On-1-With-Kleio/",
"duration": "PT0H12M23S",
"thumbnailUrl": ["https://static-eu-cdn.eporner.com/thumbs/static4/7/78/780/780814/9_360.jpg", "https://imggen.eporner.com/780814/1920/1080/9.jpg"],
"contentUrl": "https://gvideo.eporner.com/xN49A1cT3eB/xN49A1cT3eB.mp4",
"embedUrl": "https://www.eporner.com/embed/xN49A1cT3eB/1-On-1-With-Kleio/",
"image": "https://static-eu-cdn.eporner.com/thumbs/static4/7/78/780/780814/9_360.jpg",
"width": "1920",
"height": "1080",
"encodingFormat": "mp4",
"bitrate": "6617kbps",
"isFamilyFriendly": "False",
"description": "Kleio Valentien",
"uploadDate": "2015-12-05T21:24:35+01:00",
"interactionStatistic": {
"@type": "InteractionCounter",
"interactionType": { "@type": "http://schema.org/WatchAction" },
"userInteractionCount": 1120958
}, "aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "88",
"ratingCount": "630",
"bestRating": "100",
"worstRating": "0"
}, "actor": [{
"@type": "Person",
"name": "Kleio Valentien",
"url": "https://www.eporner.com/pornstar/kleio-valentien/"
}]}
</script>''', None),
{
'title': '1 On 1 With Kleio',
'description': 'Kleio Valentien',
'url': 'https://gvideo.eporner.com/xN49A1cT3eB/xN49A1cT3eB.mp4',
'timestamp': 1449347075,
'duration': 743.0,
'view_count': 1120958,
'width': 1920,
'height': 1080,
})
def test_download_json(self): def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json') uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'}) self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
@@ -108,6 +157,18 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None) self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
def test_parse_html5_media_entries(self): def test_parse_html5_media_entries(self):
# inline video tag
expect_dict(
self,
self.ie._parse_html5_media_entries(
'https://127.0.0.1/video.html',
r'<html><video src="/vid.mp4" /></html>', None)[0],
{
'formats': [{
'url': 'https://127.0.0.1/vid.mp4',
}],
})
# from https://www.r18.com/ # from https://www.r18.com/
# with kpbs in label # with kpbs in label
expect_dict( expect_dict(
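
The `test_search_json_ld_realworld` case above expects the JSON-LD `duration` value `PT0H12M23S` to become `743.0` seconds and the `uploadDate` value to become the Unix timestamp `1449347075`. A minimal sketch of those two conversions follows, using only the standard library. The helper names `parse_iso8601_duration` and `video_object_to_info` are hypothetical and are not youtube-dl's `_search_json_ld`; the sketch assumes Python 3.7+ for `datetime.fromisoformat` and handles only hour/minute/second durations.

```python
import json
import re
from datetime import datetime


def parse_iso8601_duration(value):
    """Convert a duration such as 'PT0H12M23S' to seconds.

    Hypothetical helper: only the H/M/S components are handled.
    """
    m = re.match(
        r'^PT(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$',
        value)
    if not m:
        return None
    return (int(m.group('h') or 0) * 3600
            + int(m.group('m') or 0) * 60
            + float(m.group('s') or 0))


def video_object_to_info(json_ld_text):
    """Map a schema.org VideoObject to a small info dict (illustrative only)."""
    data = json.loads(json_ld_text)
    if data.get('@type') != 'VideoObject':
        return {}
    return {
        'title': data.get('name'),
        'url': data.get('contentUrl'),
        'duration': parse_iso8601_duration(data.get('duration', '')),
        # fromisoformat understands the '+01:00' offset, so the result is UTC-based
        'timestamp': int(datetime.fromisoformat(data['uploadDate']).timestamp()),
        # the test fixture stores width/height as strings
        'width': int(data['width']),
        'height': int(data['height']),
    }
```

The string-to-int conversion for `width` and `height` mirrors the fixture, where both values are quoted in the JSON but expected as integers in the result.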


@@ -464,6 +464,7 @@ class TestFormatSelection(unittest.TestCase):
assert_syntax_error('+bestaudio') assert_syntax_error('+bestaudio')
assert_syntax_error('bestvideo+') assert_syntax_error('bestvideo+')
assert_syntax_error('/') assert_syntax_error('/')
assert_syntax_error('bestvideo+bestvideo+bestaudio')
def test_format_filtering(self): def test_format_filtering(self):
formats = [ formats = [
@@ -632,13 +633,20 @@ class TestYoutubeDL(unittest.TestCase):
'title2': '%PATH%', 'title2': '%PATH%',
} }
def fname(templ): def fname(templ, na_placeholder='NA'):
ydl = YoutubeDL({'outtmpl': templ}) params = {'outtmpl': templ}
if na_placeholder != 'NA':
params['outtmpl_na_placeholder'] = na_placeholder
ydl = YoutubeDL(params)
return ydl.prepare_filename(info) return ydl.prepare_filename(info)
self.assertEqual(fname('%(id)s.%(ext)s'), '1234.mp4') self.assertEqual(fname('%(id)s.%(ext)s'), '1234.mp4')
self.assertEqual(fname('%(id)s-%(width)s.%(ext)s'), '1234-NA.mp4') self.assertEqual(fname('%(id)s-%(width)s.%(ext)s'), '1234-NA.mp4')
# Replace missing fields with 'NA' NA_TEST_OUTTMPL = '%(uploader_date)s-%(width)d-%(id)s.%(ext)s'
self.assertEqual(fname('%(uploader_date)s-%(id)s.%(ext)s'), 'NA-1234.mp4') # Replace missing fields with 'NA' by default
self.assertEqual(fname(NA_TEST_OUTTMPL), 'NA-NA-1234.mp4')
# Or by provided placeholder
self.assertEqual(fname(NA_TEST_OUTTMPL, na_placeholder='none'), 'none-none-1234.mp4')
self.assertEqual(fname(NA_TEST_OUTTMPL, na_placeholder=''), '--1234.mp4')
self.assertEqual(fname('%(height)d.%(ext)s'), '1080.mp4') self.assertEqual(fname('%(height)d.%(ext)s'), '1080.mp4')
self.assertEqual(fname('%(height)6d.%(ext)s'), ' 1080.mp4') self.assertEqual(fname('%(height)6d.%(ext)s'), ' 1080.mp4')
self.assertEqual(fname('%(height)-6d.%(ext)s'), '1080 .mp4') self.assertEqual(fname('%(height)-6d.%(ext)s'), '1080 .mp4')
@@ -919,6 +927,76 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(downloaded['extractor'], 'testex') self.assertEqual(downloaded['extractor'], 'testex')
self.assertEqual(downloaded['extractor_key'], 'TestEx') self.assertEqual(downloaded['extractor_key'], 'TestEx')
# Test case for https://github.com/ytdl-org/youtube-dl/issues/27064
def test_ignoreerrors_for_playlist_with_url_transparent_iterable_entries(self):
class _YDL(YDL):
def __init__(self, *args, **kwargs):
super(_YDL, self).__init__(*args, **kwargs)
def trouble(self, s, tb=None):
pass
ydl = _YDL({
'format': 'extra',
'ignoreerrors': True,
})
class VideoIE(InfoExtractor):
_VALID_URL = r'video:(?P<id>\d+)'
def _real_extract(self, url):
video_id = self._match_id(url)
formats = [{
'format_id': 'default',
'url': 'url:',
}]
if video_id == '0':
raise ExtractorError('foo')
if video_id == '2':
formats.append({
'format_id': 'extra',
'url': TEST_URL,
})
return {
'id': video_id,
'title': 'Video %s' % video_id,
'formats': formats,
}
class PlaylistIE(InfoExtractor):
_VALID_URL = r'playlist:'
def _entries(self):
for n in range(3):
video_id = compat_str(n)
yield {
'_type': 'url_transparent',
'ie_key': VideoIE.ie_key(),
'id': video_id,
'url': 'video:%s' % video_id,
'title': 'Video Transparent %s' % video_id,
}
def _real_extract(self, url):
return self.playlist_result(self._entries())
ydl.add_info_extractor(VideoIE(ydl))
ydl.add_info_extractor(PlaylistIE(ydl))
info = ydl.extract_info('playlist:')
entries = info['entries']
self.assertEqual(len(entries), 3)
self.assertTrue(entries[0] is None)
self.assertTrue(entries[1] is None)
self.assertEqual(len(ydl.downloaded_info_dicts), 1)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(entries[2], downloaded)
self.assertEqual(downloaded['url'], TEST_URL)
self.assertEqual(downloaded['title'], 'Video Transparent 2')
self.assertEqual(downloaded['id'], '2')
self.assertEqual(downloaded['extractor'], 'Video')
self.assertEqual(downloaded['extractor_key'], 'Video')
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
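
The output-template assertions above show that a missing field is replaced by `NA` by default, or by the value of `outtmpl_na_placeholder`, even when the template asks for a numeric conversion such as `%(width)d`. The sketch below reproduces that behaviour in isolation. `fill_template` is a hypothetical stand-in, not `YoutubeDL.prepare_filename`, and it ignores filename sanitisation and `%%` escapes.

```python
import re
from collections import defaultdict

# Matches numeric conversions such as %(width)d or %(height)-6d
_NUMERIC_FIELD = re.compile(r'%\((?P<key>[^)]+)\)[-0-9.+ #]*[diouxXeEfFgG]')


def fill_template(template, info, na_placeholder='NA'):
    """Expand a %-style template, substituting a placeholder for missing fields."""

    def downgrade_missing(match):
        # A string placeholder cannot be formatted with %d, so a missing
        # numeric field is rewritten to a plain %(key)s conversion.
        key = match.group('key')
        if info.get(key) is None:
            return '%({0})s'.format(key)
        return match.group(0)

    template = _NUMERIC_FIELD.sub(downgrade_missing, template)
    present = {k: v for k, v in info.items() if v is not None}
    # defaultdict supplies the placeholder for any key the template
    # references but the info dict lacks.
    values = defaultdict(lambda: na_placeholder, present)
    return template % values
```

With `info = {'id': '1234', 'ext': 'mp4', 'height': 1080}`, calling `fill_template('%(uploader_date)s-%(width)d-%(id)s.%(ext)s', info, na_placeholder='')` yields `--1234.mp4`, matching the last placeholder assertion in the hunk.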


@@ -31,16 +31,17 @@ class TestAllURLsMatching(unittest.TestCase):
def test_youtube_playlist_matching(self): def test_youtube_playlist_matching(self):
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist']) assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertTab = lambda url: self.assertMatch(url, ['youtube:tab'])
assertPlaylist('ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8') assertPlaylist('ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist('UUBABnxM4Ar9ten8Mdjj1j0Q') # 585 assertPlaylist('UUBABnxM4Ar9ten8Mdjj1j0Q') # 585
assertPlaylist('PL63F0C78739B09958') assertPlaylist('PL63F0C78739B09958')
assertPlaylist('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q') assertTab('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8') assertTab('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC') assertTab('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') # 668 assertTab('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') # 668
self.assertFalse('youtube:playlist' in self.matching_ies('PLtS2H6bU1M')) self.assertFalse('youtube:playlist' in self.matching_ies('PLtS2H6bU1M'))
# Top tracks # Top tracks
assertPlaylist('https://www.youtube.com/playlist?list=MCUS.20142101') assertTab('https://www.youtube.com/playlist?list=MCUS.20142101')
def test_youtube_matching(self): def test_youtube_matching(self):
self.assertTrue(YoutubeIE.suitable('PLtS2H6bU1M')) self.assertTrue(YoutubeIE.suitable('PLtS2H6bU1M'))
@@ -51,26 +52,23 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('http://www.cleanvideosearch.com/media/action/yt/watch?videoId=8v_4O44sfjM', ['youtube']) self.assertMatch('http://www.cleanvideosearch.com/media/action/yt/watch?videoId=8v_4O44sfjM', ['youtube'])
def test_youtube_channel_matching(self): def test_youtube_channel_matching(self):
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel']) assertChannel = lambda url: self.assertMatch(url, ['youtube:tab'])
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM') assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec') assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos') assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self): def test_youtube_user_matching(self):
self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:user']) self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:tab'])
def test_youtube_feeds(self): def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watchlater']) self.assertMatch('https://www.youtube.com/feed/library', ['youtube:tab'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:subscriptions']) self.assertMatch('https://www.youtube.com/feed/history', ['youtube:tab'])
self.assertMatch('https://www.youtube.com/feed/recommended', ['youtube:recommended']) self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:tab'])
self.assertMatch('https://www.youtube.com/my_favorites', ['youtube:favorites']) self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:tab'])
def test_youtube_show_matching(self): # def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show']) # self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_extract(self): def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id) assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)


@@ -258,16 +258,24 @@ class TestNRKSubtitles(BaseTestSubtitles):
class TestRaiPlaySubtitles(BaseTestSubtitles): class TestRaiPlaySubtitles(BaseTestSubtitles):
url = 'http://www.raiplay.it/video/2014/04/Report-del-07042014-cb27157f-9dd0-4aee-b788-b1f67643a391.html'
IE = RaiPlayIE IE = RaiPlayIE
def test_allsubtitles(self): def test_subtitles_key(self):
self.url = 'http://www.raiplay.it/video/2014/04/Report-del-07042014-cb27157f-9dd0-4aee-b788-b1f67643a391.html'
self.DL.params['writesubtitles'] = True self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles() subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['it'])) self.assertEqual(set(subtitles.keys()), set(['it']))
self.assertEqual(md5(subtitles['it']), 'b1d90a98755126b61e667567a1f6680a') self.assertEqual(md5(subtitles['it']), 'b1d90a98755126b61e667567a1f6680a')
def test_subtitles_array_key(self):
self.url = 'https://www.raiplay.it/video/2020/12/Report---04-01-2021-2e90f1de-8eee-4de4-ac0e-78d21db5b600.html'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['it']))
self.assertEqual(md5(subtitles['it']), '4b3264186fbb103508abe5311cfcb9cd')
class TestVikiSubtitles(BaseTestSubtitles): class TestVikiSubtitles(BaseTestSubtitles):
url = 'http://www.viki.com/videos/1060846v-punch-episode-18' url = 'http://www.viki.com/videos/1060846v-punch-episode-18'


@@ -21,6 +21,7 @@ from youtube_dl.utils import (
encode_base_n, encode_base_n,
caesar, caesar,
clean_html, clean_html,
clean_podcast_url,
date_from_str, date_from_str,
DateRange, DateRange,
detect_exe_version, detect_exe_version,
@@ -554,6 +555,11 @@ class TestUtil(unittest.TestCase):
self.assertEqual(url_or_none('http$://foo.de'), None) self.assertEqual(url_or_none('http$://foo.de'), None)
self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de') self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
self.assertEqual(url_or_none('//foo.de'), '//foo.de') self.assertEqual(url_or_none('//foo.de'), '//foo.de')
self.assertEqual(url_or_none('s3://foo.de'), None)
self.assertEqual(url_or_none('rtmpte://foo.de'), 'rtmpte://foo.de')
self.assertEqual(url_or_none('mms://foo.de'), 'mms://foo.de')
self.assertEqual(url_or_none('rtspu://foo.de'), 'rtspu://foo.de')
self.assertEqual(url_or_none('ftps://foo.de'), 'ftps://foo.de')
def test_parse_age_limit(self): def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None) self.assertEqual(parse_age_limit(None), None)
@@ -803,6 +809,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(mimetype2ext('text/vtt'), 'vtt') self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt') self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html') self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
self.assertEqual(mimetype2ext('audio/x-wav'), 'wav')
self.assertEqual(mimetype2ext('audio/x-wav;codec=pcm'), 'wav')
def test_month_by_name(self): def test_month_by_name(self):
self.assertEqual(month_by_name(None), None) self.assertEqual(month_by_name(None), None)
@@ -935,6 +943,28 @@ class TestUtil(unittest.TestCase):
self.assertEqual(d['x'], 1) self.assertEqual(d['x'], 1)
self.assertEqual(d['y'], 'a') self.assertEqual(d['y'], 'a')
# Just drop ! prefix for now though this results in a wrong value
on = js_to_json('''{
a: !0,
b: !1,
c: !!0,
d: !!42.42,
e: !!![],
f: !"abc",
g: !"",
!42: 42
}''')
self.assertEqual(json.loads(on), {
'a': 0,
'b': 1,
'c': 0,
'd': 42.42,
'e': [],
'f': "abc",
'g': "",
'42': 42
})
on = js_to_json('["abc", "def",]') on = js_to_json('["abc", "def",]')
self.assertEqual(json.loads(on), ['abc', 'def']) self.assertEqual(json.loads(on), ['abc', 'def'])
@@ -992,6 +1022,12 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{42:4.2e1}') on = js_to_json('{42:4.2e1}')
self.assertEqual(json.loads(on), {'42': 42.0}) self.assertEqual(json.loads(on), {'42': 42.0})
on = js_to_json('{ "0x40": "0x40" }')
self.assertEqual(json.loads(on), {'0x40': '0x40'})
on = js_to_json('{ "040": "040" }')
self.assertEqual(json.loads(on), {'040': '040'})
def test_js_to_json_malformed(self): def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"') self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1') self.assertEqual(js_to_json('42a-1'), '42"a"-1')
@@ -1435,6 +1471,10 @@ Line 1
self.assertEqual(get_elements_by_attribute('class', 'foo', html), []) self.assertEqual(get_elements_by_attribute('class', 'foo', html), [])
self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), []) self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), [])
def test_clean_podcast_url(self):
self.assertEqual(clean_podcast_url('https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/traffic.megaphone.fm/HSW7835899191.mp3'), 'https://traffic.megaphone.fm/HSW7835899191.mp3')
self.assertEqual(clean_podcast_url('https://play.podtrac.com/npr-344098539/edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3'), 'https://edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3')
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
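Review note: the two `test_clean_podcast_url` assertions above imply that tracking and redirect prefixes are stripped while the scheme and the final media host are kept. A minimal sketch of that idea follows, assuming a regex-based approach. The name `strip_tracking_prefixes` and its pattern are hypothetical and cover only the hosts exercised by these two tests; this is not the `clean_podcast_url` implementation itself.

```python
import re

# Hypothetical pattern: only the prefixes that appear in the two test URLs.
_TRACKING_PREFIX_RE = re.compile(r'''(?x)
    (?:
        (?:chtbl\.com/track|play\.podtrac\.com)/[^/]+|
        (?:dts|www)\.podtrac\.com/(?:pts/)?redirect\.[0-9a-z]{3,4}
    )/''')


def strip_tracking_prefixes(url):
    """Remove every known tracking prefix segment from a podcast URL."""
    return _TRACKING_PREFIX_RE.sub('', url)


cleaned = strip_tracking_prefixes(
    'https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/'
    'traffic.megaphone.fm/HSW7835899191.mp3')
```

Because `re.sub` replaces every non-overlapping match, chained prefixes (podtrac followed by chtbl in the first test URL) are removed in one pass.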

@@ -1,275 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import expect_value
from youtube_dl.extractor import YoutubeIE
class TestYoutubeChapters(unittest.TestCase):
_TEST_CASES = [
(
# https://www.youtube.com/watch?v=A22oy8dFjqc
# pattern: 00:00 - <title>
'''This is the absolute ULTIMATE experience of Queen's set at LIVE AID, this is the best video mixed to the absolutely superior stereo radio broadcast. This vastly superior audio mix takes a huge dump on all of the official mixes. Best viewed in 1080p. ENJOY! ***MAKE SURE TO READ THE DESCRIPTION***<br /><a href="#" onclick="yt.www.watch.player.seekTo(00*60+36);return false;">00:36</a> - Bohemian Rhapsody<br /><a href="#" onclick="yt.www.watch.player.seekTo(02*60+42);return false;">02:42</a> - Radio Ga Ga<br /><a href="#" onclick="yt.www.watch.player.seekTo(06*60+53);return false;">06:53</a> - Ay Oh!<br /><a href="#" onclick="yt.www.watch.player.seekTo(07*60+34);return false;">07:34</a> - Hammer To Fall<br /><a href="#" onclick="yt.www.watch.player.seekTo(12*60+08);return false;">12:08</a> - Crazy Little Thing Called Love<br /><a href="#" onclick="yt.www.watch.player.seekTo(16*60+03);return false;">16:03</a> - We Will Rock You<br /><a href="#" onclick="yt.www.watch.player.seekTo(17*60+18);return false;">17:18</a> - We Are The Champions<br /><a href="#" onclick="yt.www.watch.player.seekTo(21*60+12);return false;">21:12</a> - Is This The World We Created...?<br /><br />Short song analysis:<br /><br />- "Bohemian Rhapsody": Although it's a short medley version, it's one of the best performances of the ballad section, with Freddie nailing the Bb4s with the correct studio phrasing (for the first time ever!).<br /><br />- "Radio Ga Ga": Although it's missing one chorus, this is one of - if not the best - the best versions ever, Freddie nails all the Bb4s and sounds very clean! Spike Edney's Roland Jupiter 8 also really shines through on this mix, compared to the DVD releases!<br /><br />- "Audience Improv": A great improv, Freddie sounds strong and confident. You gotta love when he sustains that A4 for 4 seconds!<br /><br />- "Hammer To Fall": Despite missing a verse and a chorus, it's a strong version (possibly the best ever). 
Freddie sings the song amazingly, and even ad-libs a C#5 and a C5! Also notice how heavy Brian's guitar sounds compared to the thin DVD mixes - it roars!<br /><br />- "Crazy Little Thing Called Love": A great version, the crowd loves the song, the jam is great as well! Only downside to this is the slight feedback issues.<br /><br />- "We Will Rock You": Although cut down to the 1st verse and chorus, Freddie sounds strong. He nails the A4, and the solo from Dr. May is brilliant!<br /><br />- "We Are the Champions": Perhaps the high-light of the performance - Freddie is very daring on this version, he sustains the pre-chorus Bb4s, nails the 1st C5, belts great A4s, but most importantly: He nails the chorus Bb4s, in all 3 choruses! This is the only time he has ever done so! It has to be said though, the last one sounds a bit rough, but that's a side effect of belting high notes for the past 18 minutes, with nodules AND laryngitis!<br /><br />- "Is This The World We Created... ?": Freddie and Brian perform a beautiful version of this, and it is one of the best versions ever. It's both sad and hilarious that a couple of BBC engineers are talking over the song, one of them being completely oblivious of the fact that he is interrupting the performance, on live television... Which was being televised to almost 2 billion homes.<br /><br /><br />All rights go to their respective owners!<br />-----Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for fair use for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use''',
1477,
[{
'start_time': 36,
'end_time': 162,
'title': 'Bohemian Rhapsody',
}, {
'start_time': 162,
'end_time': 413,
'title': 'Radio Ga Ga',
}, {
'start_time': 413,
'end_time': 454,
'title': 'Ay Oh!',
}, {
'start_time': 454,
'end_time': 728,
'title': 'Hammer To Fall',
}, {
'start_time': 728,
'end_time': 963,
'title': 'Crazy Little Thing Called Love',
}, {
'start_time': 963,
'end_time': 1038,
'title': 'We Will Rock You',
}, {
'start_time': 1038,
'end_time': 1272,
'title': 'We Are The Champions',
}, {
'start_time': 1272,
'end_time': 1477,
'title': 'Is This The World We Created...?',
}]
),
(
# https://www.youtube.com/watch?v=ekYlRhALiRQ
# pattern: <num>. <title> 0:00
'1. Those Beaten Paths of Confusion <a href="#" onclick="yt.www.watch.player.seekTo(0*60+00);return false;">0:00</a><br />2. Beyond the Shadows of Emptiness & Nothingness <a href="#" onclick="yt.www.watch.player.seekTo(11*60+47);return false;">11:47</a><br />3. Poison Yourself...With Thought <a href="#" onclick="yt.www.watch.player.seekTo(26*60+30);return false;">26:30</a><br />4. The Agents of Transformation <a href="#" onclick="yt.www.watch.player.seekTo(35*60+57);return false;">35:57</a><br />5. Drowning in the Pain of Consciousness <a href="#" onclick="yt.www.watch.player.seekTo(44*60+32);return false;">44:32</a><br />6. Deny the Disease of Life <a href="#" onclick="yt.www.watch.player.seekTo(53*60+07);return false;">53:07</a><br /><br />More info/Buy: http://crepusculonegro.storenvy.com/products/257645-cn-03-arizmenda-within-the-vacuum-of-infinity<br /><br />No copyright is intended. The rights to this video are assumed by the owner and its affiliates.',
4009,
[{
'start_time': 0,
'end_time': 707,
'title': '1. Those Beaten Paths of Confusion',
}, {
'start_time': 707,
'end_time': 1590,
'title': '2. Beyond the Shadows of Emptiness & Nothingness',
}, {
'start_time': 1590,
'end_time': 2157,
'title': '3. Poison Yourself...With Thought',
}, {
'start_time': 2157,
'end_time': 2672,
'title': '4. The Agents of Transformation',
}, {
'start_time': 2672,
'end_time': 3187,
'title': '5. Drowning in the Pain of Consciousness',
}, {
'start_time': 3187,
'end_time': 4009,
'title': '6. Deny the Disease of Life',
}]
),
(
# https://www.youtube.com/watch?v=WjL4pSzog9w
# pattern: 00:00 <title>
'<a href="https://arizmenda.bandcamp.com/merch/despairs-depths-descended-cd" class="yt-uix-servicelink " data-target-new-window="True" data-servicelink="CDAQ6TgYACITCNf1raqT2dMCFdRjGAod_o0CBSj4HQ" data-url="https://arizmenda.bandcamp.com/merch/despairs-depths-descended-cd" rel="nofollow noopener" target="_blank">https://arizmenda.bandcamp.com/merch/...</a><br /><br /><a href="#" onclick="yt.www.watch.player.seekTo(00*60+00);return false;">00:00</a> Christening Unborn Deformities <br /><a href="#" onclick="yt.www.watch.player.seekTo(07*60+08);return false;">07:08</a> Taste of Purity<br /><a href="#" onclick="yt.www.watch.player.seekTo(16*60+16);return false;">16:16</a> Sculpting Sins of a Universal Tongue<br /><a href="#" onclick="yt.www.watch.player.seekTo(24*60+45);return false;">24:45</a> Birth<br /><a href="#" onclick="yt.www.watch.player.seekTo(31*60+24);return false;">31:24</a> Neves<br /><a href="#" onclick="yt.www.watch.player.seekTo(37*60+55);return false;">37:55</a> Libations in Limbo',
2705,
[{
'start_time': 0,
'end_time': 428,
'title': 'Christening Unborn Deformities',
}, {
'start_time': 428,
'end_time': 976,
'title': 'Taste of Purity',
}, {
'start_time': 976,
'end_time': 1485,
'title': 'Sculpting Sins of a Universal Tongue',
}, {
'start_time': 1485,
'end_time': 1884,
'title': 'Birth',
}, {
'start_time': 1884,
'end_time': 2275,
'title': 'Neves',
}, {
'start_time': 2275,
'end_time': 2705,
'title': 'Libations in Limbo',
}]
),
(
# https://www.youtube.com/watch?v=o3r1sn-t3is
# pattern: <title> 00:00 <note>
'Download this show in MP3: <a href="http://sh.st/njZKK" class="yt-uix-servicelink " data-url="http://sh.st/njZKK" data-target-new-window="True" data-servicelink="CDAQ6TgYACITCK3j8_6o2dMCFVDCGAoduVAKKij4HQ" rel="nofollow noopener" target="_blank">http://sh.st/njZKK</a><br /><br />Setlist:<br />I-E-A-I-A-I-O <a href="#" onclick="yt.www.watch.player.seekTo(00*60+45);return false;">00:45</a><br />Suite-Pee <a href="#" onclick="yt.www.watch.player.seekTo(4*60+26);return false;">4:26</a> (Incomplete)<br />Attack <a href="#" onclick="yt.www.watch.player.seekTo(5*60+31);return false;">5:31</a> (First live performance since 2011)<br />Prison Song <a href="#" onclick="yt.www.watch.player.seekTo(8*60+42);return false;">8:42</a><br />Know <a href="#" onclick="yt.www.watch.player.seekTo(12*60+32);return false;">12:32</a> (First live performance since 2011)<br />Aerials <a href="#" onclick="yt.www.watch.player.seekTo(15*60+32);return false;">15:32</a><br />Soldier Side - Intro <a href="#" onclick="yt.www.watch.player.seekTo(19*60+13);return false;">19:13</a><br />B.Y.O.B. 
<a href="#" onclick="yt.www.watch.player.seekTo(20*60+09);return false;">20:09</a><br />Soil <a href="#" onclick="yt.www.watch.player.seekTo(24*60+32);return false;">24:32</a><br />Darts <a href="#" onclick="yt.www.watch.player.seekTo(27*60+48);return false;">27:48</a><br />Radio/Video <a href="#" onclick="yt.www.watch.player.seekTo(30*60+38);return false;">30:38</a><br />Hypnotize <a href="#" onclick="yt.www.watch.player.seekTo(35*60+05);return false;">35:05</a><br />Temper <a href="#" onclick="yt.www.watch.player.seekTo(38*60+08);return false;">38:08</a> (First live performance since 1999)<br />CUBErt <a href="#" onclick="yt.www.watch.player.seekTo(41*60+00);return false;">41:00</a><br />Needles <a href="#" onclick="yt.www.watch.player.seekTo(42*60+57);return false;">42:57</a><br />Deer Dance <a href="#" onclick="yt.www.watch.player.seekTo(46*60+27);return false;">46:27</a><br />Bounce <a href="#" onclick="yt.www.watch.player.seekTo(49*60+38);return false;">49:38</a><br />Suggestions <a href="#" onclick="yt.www.watch.player.seekTo(51*60+25);return false;">51:25</a><br />Psycho <a href="#" onclick="yt.www.watch.player.seekTo(53*60+52);return false;">53:52</a><br />Chop Suey! <a href="#" onclick="yt.www.watch.player.seekTo(58*60+13);return false;">58:13</a><br />Lonely Day <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+01*60+15);return false;">1:01:15</a><br />Question! 
<a href="#" onclick="yt.www.watch.player.seekTo(1*3600+04*60+14);return false;">1:04:14</a><br />Lost in Hollywood <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+08*60+10);return false;">1:08:10</a><br />Vicinity of Obscenity <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+13*60+40);return false;">1:13:40</a>(First live performance since 2012)<br />Forest <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+16*60+17);return false;">1:16:17</a><br />Cigaro <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+20*60+02);return false;">1:20:02</a><br />Toxicity <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+23*60+57);return false;">1:23:57</a>(with Chino Moreno)<br />Sugar <a href="#" onclick="yt.www.watch.player.seekTo(1*3600+27*60+53);return false;">1:27:53</a>',
5640,
[{
'start_time': 45,
'end_time': 266,
'title': 'I-E-A-I-A-I-O',
}, {
'start_time': 266,
'end_time': 331,
'title': 'Suite-Pee (Incomplete)',
}, {
'start_time': 331,
'end_time': 522,
'title': 'Attack (First live performance since 2011)',
}, {
'start_time': 522,
'end_time': 752,
'title': 'Prison Song',
}, {
'start_time': 752,
'end_time': 932,
'title': 'Know (First live performance since 2011)',
}, {
'start_time': 932,
'end_time': 1153,
'title': 'Aerials',
}, {
'start_time': 1153,
'end_time': 1209,
'title': 'Soldier Side - Intro',
}, {
'start_time': 1209,
'end_time': 1472,
'title': 'B.Y.O.B.',
}, {
'start_time': 1472,
'end_time': 1668,
'title': 'Soil',
}, {
'start_time': 1668,
'end_time': 1838,
'title': 'Darts',
}, {
'start_time': 1838,
'end_time': 2105,
'title': 'Radio/Video',
}, {
'start_time': 2105,
'end_time': 2288,
'title': 'Hypnotize',
}, {
'start_time': 2288,
'end_time': 2460,
'title': 'Temper (First live performance since 1999)',
}, {
'start_time': 2460,
'end_time': 2577,
'title': 'CUBErt',
}, {
'start_time': 2577,
'end_time': 2787,
'title': 'Needles',
}, {
'start_time': 2787,
'end_time': 2978,
'title': 'Deer Dance',
}, {
'start_time': 2978,
'end_time': 3085,
'title': 'Bounce',
}, {
'start_time': 3085,
'end_time': 3232,
'title': 'Suggestions',
}, {
'start_time': 3232,
'end_time': 3493,
'title': 'Psycho',
}, {
'start_time': 3493,
'end_time': 3675,
'title': 'Chop Suey!',
}, {
'start_time': 3675,
'end_time': 3854,
'title': 'Lonely Day',
}, {
'start_time': 3854,
'end_time': 4090,
'title': 'Question!',
}, {
'start_time': 4090,
'end_time': 4420,
'title': 'Lost in Hollywood',
}, {
'start_time': 4420,
'end_time': 4577,
'title': 'Vicinity of Obscenity (First live performance since 2012)',
}, {
'start_time': 4577,
'end_time': 4802,
'title': 'Forest',
}, {
'start_time': 4802,
'end_time': 5037,
'title': 'Cigaro',
}, {
'start_time': 5037,
'end_time': 5273,
'title': 'Toxicity (with Chino Moreno)',
}, {
'start_time': 5273,
'end_time': 5640,
'title': 'Sugar',
}]
),
(
# https://www.youtube.com/watch?v=PkYLQbsqCE8
# pattern: <num> - <title> [<latinized title>] 0:00:00
'''Затемно (Zatemno) is an Obscure Black Metal Band from Russia.<br /><br />"Во прах (Vo prakh)'' Into The Ashes", Debut mini-album released may 6, 2016, by Death Knell Productions<br />Released on 6 panel digipak CD, limited to 100 copies only<br />And digital format on Bandcamp<br /><br />Tracklist<br /><br />1 - Во прах [Vo prakh] <a href="#" onclick="yt.www.watch.player.seekTo(0*3600+00*60+00);return false;">0:00:00</a><br />2 - Искупление [Iskupleniye] <a href="#" onclick="yt.www.watch.player.seekTo(0*3600+08*60+10);return false;">0:08:10</a><br />3 - Из серпов луны...[Iz serpov luny] <a href="#" onclick="yt.www.watch.player.seekTo(0*3600+14*60+30);return false;">0:14:30</a><br /><br />Links:<br /><a href="https://deathknellprod.bandcamp.com/album/--2" class="yt-uix-servicelink " data-target-new-window="True" data-url="https://deathknellprod.bandcamp.com/album/--2" data-servicelink="CC8Q6TgYACITCNP234Kr2dMCFcNxGAodQqsIwSj4HQ" target="_blank" rel="nofollow noopener">https://deathknellprod.bandcamp.com/a...</a><br /><a href="https://www.facebook.com/DeathKnellProd/" class="yt-uix-servicelink " data-target-new-window="True" data-url="https://www.facebook.com/DeathKnellProd/" data-servicelink="CC8Q6TgYACITCNP234Kr2dMCFcNxGAodQqsIwSj4HQ" target="_blank" rel="nofollow noopener">https://www.facebook.com/DeathKnellProd/</a><br /><br /><br />I don't have any right about this artifact, my only intention is to spread the music of the band, all rights are reserved to the Затемно (Zatemno) and his producers, Death Knell Productions.<br /><br />------------------------------------------------------------------<br /><br />Subscribe for more videos like this.<br />My link: <a href="https://web.facebook.com/AttackOfTheDragons" class="yt-uix-servicelink " data-target-new-window="True" data-url="https://web.facebook.com/AttackOfTheDragons" data-servicelink="CC8Q6TgYACITCNP234Kr2dMCFcNxGAodQqsIwSj4HQ" target="_blank" rel="nofollow 
noopener">https://web.facebook.com/AttackOfTheD...</a>''',
1138,
[{
'start_time': 0,
'end_time': 490,
'title': '1 - Во прах [Vo prakh]',
}, {
'start_time': 490,
'end_time': 870,
'title': '2 - Искупление [Iskupleniye]',
}, {
'start_time': 870,
'end_time': 1138,
'title': '3 - Из серпов луны...[Iz serpov luny]',
}]
),
(
# https://www.youtube.com/watch?v=xZW70zEasOk
# time point more than duration
'''● LCS Spring finals: Saturday and Sunday from <a href="#" onclick="yt.www.watch.player.seekTo(13*60+30);return false;">13:30</a> outside the venue! <br />● PAX East: Fri, Sat & Sun - more info in tomorrows video on the main channel!''',
283,
[]
),
]
def test_youtube_chapters(self):
for description, duration, expected_chapters in self._TEST_CASES:
ie = YoutubeIE()
expect_value(
self, ie._extract_chapters_from_description(description, duration),
expected_chapters, None)
if __name__ == '__main__':
unittest.main()
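Review note: the expected values in the removed test cases follow one rule: each chapter ends where the next one starts, the last chapter ends at the video duration, and a time point beyond the duration yields no chapters. A minimal sketch of that rule, assuming the time points have already been parsed into seconds; `build_chapters` is a hypothetical helper, not the removed `_extract_chapters_from_description` method.

```python
def build_chapters(time_points, duration):
    """Turn [(start_seconds, title), ...] into chapter dicts.

    Stops at the first time point that lies beyond the video duration.
    """
    chapters = []
    for index, (start_time, title) in enumerate(time_points):
        if start_time > duration:
            break
        if index + 1 < len(time_points):
            # A chapter ends where the next one starts, capped at the duration.
            end_time = min(time_points[index + 1][0], duration)
        else:
            end_time = duration
        if start_time > end_time:
            break
        chapters.append({
            'start_time': start_time,
            'end_time': end_time,
            'title': title,
        })
    return chapters


chapters = build_chapters([(0, 'one'), (490, 'two'), (870, 'three')], 1138)
```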

@@ -86,13 +86,9 @@ class TestPlayerInfo(unittest.TestCase):
('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'), ('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'), ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'), ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
) )
for player_url, expected_player_id in PLAYER_URLS: for player_url, expected_player_id in PLAYER_URLS:
expected_player_type = player_url.split('.')[-1] player_id = YoutubeIE._extract_player_info(player_url)
player_type, player_id = YoutubeIE._extract_player_info(player_url)
self.assertEqual(player_type, expected_player_type)
self.assertEqual(player_id, expected_player_id) self.assertEqual(player_id, expected_player_id)

@@ -163,6 +163,7 @@ class YoutubeDL(object):
simulate: Do not download the video files. simulate: Do not download the video files.
format: Video format code. See options.py for more information. format: Video format code. See options.py for more information.
outtmpl: Template for output names. outtmpl: Template for output names.
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names restrictfilenames: Do not allow "&" and spaces in file names
ignoreerrors: Do not stop on download errors. ignoreerrors: Do not stop on download errors.
force_generic_extractor: Force downloader to use the generic extractor force_generic_extractor: Force downloader to use the generic extractor
@@ -338,6 +339,8 @@ class YoutubeDL(object):
_pps = [] _pps = []
_download_retcode = None _download_retcode = None
_num_downloads = None _num_downloads = None
_playlist_level = 0
_playlist_urls = set()
_screen_file = None _screen_file = None
def __init__(self, params=None, auto_init=True): def __init__(self, params=None, auto_init=True):
@@ -656,7 +659,7 @@ class YoutubeDL(object):
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v)) template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items() for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict))) if v is not None and not isinstance(v, (list, tuple, dict)))
template_dict = collections.defaultdict(lambda: 'NA', template_dict) template_dict = collections.defaultdict(lambda: self.params.get('outtmpl_na_placeholder', 'NA'), template_dict)
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL) outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
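Review note: the hunk above replaces the hard-coded 'NA' default with a configurable one. The mechanism can be sketched in isolation: `%`-formatting with a `collections.defaultdict` falls back to the factory value for any missing key. `render_name` and its parameters are hypothetical names used only for this illustration.

```python
import collections


def render_name(template, fields, na_placeholder='NA'):
    """Fill a %-style template, substituting a placeholder for missing fields."""
    values = collections.defaultdict(lambda: na_placeholder, fields)
    return template % values


default_name = render_name('%(title)s-%(uploader)s.%(ext)s',
                           {'title': 'clip', 'ext': 'mp4'})
custom_name = render_name('%(title)s-%(uploader)s.%(ext)s',
                          {'title': 'clip', 'ext': 'mp4'},
                          na_placeholder='unknown')
```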
@@ -676,8 +679,8 @@ class YoutubeDL(object):
# Missing numeric fields used together with integer presentation types # Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since # in format specification will break the argument substitution since
# string 'NA' is returned for missing fields. We will patch output # string NA placeholder is returned for missing fields. We will patch
# template for missing fields to meet string presentation type. # output template for missing fields to meet string presentation type.
for numeric_field in self._NUMERIC_FIELDS: for numeric_field in self._NUMERIC_FIELDS:
if numeric_field not in template_dict: if numeric_field not in template_dict:
# As of [1] format syntax is: # As of [1] format syntax is:
@@ -793,21 +796,14 @@ class YoutubeDL(object):
self.report_warning('The program functionality for this site has been marked as broken, ' self.report_warning('The program functionality for this site has been marked as broken, '
'and will probably not work.') 'and will probably not work.')
return self.__extract_info(url, ie, download, extra_info, process)
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
def __handle_extraction_exceptions(func):
def wrapper(self, *args, **kwargs):
try: try:
ie_result = ie.extract(url) return func(self, *args, **kwargs)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
break
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else:
return ie_result
except GeoRestrictedError as e: except GeoRestrictedError as e:
msg = e.msg msg = e.msg
if e.countries: if e.countries:
@@ -815,20 +811,33 @@ class YoutubeDL(object):
map(ISO3166Utils.short2full, e.countries)) map(ISO3166Utils.short2full, e.countries))
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.' msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.'
self.report_error(msg) self.report_error(msg)
break
except ExtractorError as e: # An error we somewhat expected except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback()) self.report_error(compat_str(e), e.format_traceback())
break
except MaxDownloadsReached: except MaxDownloadsReached:
raise raise
except Exception as e: except Exception as e:
if self.params.get('ignoreerrors', False): if self.params.get('ignoreerrors', False):
self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc())) self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc()))
break
else: else:
raise raise
return wrapper
@__handle_extraction_exceptions
def __extract_info(self, url, ie, download, extra_info, process):
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
return
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else: else:
self.report_error('no suitable InfoExtractor for URL %s' % url) return ie_result
def add_default_extra_info(self, ie_result, ie, url): def add_default_extra_info(self, ie_result, ie, url):
self.add_extra_info(ie_result, { self.add_extra_info(ie_result, {
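Review note: the hunk above moves the try/except chain out of the extraction loop and into a decorator so that more than one method can share it. A minimal standalone sketch of the pattern follows. `ExpectedError`, `handle_errors` and `Runner` are hypothetical stand-ins, and the decorator is defined at module level here rather than inside the class body as in the diff.

```python
import functools


class ExpectedError(Exception):
    """Stand-in for an error the caller anticipates."""


def handle_errors(func):
    """Report anticipated errors; re-raise others unless ignore_errors is set."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except ExpectedError as e:
            self.report_error(str(e))
        except Exception as e:
            if self.ignore_errors:
                self.report_error(str(e))
            else:
                raise
    return wrapper


class Runner(object):
    def __init__(self, ignore_errors=False):
        self.ignore_errors = ignore_errors
        self.errors = []

    def report_error(self, message):
        self.errors.append(message)

    @handle_errors
    def extract(self, url):
        if not url:
            raise ExpectedError('empty URL')
        if url == 'boom':
            raise ValueError('unexpected failure')
        return {'url': url}
```

A reported error makes the wrapped method return `None`, so callers need to be prepared for an empty result.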
@@ -900,116 +909,23 @@ class YoutubeDL(object):
return self.process_ie_result( return self.process_ie_result(
new_result, download=download, extra_info=extra_info) new_result, download=download, extra_info=extra_info)
elif result_type in ('playlist', 'multi_video'): elif result_type in ('playlist', 'multi_video'):
# We process each entry in the playlist # Protect from infinite recursion due to recursively nested playlists
playlist = ie_result.get('title') or ie_result.get('id') # (see https://github.com/ytdl-org/youtube-dl/issues/27833)
self.to_screen('[download] Downloading playlist: %s' % playlist) webpage_url = ie_result['webpage_url']
if webpage_url in self._playlist_urls:
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen( self.to_screen(
'[%s] playlist %s: Downloading %d videos' % '[download] Skipping already downloaded playlist: %s'
(ie_result['extractor'], playlist, num_entries)) % (ie_result.get('title') or ie_result.get('id')))
return
if isinstance(ie_entries, list): self._playlist_level += 1
n_all_entries = len(ie_entries) self._playlist_urls.add(webpage_url)
if playlistitems: try:
entries = make_playlistitems_entries(ie_entries) return self.__process_playlist(ie_result, download)
else: finally:
entries = ie_entries[playliststart:playlistend] self._playlist_level -= 1
n_entries = len(entries) if not self._playlist_level:
self.to_screen( self._playlist_urls.clear()
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
for item in playlistitems:
entries.extend(ie_entries.getslice(
item - 1, item
))
else:
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
report_download(n_entries)
else: # iterable
if playlistitems:
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry, incomplete=True)
if reason is not None:
self.to_screen('[download] ' + reason)
continue
entry_result = self.process_ie_result(entry,
download=download,
extra_info=extra)
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
elif result_type == 'compat_list': elif result_type == 'compat_list':
self.report_warning( self.report_warning(
'Extractor %s returned a compat_list result. ' 'Extractor %s returned a compat_list result. '
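Review note: the recursion guard added above combines a nesting counter with a set of playlist URLs that is cleared once the outermost playlist finishes. A minimal sketch of the same bookkeeping on plain dicts; `PlaylistWalker` and its attribute names are hypothetical.

```python
class PlaylistWalker(object):
    def __init__(self):
        self._level = 0
        self._seen_urls = set()
        self.visited = []

    def process(self, playlist):
        url = playlist['webpage_url']
        if url in self._seen_urls:
            # Seen earlier during this top-level call: skip to avoid a loop.
            return None
        self._level += 1
        self._seen_urls.add(url)
        try:
            self.visited.append(url)
            for entry in playlist.get('entries', []):
                self.process(entry)
            return playlist
        finally:
            self._level -= 1
            if not self._level:
                # Back at the outermost playlist: forget everything.
                self._seen_urls.clear()


# Two playlists that reference each other.
a = {'webpage_url': 'a', 'entries': []}
b = {'webpage_url': 'b', 'entries': [a]}
a['entries'].append(b)

walker = PlaylistWalker()
walker.process(a)
```

In this sketch the state lives on the instance and is created in `__init__`, so separate walkers never share a URL set.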
@@ -1034,6 +950,123 @@ class YoutubeDL(object):
else: else:
raise Exception('Invalid result type: %s' % result_type) raise Exception('Invalid result type: %s' % result_type)
def __process_playlist(self, ie_result, download):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
for item in playlistitems:
entries.extend(ie_entries.getslice(
item - 1, item
))
else:
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
report_download(n_entries)
else: # iterable
if playlistitems:
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry, incomplete=True)
if reason is not None:
self.to_screen('[download] ' + reason)
continue
entry_result = self.__process_iterable_entry(entry, download, extra)
# TODO: skip failed (empty) entries?
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
@__handle_extraction_exceptions
def __process_iterable_entry(self, entry, download, extra_info):
return self.process_ie_result(
entry, download=download, extra_info=extra_info)
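Review note: the `playlist_items` handling moved into `__process_playlist` above parses a spec such as `1-3,7` into 1-based indices, drops duplicates while keeping order, and ignores indices that fall outside the playlist. A standalone sketch of those two steps; `parse_playlist_items` and `select_entries` are hypothetical names, and the bounds check mirrors the one shown in the diff.

```python
def parse_playlist_items(spec):
    """Expand '1-3,7' into [1, 2, 3, 7], keeping first occurrences only."""
    ordered = []
    seen = set()
    for segment in spec.split(','):
        if '-' in segment:
            start, end = segment.split('-')
            numbers = range(int(start), int(end) + 1)
        else:
            numbers = [int(segment)]
        for number in numbers:
            if number not in seen:
                seen.add(number)
                ordered.append(number)
    return ordered


def select_entries(entries, items):
    """Pick 1-based items from a list, skipping out-of-range indices."""
    num_entries = len(entries)
    return [entries[i - 1] for i in items
            if -num_entries <= i - 1 < num_entries]


items = parse_playlist_items('1-3,7,2')
picked = select_entries(['a', 'b', 'c'], [1, 3, 5])
```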
def _build_format_filter(self, filter_spec): def _build_format_filter(self, filter_spec):
" Returns a function to filter the formats according to the filter_spec " " Returns a function to filter the formats according to the filter_spec "
@@ -1073,7 +1106,7 @@ class YoutubeDL(object):
'*=': lambda attr, value: value in attr, '*=': lambda attr, value: value in attr,
} }
str_operator_rex = re.compile(r'''(?x) str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id) \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id|language)
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)? \s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+) \s*(?P<value>[a-zA-Z0-9._-]+)
\s*$ \s*$
@@ -1216,6 +1249,8 @@ class YoutubeDL(object):
group = _parse_format_selection(tokens, inside_group=True) group = _parse_format_selection(tokens, inside_group=True)
current_selector = FormatSelector(GROUP, group, []) current_selector = FormatSelector(GROUP, group, [])
elif string == '+': elif string == '+':
if inside_merge:
raise syntax_error('Unexpected "+"', start)
video_selector = current_selector video_selector = current_selector
audio_selector = _parse_format_selection(tokens, inside_merge=True) audio_selector = _parse_format_selection(tokens, inside_merge=True)
if not video_selector or not audio_selector: if not video_selector or not audio_selector:
@@ -1600,7 +1635,7 @@ class YoutubeDL(object):
if req_format is None: if req_format is None:
req_format = self._default_format_spec(info_dict, download=download) req_format = self._default_format_spec(info_dict, download=download)
if self.params.get('verbose'): if self.params.get('verbose'):
self.to_stdout('[debug] Default format spec: %s' % req_format) self._write_string('[debug] Default format spec: %s\n' % req_format)
format_selector = self.build_format_selector(req_format) format_selector = self.build_format_selector(req_format)
@@ -1767,6 +1802,8 @@ class YoutubeDL(object):
os.makedirs(dn) os.makedirs(dn)
return True return True
except (OSError, IOError) as err: except (OSError, IOError) as err:
if isinstance(err, OSError) and err.errno == errno.EEXIST:
return True
self.report_error('unable to create directory ' + error_to_compat_str(err)) self.report_error('unable to create directory ' + error_to_compat_str(err))
return False return False
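Note on the hunk above: an `EEXIST` error from `os.makedirs` is now treated as success, presumably to tolerate another process creating the directory first. A minimal standalone sketch of that pattern follows; `ensure_dir` is a hypothetical name and the error reporting of the real method is left out.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path with its parents; return True if it now exists.

    Hypothetical helper for illustration only.
    """
    try:
        os.makedirs(path)
        return True
    except OSError as err:
        # The path already exists (possibly created concurrently):
        # treat this as success rather than as a failure.
        if err.errno == errno.EEXIST:
            return True
        return False


if __name__ == '__main__':
    target = os.path.join(tempfile.mkdtemp(), 'a', 'b')
    print(ensure_dir(target))  # first call creates the directory
    print(ensure_dir(target))  # second call hits EEXIST and still succeeds
```

One caveat of this shortcut: `EEXIST` is also raised when the path exists as a regular file, so a `True` result does not by itself prove that the path is a directory.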
@@ -1861,7 +1898,7 @@ class YoutubeDL(object):
for ph in self._progress_hooks: for ph in self._progress_hooks:
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
if self.params.get('verbose'): if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url')) self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
return fd.download(name, info) return fd.download(name, info)
if info_dict.get('requested_formats') is not None: if info_dict.get('requested_formats') is not None:
@@ -2400,7 +2437,7 @@ class YoutubeDL(object):
thumb_ext = determine_ext(t['url'], 'jpg') thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '_%s' % t['id'] if len(thumbnails) > 1 else '' suffix = '_%s' % t['id'] if len(thumbnails) > 1 else ''
thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else '' thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else ''
t['filename'] = thumb_filename = os.path.splitext(filename)[0] + suffix + '.' + thumb_ext t['filename'] = thumb_filename = replace_extension(filename + suffix, thumb_ext, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)): if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
self.to_screen('[%s] %s: Thumbnail %sis already present' % self.to_screen('[%s] %s: Thumbnail %sis already present' %

@@ -340,6 +340,7 @@ def _real_main(argv=None):
'format': opts.format, 'format': opts.format,
'listformats': opts.listformats, 'listformats': opts.listformats,
'outtmpl': outtmpl, 'outtmpl': outtmpl,
'outtmpl_na_placeholder': opts.outtmpl_na_placeholder,
'autonumber_size': opts.autonumber_size, 'autonumber_size': opts.autonumber_size,
'autonumber_start': opts.autonumber_start, 'autonumber_start': opts.autonumber_start,
'restrictfilenames': opts.restrictfilenames, 'restrictfilenames': opts.restrictfilenames,

@@ -2345,7 +2345,7 @@ except ImportError: # Python <3.4
# HTMLParseError has been deprecated in Python 3.3 and removed in # HTMLParseError has been deprecated in Python 3.3 and removed in
# Python 3.5. Introducing dummy exception for Python >3.5 for compatible # Python 3.5. Introducing dummy exception for Python >3.5 for compatible
# and uniform cross-version exceptiong handling # and uniform cross-version exception handling
class compat_HTMLParseError(Exception): class compat_HTMLParseError(Exception):
pass pass

@@ -97,12 +97,15 @@ class FragmentFD(FileDownloader):
def _download_fragment(self, ctx, frag_url, info_dict, headers=None): def _download_fragment(self, ctx, frag_url, info_dict, headers=None):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], ctx['fragment_index']) fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], ctx['fragment_index'])
success = ctx['dl'].download(fragment_filename, { fragment_info_dict = {
'url': frag_url, 'url': frag_url,
'http_headers': headers or info_dict.get('http_headers'), 'http_headers': headers or info_dict.get('http_headers'),
}) }
success = ctx['dl'].download(fragment_filename, fragment_info_dict)
if not success: if not success:
return False, None return False, None
if fragment_info_dict.get('filetime'):
ctx['fragment_filetime'] = fragment_info_dict.get('filetime')
down, frag_sanitized = sanitize_open(fragment_filename, 'rb') down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized ctx['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read() frag_content = down.read()
@@ -258,6 +261,13 @@ class FragmentFD(FileDownloader):
downloaded_bytes = ctx['complete_frags_downloaded_bytes'] downloaded_bytes = ctx['complete_frags_downloaded_bytes']
else: else:
self.try_rename(ctx['tmpfilename'], ctx['filename']) self.try_rename(ctx['tmpfilename'], ctx['filename'])
if self.params.get('updatetime', True):
filetime = ctx.get('fragment_filetime')
if filetime:
try:
os.utime(ctx['filename'], (time.time(), filetime))
except Exception:
pass
downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename'])) downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
self._hook_progress({ self._hook_progress({
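Note on the two hunks above: the fragment downloader remembers a `filetime` reported for a fragment and, once the final file is in place, applies it as the modification time unless `updatetime` is disabled. Where `filetime` comes from is not shown in this diff; it is assumed here to be an epoch timestamp. A standalone sketch of the final step, with `apply_filetime` as a hypothetical name:

```python
import os
import tempfile
import time


def apply_filetime(path, filetime):
    """Set the modification time of path to filetime (epoch seconds).

    Hypothetical helper for illustration. Returns False when there is
    nothing to apply or the update fails; failures are swallowed, as in
    the hunk above.
    """
    if not filetime:
        return False
    try:
        # os.utime takes (access time, modification time).
        os.utime(path, (time.time(), filetime))
        return True
    except Exception:
        return False
```

Swallowing the exception keeps a timestamp problem from failing an otherwise complete download, which appears to be the intent of the bare `except Exception: pass` in the diff.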

@@ -42,11 +42,13 @@ class HlsFD(FragmentFD):
# no segments will definitely be appended to the end of the playlist. # no segments will definitely be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of # r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4] # # event media playlists [4]
r'#EXT-X-MAP:', # media initialization [5]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4 # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2 # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2 # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5 # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
# 5. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.5
) )
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES] check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
@@ -141,7 +143,7 @@ class HlsFD(FragmentFD):
count = 0 count = 0
headers = info_dict.get('http_headers', {}) headers = info_dict.get('http_headers', {})
if byte_range: if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end']) headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success, frag_content = self._download_fragment( success, frag_content = self._download_fragment(
@@ -170,8 +172,12 @@ class HlsFD(FragmentFD):
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence) iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen( decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read() self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
frag_content = AES.new( # Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content) # size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
# not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content) self._append_fragment(ctx, frag_content)
# We only download the first fragment during the test # We only download the first fragment during the test
if test: if test:
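Note on the `Range` change above: HTTP byte ranges are inclusive at both ends, so a range whose `end` is exclusive has to be sent as `end - 1`. The sketch below assumes, as the fix implies, that `byte_range['end']` is computed as start plus length; both function names are hypothetical.

```python
def byte_range_from_length(length, offset=0):
    """Build a {'start', 'end'} dict with an exclusive end.

    Assumption for illustration: the range is derived from a
    length/offset pair, as in an EXT-X-BYTERANGE tag.
    """
    return {'start': offset, 'end': offset + length}


def range_header(byte_range):
    """Format an HTTP Range header value; the last byte is inclusive."""
    return 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)


if __name__ == '__main__':
    # A 100-byte fragment at offset 0 covers bytes 0..99.
    print(range_header(byte_range_from_length(100)))
```

Without the `- 1`, each request would ask for one byte more than the fragment contains, and that extra byte belongs to the following fragment.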

@@ -106,7 +106,14 @@ class HttpFD(FileDownloader):
set_range(request, range_start, range_end) set_range(request, range_start, range_end)
# Establish connection # Establish connection
try: try:
ctx.data = self.ydl.urlopen(request) try:
ctx.data = self.ydl.urlopen(request)
except (compat_urllib_error.URLError, ) as err:
# reason may not be available, e.g. for urllib2.HTTPError on python 2.6
reason = getattr(err, 'reason', None)
if isinstance(reason, socket.timeout):
raise RetryDownload(err)
raise err
# When trying to resume, Content-Range HTTP header of response has to be checked # When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers # to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range # that don't support resuming and serve a whole file with no Content-Range
@@ -218,9 +225,10 @@ class HttpFD(FileDownloader):
def retry(e): def retry(e):
to_stdout = ctx.tmpfilename == '-' to_stdout = ctx.tmpfilename == '-'
if not to_stdout: if ctx.stream is not None:
ctx.stream.close() if not to_stdout:
ctx.stream = None ctx.stream.close()
ctx.stream = None
ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename)) ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e) raise RetryDownload(e)
@@ -233,9 +241,11 @@ class HttpFD(FileDownloader):
except socket.timeout as e: except socket.timeout as e:
retry(e) retry(e)
except socket.error as e: except socket.error as e:
if e.errno not in (errno.ECONNRESET, errno.ETIMEDOUT): # SSLError on python 2 (inherits socket.error) may have
raise # no errno set but this error message
retry(e) if e.errno in (errno.ECONNRESET, errno.ETIMEDOUT) or getattr(e, 'message', None) == 'The read operation timed out':
retry(e)
raise
byte_counter += len(data_block) byte_counter += len(data_block)
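Note on the last hunk above: a socket error is now retried when its `errno` is `ECONNRESET` or `ETIMEDOUT`, or when it carries the message 'The read operation timed out' without an `errno`, as the comment says can happen with SSLError on Python 2. The decision can be sketched as a small predicate; `should_retry` is a hypothetical name, and the real code calls `retry(e)` or re-raises instead of returning a flag.

```python
import errno
import socket

_READ_TIMEOUT_MESSAGE = 'The read operation timed out'


def should_retry(err):
    """Return True if err looks like a transient network failure.

    Hypothetical helper for illustration only.
    """
    if isinstance(err, socket.timeout):
        return True
    if getattr(err, 'errno', None) in (errno.ECONNRESET, errno.ETIMEDOUT):
        return True
    # Some errors have no usable errno; fall back to the message text.
    return getattr(err, 'message', None) == _READ_TIMEOUT_MESSAGE
```

Matching on message text is fragile, so it is kept as the last check and only for the one string named in the diff.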

@@ -1,14 +1,15 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import calendar
import re import re
import time
from .amp import AMPIE from .amp import AMPIE
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE from ..utils import (
from ..compat import compat_urlparse parse_duration,
parse_iso8601,
try_get,
)
class AbcNewsVideoIE(AMPIE): class AbcNewsVideoIE(AMPIE):
@@ -18,8 +19,8 @@ class AbcNewsVideoIE(AMPIE):
(?: (?:
abcnews\.go\.com/ abcnews\.go\.com/
(?: (?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-| (?:[^/]+/)*video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid= video/(?:embed|itemfeed)\?.*?\bid=
)| )|
fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/ fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
) )
@@ -36,6 +37,8 @@ class AbcNewsVideoIE(AMPIE):
'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.', 'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
'duration': 180, 'duration': 180,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1380454200,
'upload_date': '20130929',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@@ -47,6 +50,12 @@ class AbcNewsVideoIE(AMPIE):
}, { }, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478', 'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://abcnews.go.com/video/itemfeed?id=46979033',
'only_matching': True,
}, {
'url': 'https://abcnews.go.com/GMA/News/video/history-christmas-story-67894761',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -67,28 +76,23 @@ class AbcNewsIE(InfoExtractor):
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)' _VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY', # Youtube Embeds
'url': 'https://abcnews.go.com/Entertainment/peter-billingsley-child-actor-christmas-story-hollywood-power/story?id=51286501',
'info_dict': { 'info_dict': {
'id': '10505354', 'id': '51286501',
'ext': 'flv', 'title': "Peter Billingsley: From child actor in 'A Christmas Story' to Hollywood power player",
'display_id': 'dramatic-video-rare-death-job-america', 'description': 'Billingsley went from a child actor to Hollywood power player.',
'title': 'Occupational Hazards',
'description': 'Nightline investigates the dangers that lurk at various jobs.',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20100428',
'timestamp': 1272412800,
}, },
'add_ie': ['AbcNewsVideo'], 'playlist_count': 5,
}, { }, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818', 'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': { 'info_dict': {
'id': '38897857', 'id': '38897857',
'ext': 'mp4', 'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single', 'title': 'Justin Timberlake Drops Hints For Secret Single',
'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.', 'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
'upload_date': '20160515', 'upload_date': '20160505',
'timestamp': 1463329500, 'timestamp': 1462442280,
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@@ -100,49 +104,55 @@ class AbcNewsIE(InfoExtractor):
}, { }, {
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343', 'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True, 'only_matching': True,
}, {
# inline.type == 'video'
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) story_id = self._match_id(url)
display_id = mobj.group('display_id') webpage = self._download_webpage(url, story_id)
video_id = mobj.group('id') story = self._parse_json(self._search_regex(
r"window\['__abcnews__'\]\s*=\s*({.+?});",
webpage, 'data'), story_id)['page']['content']['story']['everscroll'][0]
article_contents = story.get('articleContents') or {}
webpage = self._download_webpage(url, video_id) def entries():
video_url = self._search_regex( featured_video = story.get('featuredVideo') or {}
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL') feed = try_get(featured_video, lambda x: x['video']['feed'])
full_video_url = compat_urlparse.urljoin(url, video_url) if feed:
yield {
'_type': 'url',
'id': featured_video.get('id'),
'title': featured_video.get('name'),
'url': feed,
'thumbnail': featured_video.get('images'),
'description': featured_video.get('description'),
'timestamp': parse_iso8601(featured_video.get('uploadDate')),
'duration': parse_duration(featured_video.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
youtube_url = YoutubeIE._extract_url(webpage) for inline in (article_contents.get('inlines') or []):
inline_type = inline.get('type')
if inline_type == 'iframe':
iframe_url = try_get(inline, lambda x: x['attrs']['src'])
if iframe_url:
yield self.url_result(iframe_url)
elif inline_type == 'video':
video_id = inline.get('id')
if video_id:
yield {
'_type': 'url',
'id': video_id,
'url': 'http://abcnews.go.com/video/embed?id=' + video_id,
'thumbnail': inline.get('imgSrc') or inline.get('imgDefault'),
'description': inline.get('description'),
'duration': parse_duration(inline.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
timestamp = None return self.playlist_result(
date_str = self._html_search_regex( entries(), story_id, article_contents.get('headline'),
r'<span[^>]+class="timestamp">([^<]+)</span>', article_contents.get('subHead'))
webpage, 'timestamp', fatal=False)
if date_str:
tz_offset = 0
if date_str.endswith(' ET'): # Eastern Time
tz_offset = -5
date_str = date_str[:-3]
date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
for date_format in date_formats:
try:
timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
except ValueError:
continue
if timestamp is not None:
timestamp -= tz_offset * 3600
entry = {
'_type': 'url_transparent',
'ie_key': AbcNewsVideoIE.ie_key(),
'url': full_video_url,
'id': video_id,
'display_id': display_id,
'timestamp': timestamp,
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, ie=YoutubeIE.ie_key())]
return self.playlist_result(entries)
return entry
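Note on the rewritten `_real_extract` above: the story page is now read from embedded JSON, and a generator yields one entry for the featured video plus one per inline iframe or video. The sketch below reproduces that control flow on plain dicts. `try_get` here is a simplified stand-in for the project's utility, `iter_entries` is a hypothetical name, and most metadata fields are left out.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in for the try_get utility used in the diff."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def iter_entries(story):
    """Yield playlist entries from a story dict (illustration only)."""
    featured = story.get('featuredVideo') or {}
    feed = try_get(featured, lambda x: x['video']['feed'])
    if feed:
        yield {'_type': 'url', 'id': featured.get('id'), 'url': feed}

    article = story.get('articleContents') or {}
    for inline in (article.get('inlines') or []):
        inline_type = inline.get('type')
        if inline_type == 'iframe':
            src = try_get(inline, lambda x: x['attrs']['src'])
            if src:
                yield {'_type': 'url', 'url': src}
        elif inline_type == 'video':
            video_id = inline.get('id')
            if video_id:
                yield {
                    '_type': 'url',
                    'id': video_id,
                    'url': 'http://abcnews.go.com/video/embed?id=' + video_id,
                }
```

Every lookup tolerates missing keys, so a story with no featured video or no inlines yields an empty playlist instead of raising.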

@@ -2,21 +2,48 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import functools
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
clean_html, clean_html,
float_or_none, clean_podcast_url,
int_or_none, int_or_none,
try_get, parse_iso8601,
unified_timestamp,
OnDemandPagedList,
) )
class ACastIE(InfoExtractor): class ACastBaseIE(InfoExtractor):
def _extract_episode(self, episode, show_info):
title = episode['title']
info = {
'id': episode['id'],
'display_id': episode.get('episodeUrl'),
'url': clean_podcast_url(episode['url']),
'title': title,
'description': clean_html(episode.get('description') or episode.get('summary')),
'thumbnail': episode.get('image'),
'timestamp': parse_iso8601(episode.get('publishDate')),
'duration': int_or_none(episode.get('duration')),
'filesize': int_or_none(episode.get('contentLength')),
'season_number': int_or_none(episode.get('season')),
'episode': title,
'episode_number': int_or_none(episode.get('episode')),
}
info.update(show_info)
return info
def _extract_show_info(self, show):
return {
'creator': show.get('author'),
'series': show.get('title'),
}
def _call_api(self, path, video_id, query=None):
return self._download_json(
'https://feeder.acast.com/api/v1/shows/' + path, video_id, query=query)
class ACastIE(ACastBaseIE):
IE_NAME = 'acast' IE_NAME = 'acast'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@@ -28,15 +55,15 @@ class ACastIE(InfoExtractor):
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna', 'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': '16d936099ec5ca2d5869e3a813ee8dc4', 'md5': 'f5598f3ad1e4776fed12ec1407153e4b',
'info_dict': { 'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9', 'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3', 'ext': 'mp3',
'title': '2. Raggarmordet - Röster ur det förflutna', 'title': '2. Raggarmordet - Röster ur det förflutna',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4', 'description': 'md5:a992ae67f4d98f1c0141598f7bebbf67',
'timestamp': 1477346700, 'timestamp': 1477346700,
'upload_date': '20161024', 'upload_date': '20161024',
'duration': 2766.602563, 'duration': 2766,
'creator': 'Anton Berg & Martin Johnson', 'creator': 'Anton Berg & Martin Johnson',
'series': 'Spår', 'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna', 'episode': '2. Raggarmordet - Röster ur det förflutna',
@@ -45,7 +72,7 @@ class ACastIE(InfoExtractor):
'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015', 'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22', 'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9', 'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
@@ -54,40 +81,14 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups() channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json( episode = self._call_api(
'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id), '%s/episodes/%s' % (channel, display_id),
display_id) display_id, {'showInfo': 'true'})
media_url = s['url'] return self._extract_episode(
if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id): episode, self._extract_show_info(episode.get('show') or {}))
episode_url = s.get('episodeUrl')
if episode_url:
display_id = episode_url
else:
channel, display_id = re.match(self._VALID_URL, s['link']).groups()
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e.get('name') or s['title']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': media_url,
'title': title,
'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
'season_number': int_or_none(e.get('seasonNumber')),
'episode': title,
'episode_number': int_or_none(e.get('episodeNumber')),
}
class ACastChannelIE(InfoExtractor): class ACastChannelIE(ACastBaseIE):
IE_NAME = 'acast:channel' IE_NAME = 'acast:channel'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@@ -102,34 +103,24 @@ class ACastChannelIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786', 'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus', 'title': 'Today in Focus',
'description': 'md5:9ba5564de5ce897faeb12963f4537a64', 'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
}, },
'playlist_mincount': 35, 'playlist_mincount': 200,
}, { }, {
'url': 'http://play.acast.com/s/ft-banking-weekly', 'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True, 'only_matching': True,
}] }]
_API_BASE_URL = 'https://play.acast.com/api/'
_PAGE_SIZE = 10
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url) return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
def _real_extract(self, url): def _real_extract(self, url):
channel_slug = self._match_id(url) show_slug = self._match_id(url)
channel_data = self._download_json( show = self._call_api(show_slug, show_slug)
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug) show_info = self._extract_show_info(show)
entries = OnDemandPagedList(functools.partial( entries = []
self._fetch_page, channel_slug), self._PAGE_SIZE) for episode in (show.get('episodes') or []):
return self.playlist_result(entries, compat_str( entries.append(self._extract_episode(episode, show_info))
channel_data['id']), channel_data['name'], channel_data.get('description')) return self.playlist_result(
entries, show.get('id'), show.get('title'), show.get('description'))
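Note on the refactoring above: both extractors now share a base class that builds one episode dict and merges show-level fields into it, so the channel extractor can build its entries directly from the show response. A reduced sketch of that merge on plain dicts follows. `int_or_none` is a simplified stand-in for the project's utility, and URL cleaning and date parsing are omitted.

```python
def int_or_none(value):
    """Simplified stand-in for the int_or_none utility."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def extract_show_info(show):
    """Fields taken from the show and shared by all of its episodes."""
    return {
        'creator': show.get('author'),
        'series': show.get('title'),
    }


def extract_episode(episode, show_info):
    """Build one entry and merge the show-level fields into it."""
    title = episode['title']
    info = {
        'id': episode['id'],
        'url': episode['url'],
        'title': title,
        'duration': int_or_none(episode.get('duration')),
        'season_number': int_or_none(episode.get('season')),
        'episode': title,
        'episode_number': int_or_none(episode.get('episode')),
    }
    info.update(show_info)
    return info
```

Because `info.update(show_info)` runs last, show-level values win if a key is ever present in both dicts; with the two keys used here there is no overlap.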

@@ -10,6 +10,7 @@ import random
from .common import InfoExtractor from .common import InfoExtractor
from ..aes import aes_cbc_decrypt from ..aes import aes_cbc_decrypt
from ..compat import ( from ..compat import (
compat_HTTPError,
compat_b64decode, compat_b64decode,
compat_ord, compat_ord,
) )
@@ -18,11 +19,14 @@ from ..utils import (
bytes_to_long, bytes_to_long,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none,
intlist_to_bytes, intlist_to_bytes,
long_to_bytes, long_to_bytes,
pkcs1pad, pkcs1pad,
strip_or_none, strip_or_none,
urljoin, try_get,
unified_strdate,
urlencode_postdata,
) )
@@ -31,16 +35,30 @@ class ADNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites', 'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': 'e497370d847fd79d9d4c74be55575c7a', 'md5': '0319c99885ff5547565cacb4f3f9348d',
'info_dict': { 'info_dict': {
'id': '7778', 'id': '7778',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Épisode 1', 'title': 'Blue Exorcist - Kyôto Saga - Episode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5', 'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
'series': 'Blue Exorcist - Kyôto Saga',
'duration': 1467,
'release_date': '20170106',
'comment_count': int,
'average_rating': float,
'season_number': 2,
'episode': 'Début des hostilités',
'episode_number': 1,
} }
} }
_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr' _BASE_URL = 'http://animedigitalnetwork.fr'
_RSA_KEY = (0xc35ae1e4356b65a73b551493da94b8cb443491c0aa092a357a5aee57ffc14dda85326f42d716e539a34542a0d3f363adf16c5ec222d713d5997194030ee2e4f0d1fb328c01a81cf6868c090d50de8e169c6b13d1675b9eeed1cbc51e1fffca9b38af07f37abd790924cd3bee59d0257cfda4fe5f3f0534877e21ce5821447d1b, 65537) _API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
_RSA_KEY = (0x9B42B08905199A5CCE2026274399CA560ECB209EE9878A708B1C0812E1BB8CB5D1FB7441861147C1A1F2F3A0476DD63A9CAC20D3E983613346850AA6CB38F16DC7D720FD7D86FC6E5B3D5BBC72E14CD0BF9E869F2CEA2CCAD648F1DCE38F1FF916CEFB2D339B64AA0264372344BC775E265E8A852F88144AB0BD9AA06C1A4ABB, 65537)
_POS_ALIGN_MAP = { _POS_ALIGN_MAP = {
'start': 1, 'start': 1,
'end': 3, 'end': 3,
@@ -54,26 +72,24 @@ class ADNIE(InfoExtractor):
def _ass_subtitles_timecode(seconds): def _ass_subtitles_timecode(seconds):
return '%01d:%02d:%02d.%02d' % (seconds / 3600, (seconds % 3600) / 60, seconds % 60, (seconds % 1) * 100) return '%01d:%02d:%02d.%02d' % (seconds / 3600, (seconds % 3600) / 60, seconds % 60, (seconds % 1) * 100)
def _get_subtitles(self, sub_path, video_id): def _get_subtitles(self, sub_url, video_id):
if not sub_path: if not sub_url:
return None return None
enc_subtitles = self._download_webpage( enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, sub_path), sub_url, video_id, 'Downloading subtitles location', fatal=False) or '{}'
video_id, 'Downloading subtitles location', fatal=False) or '{}'
subtitle_location = (self._parse_json(enc_subtitles, video_id, fatal=False) or {}).get('location') subtitle_location = (self._parse_json(enc_subtitles, video_id, fatal=False) or {}).get('location')
if subtitle_location: if subtitle_location:
enc_subtitles = self._download_webpage( enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, subtitle_location), subtitle_location, video_id, 'Downloading subtitles data',
video_id, 'Downloading subtitles data', fatal=False, fatal=False, headers={'Origin': 'https://animedigitalnetwork.fr'})
headers={'Origin': 'https://animedigitalnetwork.fr'})
if not enc_subtitles: if not enc_subtitles:
return None return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt( dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])), bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(binascii.unhexlify(self._K + '4b8ef13ec1872730')), bytes_to_intlist(binascii.unhexlify(self._K + 'ab9f52f5baae7c72')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24])) bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
)) ))
subtitles_json = self._parse_json( subtitles_json = self._parse_json(
@@ -117,61 +133,100 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
}]) }])
return subtitles return subtitles
def _real_initialize(self):
username, password = self._get_login_info()
if not username:
return
try:
access_token = (self._download_json(
self._API_BASE_URL + 'authentication/login', None,
'Logging in', self._LOGIN_ERR_MESSAGE, fatal=False,
data=urlencode_postdata({
'password': password,
'rememberMe': False,
'source': 'Web',
'username': username,
})) or {}).get('accessToken')
if access_token:
self._HEADERS = {'authorization': 'Bearer ' + access_token}
except ExtractorError as e:
message = None
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(
e.cause.read().decode(), None, fatal=False) or {}
message = resp.get('message') or resp.get('code')
self.report_warning(message or self._LOGIN_ERR_MESSAGE)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) video_base_url = self._PLAYER_BASE_URL + 'video/%s/' % video_id
player_config = self._parse_json(self._search_regex( player = self._download_json(
r'playerConfig\s*=\s*({.+});', webpage, video_base_url + 'configuration', video_id,
'player config', default='{}'), video_id, fatal=False) 'Downloading player config JSON metadata',
if not player_config: headers=self._HEADERS)['player']
config_url = urljoin(self._BASE_URL, self._search_regex( options = player['options']
r'(?:id="player"|class="[^"]*adn-player-container[^"]*")[^>]+data-url="([^"]+)"',
webpage, 'config url'))
player_config = self._download_json(
config_url, video_id,
'Downloading player config JSON metadata')['player']
video_info = {} user = options['user']
video_info_str = self._search_regex( if not user.get('hasAccess'):
r'videoInfo\s*=\s*({.+});', webpage, self.raise_login_required()
'video info', fatal=False)
if video_info_str:
video_info = self._parse_json(
video_info_str, video_id, fatal=False) or {}
options = player_config.get('options') or {} token = self._download_json(
metas = options.get('metas') or {} user.get('refreshTokenUrl') or (self._PLAYER_BASE_URL + 'refresh/token'),
links = player_config.get('links') or {} video_id, 'Downloading access token', headers={
sub_path = player_config.get('subtitles') 'x-player-refresh-token': user['refreshToken']
error = None }, data=b'')['token']
if not links:
links_url = player_config.get('linksurl') or options['videoUrl'] links_url = try_get(options, lambda x: x['video']['url']) or (video_base_url + 'link')
token = options['token'] self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)]) message = bytes_to_intlist(json.dumps({
message = bytes_to_intlist(json.dumps({ 'k': self._K,
'k': self._K, 't': token,
'e': 60, }))
't': token,
})) # Sometimes authentication fails for no good reason, retry with
# a different random padding
links_data = None
for _ in range(3):
padded_message = intlist_to_bytes(pkcs1pad(message, 128)) padded_message = intlist_to_bytes(pkcs1pad(message, 128))
n, e = self._RSA_KEY n, e = self._RSA_KEY
encrypted_message = long_to_bytes(pow(bytes_to_long(padded_message), e, n)) encrypted_message = long_to_bytes(pow(bytes_to_long(padded_message), e, n))
authorization = base64.b64encode(encrypted_message).decode() authorization = base64.b64encode(encrypted_message).decode()
links_data = self._download_json(
urljoin(self._BASE_URL, links_url), video_id, try:
'Downloading links JSON metadata', headers={ links_data = self._download_json(
'Authorization': 'Bearer ' + authorization, links_url, video_id, 'Downloading links JSON metadata', headers={
}) 'X-Player-Token': authorization
links = links_data.get('links') or {} }, query={
metas = metas or links_data.get('meta') or {} 'freeWithAds': 'true',
sub_path = sub_path or links_data.get('subtitles') or \ 'adaptive': 'false',
'index.php?option=com_vodapi&task=subtitles.getJSON&format=json&id=' + video_id 'withMetadata': 'true',
sub_path += '&token=' + token 'source': 'Web'
error = links_data.get('error') })
title = metas.get('title') or video_info['title'] break
except ExtractorError as e:
if not isinstance(e.cause, compat_HTTPError):
raise e
if e.cause.code == 401:
# This usually goes away with a different random pkcs1pad, so retry
continue
error = self._parse_json(e.cause.read(), video_id)
message = error.get('message')
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message)
else:
raise ExtractorError('Giving up retrying')
links = links_data.get('links') or {}
metas = links_data.get('metadata') or {}
sub_url = (links.get('subtitles') or {}).get('all')
video_info = links_data.get('video') or {}
title = metas['title']
formats = [] formats = []
for format_id, qualities in links.items(): for format_id, qualities in (links.get('streaming') or {}).items():
if not isinstance(qualities, dict): if not isinstance(qualities, dict):
continue continue
for quality, load_balancer_url in qualities.items(): for quality, load_balancer_url in qualities.items():
@@ -189,19 +244,26 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
for f in m3u8_formats: for f in m3u8_formats:
f['language'] = 'fr' f['language'] = 'fr'
formats.extend(m3u8_formats) formats.extend(m3u8_formats)
if not error:
error = options.get('error')
if not formats and error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
self._sort_formats(formats) self._sort_formats(formats)
video = (self._download_json(
self._API_BASE_URL + 'video/%s' % video_id, video_id,
'Downloading additional video metadata', fatal=False) or {}).get('video') or {}
show = video.get('show') or {}
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': strip_or_none(metas.get('summary') or video_info.get('resume')), 'description': strip_or_none(metas.get('summary') or video.get('summary')),
'thumbnail': video_info.get('image'), 'thumbnail': video_info.get('image') or player.get('image'),
'formats': formats, 'formats': formats,
'subtitles': self.extract_subtitles(sub_path, video_id), 'subtitles': self.extract_subtitles(sub_url, video_id),
'episode': metas.get('subtitle') or video_info.get('videoTitle'), 'episode': metas.get('subtitle') or video.get('name'),
'series': video_info.get('playlistTitle'), 'episode_number': int_or_none(video.get('shortNumber')),
'series': show.get('title'),
'season_number': int_or_none(video.get('season')),
'duration': int_or_none(video_info.get('duration') or video.get('duration')),
'release_date': unified_strdate(video.get('releaseDate')),
'average_rating': float_or_none(video.get('rating') or metas.get('rating')),
'comment_count': int_or_none(video.get('commentsCount')),
} }
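
Review note on the links request in this file: the retry loop pads and encrypts the same JSON message again on each attempt, so every retry sends a different `X-Player-Token`. Below is a minimal standalone sketch of that pad-then-encrypt step. It is not the extractor's code: `pkcs1_v15_pad`, `rsa_encrypt` and `build_player_token` are hypothetical stand-ins for the youtube-dl helpers used in the diff (`pkcs1pad`, `bytes_to_long`, `long_to_bytes`), the padding bytes are drawn non-zero as PKCS#1 v1.5 type 2 padding expects, and any key passed in is the caller's own, not the extractor's `_RSA_KEY`. It assumes Python 3.

```python
import base64
import json
import os


def pkcs1_v15_pad(message, length):
    """Pad `message` (bytes) to `length` bytes: 00 02 <non-zero random> 00 <message>."""
    if len(message) > length - 11:
        raise ValueError('message too long for PKCS#1 v1.5 padding')
    padding = bytearray()
    while len(padding) < length - len(message) - 3:
        byte = os.urandom(1)
        if byte != b'\x00':  # a zero byte would be read as the end of the padding
            padding += byte
    return b'\x00\x02' + bytes(padding) + b'\x00' + message


def rsa_encrypt(padded, n, e):
    """Textbook RSA on an already padded block, returned as big-endian bytes."""
    length = (n.bit_length() + 7) // 8
    return pow(int.from_bytes(padded, 'big'), e, n).to_bytes(length, 'big')


def build_player_token(payload, n, e):
    """JSON-encode `payload`, pad it, encrypt it and base64-encode the result."""
    length = (n.bit_length() + 7) // 8
    padded = pkcs1_v15_pad(json.dumps(payload).encode(), length)
    return base64.b64encode(rsa_encrypt(padded, n, e)).decode()
```

Because the padding is random, two calls to `build_player_token` with the same payload and key should yield different tokens, which is what makes a retry after a 401 a genuinely different request.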


@@ -5,20 +5,32 @@ import re
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
extract_attributes,
ExtractorError, ExtractorError,
GeoRestrictedError,
int_or_none, int_or_none,
smuggle_url,
update_url_query, update_url_query,
) urlencode_postdata,
from ..compat import (
compat_urlparse,
) )
class AENetworksBaseIE(ThePlatformIE): class AENetworksBaseIE(ThePlatformIE):
_BASE_URL_REGEX = r'''(?x)https?://
(?:(?:www|play|watch)\.)?
(?P<domain>
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava' _THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t' _THEPLATFORM_SECRET = 's3cr3t'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),
'mylifetime.com': ('LIFETIME', 'lifetime'),
'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
'fyi.tv': ('FYI', 'fyi'),
'historyvault.com': (None, 'historyvault'),
'biography.com': (None, 'biography'),
}
def _extract_aen_smil(self, smil_url, video_id, auth=None): def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'} query = {'mbr': 'true'}
@@ -31,7 +43,7 @@ class AENetworksBaseIE(ThePlatformIE):
'assetTypes': 'high_video_s3' 'assetTypes': 'high_video_s3'
}, { }, {
'assetTypes': 'high_video_s3', 'assetTypes': 'high_video_s3',
'switch': 'hls_ingest_fastly' 'switch': 'hls_high_fastly',
}] }]
formats = [] formats = []
subtitles = {} subtitles = {}
@@ -44,6 +56,8 @@ class AENetworksBaseIE(ThePlatformIE):
tp_formats, tp_subtitles = self._extract_theplatform_smil( tp_formats, tp_subtitles = self._extract_theplatform_smil(
m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes'])) m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes']))
except ExtractorError as e: except ExtractorError as e:
if isinstance(e, GeoRestrictedError):
raise
last_e = e last_e = e
continue continue
formats.extend(tp_formats) formats.extend(tp_formats)
@@ -57,24 +71,45 @@ class AENetworksBaseIE(ThePlatformIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({
'title': title,
'series': result.get('seriesName'),
'season_number': int_or_none(result.get('tvSeasonNumber')),
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
})
return info
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x) _VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id>
https?:// shows/[^/]+/season-\d+/episode-\d+|
(?:www\.)? (?:
(?P<domain> (?:movie|special)s/[^/]+|
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com| (?:shows/[^/]+/)?videos
fyi\.tv )/[^/?#&]+
)/ )'''
(?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
specials/(?P<special_display_id>[^/]+)/(?:full-special|preview-)|
collections/[^/]+/(?P<collection_display_id>[^/]+)
)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': { 'info_dict': {
@@ -91,22 +126,23 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}, { 'skip': 'This video is only available for users of participating TV providers.',
'url': 'http://www.history.com/shows/ancient-aliens/season-1',
'info_dict': {
'id': '71889446852',
},
'playlist_mincount': 5,
}, {
'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
'info_dict': {
'id': 'SERIES4317',
'title': 'Atlanta Plastic',
},
'playlist_mincount': 2,
}, { }, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1', 'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'only_matching': True 'info_dict': {
'id': '600587331957',
'ext': 'mp4',
'title': 'Inlawful Entry',
'description': 'md5:57c12115a2b384d883fe64ca50529e08',
'timestamp': 1452634428,
'upload_date': '20160112',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, { }, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8', 'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True 'only_matching': True
@@ -117,78 +153,125 @@ class AENetworksIE(AENetworksBaseIE):
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie', 'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True 'only_matching': True
}, { }, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us', 'url': 'https://watch.lifetimemovieclub.com/movies/10-year-reunion/full-movie',
'only_matching': True 'only_matching': True
}, { }, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special', 'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
'only_matching': True
}, { }, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story/preview-hunting-jonbenets-killer-the-untold-story', 'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story/preview-hunting-jonbenets-killer-the-untold-story',
'only_matching': True 'only_matching': True
}, {
'url': 'http://www.history.com/videos/history-of-valentines-day',
'only_matching': True
}, {
'url': 'https://play.aetv.com/shows/duck-dynasty/videos/best-of-duck-dynasty-getting-quack-in-shape',
'only_matching': True
}] }]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI',
}
def _real_extract(self, url): def _real_extract(self, url):
domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups() domain, canonical = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id or collection_display_id return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)
webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
if url_parts_len == 1:
entries = []
for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
if entries:
return self.playlist_result(
entries, self._html_search_meta('aetn:SeriesId', webpage),
self._html_search_meta('aetn:SeriesTitle', webpage))
else:
# single season
url_parts_len = 2
if url_parts_len == 2:
entries = []
for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
episode_attributes = extract_attributes(episode_item)
episode_url = compat_urlparse.urljoin(
url, episode_attributes['data-canonical'])
entries.append(self.url_result(
episode_url, 'AENetworks',
episode_attributes.get('data-videoid') or episode_attributes.get('data-video-id')))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex( class AENetworksListBaseIE(AENetworksBaseIE):
[r"media_url\s*=\s*'(?P<url>[^']+)'", def _call_api(self, resource, slug, brand, fields):
r'data-media-url=(?P<url>(?:https?:)?//[^\s>]+)', return self._download_json(
r'data-media-url=(["\'])(?P<url>(?:(?!\1).)+?)\1'], 'https://yoga.appsvcs.aetnd.com/graphql',
webpage, 'video url', group='url') slug, query={'brand': brand}, data=urlencode_postdata({
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( 'query': '''{
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id) %s(slug: "%s") {
info = self._parse_theplatform_metadata(theplatform_metadata) %s
auth = None }
if theplatform_metadata.get('AETN$isBehindWall'): }''' % (resource, slug, fields),
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain] }))['data'][resource]
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'], def _real_extract(self, url):
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'), domain, slug = re.match(self._VALID_URL, url).groups()
theplatform_metadata['ratings'][0]['rating']) _, brand = self._DOMAIN_MAP[domain]
auth = self._extract_mvpd_auth( playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
url, video_id, requestor_id, resource) base_url = 'http://watch.%s' % domain
info.update(self._search_json_ld(webpage, video_id, fatal=False))
info.update(self._extract_aen_smil(media_url, video_id, auth)) entries = []
return info for item in (playlist.get(self._ITEMS_KEY) or []):
doc = self._get_doc(item)
canonical = doc.get('canonical')
if not canonical:
continue
entries.append(self.url_result(
base_url + canonical, AENetworksIE.ie_key(), doc.get('id')))
description = None
if self._PLAYLIST_DESCRIPTION_KEY:
description = playlist.get(self._PLAYLIST_DESCRIPTION_KEY)
return self.playlist_result(
entries, playlist.get('id'),
playlist.get(self._PLAYLIST_TITLE_KEY), description)
class AENetworksCollectionIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:collection'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'(?:[^/]+/)*(?:list|collections)/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://watch.historyvault.com/list/america-the-story-of-us',
'info_dict': {
'id': '282',
'title': 'America The Story of Us',
},
'playlist_mincount': 12,
}, {
'url': 'https://watch.historyvault.com/shows/america-the-story-of-us-2/season-1/list/america-the-story-of-us',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/mysteryquest',
'only_matching': True
}]
_RESOURCE = 'list'
_ITEMS_KEY = 'items'
_PLAYLIST_TITLE_KEY = 'display_title'
_PLAYLIST_DESCRIPTION_KEY = None
_FIELDS = '''id
display_title
items {
... on ListVideoItem {
doc {
canonical
id
}
}
}'''
def _get_doc(self, item):
return item.get('doc') or {}
class AENetworksShowIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:show'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'shows/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'http://www.history.com/shows/ancient-aliens',
'info_dict': {
'id': 'SERIES1574',
'title': 'Ancient Aliens',
'description': 'md5:3f6d74daf2672ff3ae29ed732e37ea7f',
},
'playlist_mincount': 150,
}]
_RESOURCE = 'series'
_ITEMS_KEY = 'episodes'
_PLAYLIST_TITLE_KEY = 'title'
_PLAYLIST_DESCRIPTION_KEY = 'description'
_FIELDS = '''description
id
title
episodes {
canonical
id
}'''
def _get_doc(self, item):
return item
class HistoryTopicIE(AENetworksBaseIE): class HistoryTopicIE(AENetworksBaseIE):
@@ -204,6 +287,7 @@ class HistoryTopicIE(AENetworksBaseIE):
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7', 'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729, 'timestamp': 1375819729,
'upload_date': '20130806', 'upload_date': '20130806',
'uploader': 'AENE-NEW',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@@ -212,36 +296,47 @@ class HistoryTopicIE(AENetworksBaseIE):
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}] }]
def theplatform_url_result(self, theplatform_url, video_id, query): def _real_extract(self, url):
return { display_id = self._match_id(url)
'_type': 'url_transparent', return self.url_result(
'id': video_id, 'http://www.history.com/videos/' + display_id,
'url': smuggle_url( AENetworksIE.ie_key())
update_url_query(theplatform_url, query),
{
'sig': { class HistoryPlayerIE(AENetworksBaseIE):
'key': self._THEPLATFORM_KEY, IE_NAME = 'history:player'
'secret': self._THEPLATFORM_SECRET, _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|biography)\.com)/player/(?P<id>\d+)'
}, _TESTS = []
'force_smil_url': True
}), def _real_extract(self, url):
'ie_key': 'ThePlatform', domain, video_id = re.match(self._VALID_URL, url).groups()
} return self._extract_aetn_info(domain, 'id', video_id, url)
class BiographyIE(AENetworksBaseIE):
_VALID_URL = r'https?://(?:www\.)?biography\.com/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.biography.com/video/vincent-van-gogh-full-episode-2075049808',
'info_dict': {
'id': '30322987',
'ext': 'mp4',
'title': 'Vincent Van Gogh - Full Episode',
'description': 'A full biography about the most influential 20th century painter, Vincent Van Gogh.',
'timestamp': 1311970571,
'upload_date': '20110729',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( player_url = self._search_regex(
r'<phoenix-iframe[^>]+src="[^"]+\btpid=(\d+)', webpage, 'tpid') r'<phoenix-iframe[^>]+src="(%s)' % HistoryPlayerIE._VALID_URL,
result = self._download_json( webpage, 'player URL')
'https://feeds.video.aetnd.com/api/v2/history/videos', return self.url_result(player_url, HistoryPlayerIE.ie_key())
video_id, query={'filter[id]': video_id})['results'][0]
title = result['title']
info = self._extract_aen_smil(result['publicUrl'], video_id)
info.update({
'title': title,
'description': result.get('description'),
'duration': int_or_none(result.get('duration')),
'timestamp': int_or_none(result.get('added'), 1000),
})
return info
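
Review note on the new `AENetworksIE` flow: the page download is gone, and the URL path itself becomes the `filter[canonical]` value for the feeds API. The sketch below reproduces only that URL-to-request mapping outside the extractor class, as a rough illustration. The regexes, the domain map and the feed URL are copied from the diff; the function name `feed_request` is hypothetical, and no network call is made.

```python
import re

BASE_URL_REGEX = r'''(?x)https?://
    (?:(?:www|play|watch)\.)?
    (?P<domain>
        (?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
        fyi\.tv
    )/'''

VIDEO_URL_REGEX = BASE_URL_REGEX + r'''(?P<id>
    shows/[^/]+/season-\d+/episode-\d+|
    (?:
        (?:movie|special)s/[^/]+|
        (?:shows/[^/]+/)?videos
    )/[^/?#&]+
)'''

# domain -> (MVPD requestor id, feeds API brand), as in _DOMAIN_MAP
DOMAIN_MAP = {
    'history.com': ('HISTORY', 'history'),
    'aetv.com': ('AETV', 'aetv'),
    'mylifetime.com': ('LIFETIME', 'lifetime'),
    'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
    'fyi.tv': ('FYI', 'fyi'),
    'historyvault.com': (None, 'historyvault'),
    'biography.com': (None, 'biography'),
}


def feed_request(url):
    """Return (feed URL, query dict) for a single-video URL, or None if it does not match."""
    mobj = re.match(VIDEO_URL_REGEX, url)
    if not mobj:
        return None
    domain, canonical = mobj.groups()
    _, brand = DOMAIN_MAP[domain]
    return (
        'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
        {'filter[canonical]': '/' + canonical},
    )
```

A bare show URL such as `http://www.history.com/shows/ancient-aliens` does not match this pattern, which is consistent with the diff routing show and collection pages to the separate list extractors.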


@@ -275,7 +275,7 @@ class AfreecaTVIE(InfoExtractor):
video_element = video_xml.findall(compat_xpath('./track/video'))[-1] video_element = video_xml.findall(compat_xpath('./track/video'))[-1]
if video_element is None or video_element.text is None: if video_element is None or video_element.text is None:
raise ExtractorError( raise ExtractorError(
'Video %s video does not exist' % video_id, expected=True) 'Video %s does not exist' % video_id, expected=True)
video_url = video_element.text.strip() video_url = video_element.text.strip()


@@ -1,13 +1,16 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor): class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/(?:programmes|video)/.*?/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?aljazeera\.com/(?P<type>program/[^/]+|(?:feature|video)s)/\d{4}/\d{1,2}/\d{1,2}/(?P<id>[^/?&#]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html', 'url': 'https://www.aljazeera.com/program/episode/2014/9/19/deliverance',
'info_dict': { 'info_dict': {
'id': '3792260579001', 'id': '3792260579001',
'ext': 'mp4', 'ext': 'mp4',
@@ -20,14 +23,34 @@ class AlJazeeraIE(InfoExtractor):
'add_ie': ['BrightcoveNew'], 'add_ie': ['BrightcoveNew'],
'skip': 'Not accessible from Travis CI server', 'skip': 'Not accessible from Travis CI server',
}, { }, {
'url': 'http://www.aljazeera.com/video/news/2017/05/sierra-leone-709-carat-diamond-auctioned-170511100111930.html', 'url': 'https://www.aljazeera.com/videos/2017/5/11/sierra-leone-709-carat-diamond-to-be-auctioned-off',
'only_matching': True,
}, {
'url': 'https://www.aljazeera.com/features/2017/8/21/transforming-pakistans-buses-into-art',
'only_matching': True, 'only_matching': True,
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/665003303001/default_default/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):
program_name = self._match_id(url) post_type, name = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, program_name) post_type = {
brightcove_id = self._search_regex( 'features': 'post',
r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id') 'program': 'episode',
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id) 'videos': 'video',
}[post_type.split('/')[0]]
video = self._download_json(
'https://www.aljazeera.com/graphql', name, query={
'operationName': 'SingleArticleQuery',
'variables': json.dumps({
'name': name,
'postType': post_type,
}),
}, headers={
'wp-site': 'aje',
})['data']['article']['video']
video_id = video['id']
account_id = video.get('accountId') or '665003303001'
player_id = video.get('playerId') or 'BkeSH5BDb'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id),
'BrightcoveNew', video_id)
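
Review note on the rewritten `AlJazeeraIE`: the extractor now derives a GraphQL `postType` from the first URL path segment and falls back to a fixed Brightcove account and player when the response omits them. The sketch below isolates those two steps as plain functions. The regex, the mapping, the template and the default ids are taken from the diff; `graphql_query` and `brightcove_url` are hypothetical names, and the `video` dict in any call is assumed to look like the `data.article.video` object the diff reads.

```python
import json
import re

VALID_URL = (r'https?://(?:www\.)?aljazeera\.com/'
             r'(?P<type>program/[^/]+|(?:feature|video)s)/'
             r'\d{4}/\d{1,2}/\d{1,2}/(?P<id>[^/?&#]+)')

# first URL path segment -> GraphQL postType
POST_TYPES = {
    'features': 'post',
    'program': 'episode',
    'videos': 'video',
}

BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'


def graphql_query(url):
    """Build the query parameters for the SingleArticleQuery request."""
    post_type, name = re.match(VALID_URL, url).groups()
    return {
        'operationName': 'SingleArticleQuery',
        'variables': json.dumps({
            'name': name,
            'postType': POST_TYPES[post_type.split('/')[0]],
        }),
    }


def brightcove_url(video):
    """Build the Brightcove player URL, using the diff's default account and player ids."""
    account_id = video.get('accountId') or '665003303001'
    player_id = video.get('playerId') or 'BkeSH5BDb'
    return BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video['id'])
```

Splitting on `/` before the lookup is what lets `program/<show>` URLs share the single `program` key.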


@@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
from .vimeo import VimeoIE
from ..utils import (
int_or_none,
parse_iso8601,
update_url_query,
)
class AmaraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?amara\.org/(?:\w+/)?videos/(?P<id>\w+)'
_TESTS = [{
# Youtube
'url': 'https://amara.org/en/videos/jVx79ZKGK1ky/info/why-jury-trials-are-becoming-less-common/?tab=video',
'md5': 'ea10daf2b6154b8c1ecf9922aca5e8ae',
'info_dict': {
'id': 'h6ZuVdvYnfE',
'ext': 'mp4',
'title': 'Why jury trials are becoming less common',
'description': 'md5:a61811c319943960b6ab1c23e0cbc2c1',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'upload_date': '20160813',
'uploader': 'PBS NewsHour',
'uploader_id': 'PBSNewsHour',
'timestamp': 1549639570,
}
}, {
# Vimeo
'url': 'https://amara.org/en/videos/kYkK1VUTWW5I/info/vimeo-at-ces-2011',
'md5': '99392c75fa05d432a8f11df03612195e',
'info_dict': {
'id': '18622084',
'ext': 'mov',
'title': 'Vimeo at CES 2011!',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'timestamp': 1294763658,
'upload_date': '20110111',
'uploader': 'Sam Morrill',
'uploader_id': 'sammorrill'
}
}, {
# Direct Link
'url': 'https://amara.org/en/videos/s8KL7I3jLmh6/info/the-danger-of-a-single-story/',
'md5': 'd3970f08512738ee60c5807311ff5d3f',
'info_dict': {
'id': 's8KL7I3jLmh6',
'ext': 'mp4',
'title': 'The danger of a single story',
'description': 'md5:d769b31139c3b8bb5be9177f62ea3f23',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'upload_date': '20091007',
'timestamp': 1254942511,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
meta = self._download_json(
'https://amara.org/api/videos/%s/' % video_id,
video_id, query={'format': 'json'})
title = meta['title']
video_url = meta['all_urls'][0]
subtitles = {}
for language in (meta.get('languages') or []):
subtitles_uri = language.get('subtitles_uri')
if not (subtitles_uri and language.get('published')):
continue
subtitle = subtitles.setdefault(language.get('code') or 'en', [])
for f in ('json', 'srt', 'vtt'):
subtitle.append({
'ext': f,
'url': update_url_query(subtitles_uri, {'format': f}),
})
info = {
'url': video_url,
'id': video_id,
'subtitles': subtitles,
'title': title,
'description': meta.get('description'),
'thumbnail': meta.get('thumbnail'),
'duration': int_or_none(meta.get('duration')),
'timestamp': parse_iso8601(meta.get('created')),
}
for ie in (YoutubeIE, VimeoIE):
if ie.suitable(video_url):
info.update({
'_type': 'url_transparent',
'ie_key': ie.ie_key(),
})
break
return info
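
Review note on the subtitle handling in the new `AmaraIE`: each published language contributes three entries (`json`, `srt`, `vtt`) that differ only in the `format` query parameter, and languages without a code are filed under `en`. The sketch below reproduces that loop on its own, assuming Python 3. `with_query` is a hypothetical stand-in for youtube-dl's `update_url_query`, written here with `urllib.parse`, and `collect_subtitles` is a hypothetical name for the loop body; the `meta` layout is assumed to match the keys the diff reads.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(url, params):
    """Return `url` with `params` merged into its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


def collect_subtitles(meta):
    """Map language code -> list of {'ext', 'url'} entries for published languages."""
    subtitles = {}
    for language in (meta.get('languages') or []):
        subtitles_uri = language.get('subtitles_uri')
        if not (subtitles_uri and language.get('published')):
            continue  # skip unpublished languages and entries without a URI
        tracks = subtitles.setdefault(language.get('code') or 'en', [])
        for ext in ('json', 'srt', 'vtt'):
            tracks.append({
                'ext': ext,
                'url': with_query(subtitles_uri, {'format': ext}),
            })
    return subtitles
```

Using `setdefault` means two language entries that share a code accumulate into one list rather than overwriting each other.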


@@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
@@ -11,25 +13,22 @@ from ..utils import (
class AMCNetworksIE(ThePlatformIE): class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?(?P<site>amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?P<id>(?:movies|shows(?:/[^/]+)+)/[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1', 'url': 'https://www.bbcamerica.com/shows/the-graham-norton-show/videos/tina-feys-adorable-airline-themed-family-dinner--51631',
'md5': '',
'info_dict': { 'info_dict': {
'id': 's3MX01Nl4vPH', 'id': '4Lq1dzOnZGt0',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1', 'title': "The Graham Norton Show - Season 28 - Tina Fey's Adorable Airline-Themed Family Dinner",
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.', 'description': "It turns out child stewardesses are very generous with the wine! All-new episodes of 'The Graham Norton Show' premiere Fridays at 11/10c on BBC America.",
'age_limit': 17, 'upload_date': '20201120',
'upload_date': '20160505', 'timestamp': 1605904350,
'timestamp': 1462468831,
'uploader': 'AMCN', 'uploader': 'AMCN',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Requires TV provider accounts',
}, { }, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge', 'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True, 'only_matching': True,
@@ -55,32 +54,34 @@ class AMCNetworksIE(ThePlatformIE):
'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1', 'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
'only_matching': True, 'only_matching': True,
}] }]
_REQUESTOR_ID_MAP = {
'amc': 'AMC',
'bbcamerica': 'BBCA',
'ifc': 'IFC',
'sundancetv': 'SUNDANCE',
'wetv': 'WETV',
}
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) requestor_id = self._REQUESTOR_ID_MAP[site]
properties = self._download_json(
'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s' % (requestor_id.lower(), display_id),
display_id)['data']['properties']
query = { query = {
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
} }
media_url = self._search_regex( tp_path = 'M_UwQC/media/' + properties['videoPid']
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', media_url = 'https://link.theplatform.com/s/' + tp_path
webpage, 'media url') theplatform_metadata = self._download_theplatform_metadata(tp_path, display_id)
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid'] video_id = theplatform_metadata['pid']
title = theplatform_metadata['title'] title = theplatform_metadata['title']
rating = try_get( rating = try_get(
theplatform_metadata, lambda x: x['ratings'][0]['rating']) theplatform_metadata, lambda x: x['ratings'][0]['rating'])
auth_required = self._search_regex( video_category = properties.get('videoCategory')
r'window\.authRequired\s*=\s*(true|false);', if video_category and video_category.endswith('-Auth'):
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating) requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(


@@ -1,82 +1,159 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
int_or_none, int_or_none,
js_to_json,
try_get, try_get,
unified_strdate, unified_strdate,
unified_timestamp,
) )
class AmericasTestKitchenIE(InfoExtractor): class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?P<resource_type>episode|videos)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers', 'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f', 'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': { 'info_dict': {
'id': '5b400b9ee338f922cb06450c', 'id': '5b400b9ee338f922cb06450c',
'title': 'Weeknight Japanese Suppers', 'title': 'Japanese Suppers',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8', 'description': 'md5:64e606bfee910627efc4b5f050de92b3',
'thumbnail': r're:^https?://', 'thumbnail': r're:^https?://',
'timestamp': 1523664000, 'timestamp': 1523318400,
'upload_date': '20180414', 'upload_date': '20180410',
'release_date': '20180414', 'release_date': '20180410',
'series': "America's Test Kitchen", 'series': "America's Test Kitchen",
'season_number': 18, 'season_number': 18,
'episode': 'Weeknight Japanese Suppers', 'episode': 'Japanese Suppers',
'episode_number': 15, 'episode_number': 15,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# Metadata parsing behaves differently for newer episodes (705) as opposed to older episodes (582 above)
'url': 'https://www.americastestkitchen.com/episode/705-simple-chicken-dinner',
'md5': '06451608c57651e985a498e69cec17e5',
'info_dict': {
'id': '5fbe8c61bda2010001c6763b',
'title': 'Simple Chicken Dinner',
'ext': 'mp4',
'description': 'md5:eb68737cc2fd4c26ca7db30139d109e7',
'thumbnail': r're:^https?://',
'timestamp': 1610755200,
'upload_date': '20210116',
'release_date': '20210116',
'series': "America's Test Kitchen",
'season_number': 21,
'episode': 'Simple Chicken Dinner',
'episode_number': 3,
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon', 'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.cookscountry.com/episode/564-when-only-chocolate-will-do',
'only_matching': True,
}, {
'url': 'https://www.cooksillustrated.com/videos/4478-beef-wellington',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) resource_type, video_id = re.match(self._VALID_URL, url).groups()
is_episode = resource_type == 'episode'
if is_episode:
resource_type = 'episodes'
webpage = self._download_webpage(url, video_id) resource = self._download_json(
'https://www.americastestkitchen.com/api/v6/%s/%s' % (resource_type, video_id), video_id)
video_data = self._parse_json( video = resource['video'] if is_episode else resource
self._search_regex( episode = resource if is_episode else resource.get('episode') or {}
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id, js_to_json)
ep_data = try_get(
video_data,
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
'description') or ep_meta.get('description'))
thumbnail = try_get(ep_meta, lambda x: x['photo']['image_url'])
release_date = unified_strdate(ep_data.get('aired_at'))
season_number = int_or_none(ep_meta.get('season_number'))
episode = ep_meta.get('title')
episode_number = int_or_none(ep_meta.get('episode_number'))
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id, 'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % video['zypeId'],
'ie_key': 'Zype', 'ie_key': 'Zype',
'title': title, 'description': clean_html(video.get('description')),
'description': description, 'timestamp': unified_timestamp(video.get('publishDate')),
'thumbnail': thumbnail, 'release_date': unified_strdate(video.get('publishDate')),
'release_date': release_date, 'episode_number': int_or_none(episode.get('number')),
'series': "America's Test Kitchen", 'season_number': int_or_none(episode.get('season')),
'season_number': season_number, 'series': try_get(episode, lambda x: x['show']['title']),
'episode': episode, 'episode': episode.get('title'),
'episode_number': episode_number,
} }
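
The new `_real_extract` above no longer scrapes the page; it maps the page URL to a JSON API URL. A minimal sketch of that mapping follows, assuming the pattern and API base shown on the new side of the diff. The helper name `api_url` is hypothetical and is not part of the extractor.

```python
import re

# Pattern copied from the new side of the diff.
VALID_URL = (
    r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/'
    r'(?P<resource_type>episode|videos)/(?P<id>\d+)'
)


def api_url(page_url):
    """Hypothetical helper: build the v6 API URL for an episode or video page."""
    resource_type, video_id = re.match(VALID_URL, page_url).groups()
    # Page paths use the singular 'episode'; the API path uses 'episodes'.
    if resource_type == 'episode':
        resource_type = 'episodes'
    return 'https://www.americastestkitchen.com/api/v6/%s/%s' % (resource_type, video_id)
```

Note that the API host stays `www.americastestkitchen.com` even for Cook's Country and Cook's Illustrated page URLs, as in the diff.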
class AmericasTestKitchenSeasonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<show>americastestkitchen|cookscountry)\.com/episodes/browse/season_(?P<id>\d+)'
_TESTS = [{
# ATK Season
'url': 'https://www.americastestkitchen.com/episodes/browse/season_1',
'info_dict': {
'id': 'season_1',
'title': 'Season 1',
},
'playlist_count': 13,
}, {
# Cooks Country Season
'url': 'https://www.cookscountry.com/episodes/browse/season_12',
'info_dict': {
'id': 'season_12',
'title': 'Season 12',
},
'playlist_count': 13,
}]
def _real_extract(self, url):
show_name, season_number = re.match(self._VALID_URL, url).groups()
season_number = int(season_number)
slug = 'atk' if show_name == 'americastestkitchen' else 'cco'
season = 'Season %d' % season_number
season_search = self._download_json(
'https://y1fnzxui30-dsn.algolia.net/1/indexes/everest_search_%s_season_desc_production' % slug,
season, headers={
'Origin': 'https://www.%s.com' % show_name,
'X-Algolia-API-Key': '8d504d0099ed27c1b73708d22871d805',
'X-Algolia-Application-Id': 'Y1FNZXUI30',
}, query={
'facetFilters': json.dumps([
'search_season_list:' + season,
'search_document_klass:episode',
'search_show_slug:' + slug,
]),
'attributesToRetrieve': 'description,search_%s_episode_number,search_document_date,search_url,title' % slug,
'attributesToHighlight': '',
'hitsPerPage': 1000,
})
def entries():
for episode in (season_search.get('hits') or []):
search_url = episode.get('search_url')
if not search_url:
continue
yield {
'_type': 'url',
'url': 'https://www.%s.com%s' % (show_name, search_url),
'id': try_get(episode, lambda e: e['objectID'].split('_')[-1]),
'title': episode.get('title'),
'description': episode.get('description'),
'timestamp': unified_timestamp(episode.get('search_document_date')),
'season_number': season_number,
'episode_number': int_or_none(episode.get('search_%s_episode_number' % slug)),
'ie_key': AmericasTestKitchenIE.ie_key(),
}
return self.playlist_result(
entries(), 'season_%d' % season_number, season)
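
The season extractor above filters an Algolia index by season, document class, and show slug. The sketch below only reproduces how those query parameters are assembled; it performs no network request, and the function name `season_query` is an assumption made for illustration.

```python
import json


def season_query(show_name, season_number):
    """Hypothetical helper: build the query dict sent with the Algolia search request."""
    # 'atk' for America's Test Kitchen, 'cco' for Cook's Country, as in the diff.
    slug = 'atk' if show_name == 'americastestkitchen' else 'cco'
    season = 'Season %d' % season_number
    return {
        # Algolia expects facetFilters as a JSON-encoded list.
        'facetFilters': json.dumps([
            'search_season_list:' + season,
            'search_document_klass:episode',
            'search_show_slug:' + slug,
        ]),
        'attributesToRetrieve': (
            'description,search_%s_episode_number,search_document_date,search_url,title' % slug),
        'attributesToHighlight': '',
        'hitsPerPage': 1000,
    }
```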

@@ -8,6 +8,7 @@ from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
parse_iso8601, parse_iso8601,
unified_timestamp,
url_or_none, url_or_none,
) )
@@ -88,7 +89,7 @@ class AMPIE(InfoExtractor):
 self._sort_formats(formats)
-timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
+timestamp = unified_timestamp(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
 return {
 'id': video_id,

@@ -116,8 +116,6 @@ class AnimeOnDemandIE(InfoExtractor):
r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>', r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>',
webpage, 'anime description', default=None) webpage, 'anime description', default=None)
entries = []
def extract_info(html, video_id, num=None): def extract_info(html, video_id, num=None):
title, description = [None] * 2 title, description = [None] * 2
formats = [] formats = []
@@ -233,7 +231,7 @@ class AnimeOnDemandIE(InfoExtractor):
self._sort_formats(info['formats']) self._sort_formats(info['formats'])
f = common_info.copy() f = common_info.copy()
f.update(info) f.update(info)
entries.append(f) yield f
# Extract teaser/trailer only when full episode is not available # Extract teaser/trailer only when full episode is not available
if not info['formats']: if not info['formats']:
@@ -247,7 +245,7 @@ class AnimeOnDemandIE(InfoExtractor):
'title': m.group('title'), 'title': m.group('title'),
'url': urljoin(url, m.group('href')), 'url': urljoin(url, m.group('href')),
}) })
entries.append(f) yield f
def extract_episodes(html): def extract_episodes(html):
for num, episode_html in enumerate(re.findall( for num, episode_html in enumerate(re.findall(
@@ -275,7 +273,8 @@ class AnimeOnDemandIE(InfoExtractor):
'episode_number': episode_number, 'episode_number': episode_number,
} }
extract_entries(episode_html, video_id, common_info) for e in extract_entries(episode_html, video_id, common_info):
yield e
def extract_film(html, video_id): def extract_film(html, video_id):
common_info = { common_info = {
@@ -283,11 +282,18 @@ class AnimeOnDemandIE(InfoExtractor):
'title': anime_title, 'title': anime_title,
'description': anime_description, 'description': anime_description,
} }
extract_entries(html, video_id, common_info) for e in extract_entries(html, video_id, common_info):
yield e
extract_episodes(webpage) def entries():
has_episodes = False
for e in extract_episodes(webpage):
has_episodes = True
yield e
if not entries: if not has_episodes:
extract_film(webpage, anime_id) for e in extract_film(webpage, anime_id):
yield e
return self.playlist_result(entries, anime_id, anime_title, anime_description) return self.playlist_result(
entries(), anime_id, anime_title, anime_description)
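
The change above replaces a shared `entries` list with generators, so the old `if not entries:` check had to become an explicit `has_episodes` flag (a generator object is always truthy, whether or not it yields anything). A minimal, self-contained sketch of that pattern, with hypothetical argument names `episodes` and `film`:

```python
def entries(episodes, film):
    """Yield every episode lazily; fall back to the film only if there were none.

    `episodes` is any iterable, `film` is a zero-argument callable returning an
    iterable. Both names are placeholders for illustration.
    """
    has_episodes = False
    for e in episodes:
        has_episodes = True
        yield e
    # Only reached after the episode iterable is exhausted.
    if not has_episodes:
        for e in film():
            yield e
```

Because `film` is passed as a callable, the fallback extraction is not even started unless the episode iterable turns out to be empty.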

@@ -116,7 +116,76 @@ class AnvatoIE(InfoExtractor):
'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn', 'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W', 'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ', 'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ' 'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
'X8POa4zPPaKVZHqmWjuEzfP31b1QM9VN': 'Dn5vOY9ooDw7VSl9qztjZI5o0g08mA0z',
'M2v78QkBMpNJlSPp9diX5F2PBmBy6Bog': 'ka6K32kyo7nDZfNkjQCGWf1lpApXMd1B',
'bvJ0dQpav07l0hG5JgfVLF2dv1vARwpP': 'BzoQW24GrJZoJfmNodiJKSPeB9B8NOxj',
'lxQMLg2XZKuEZaWgsqubBxV9INZ6bryY': 'Vm2Mx6noKds9jB71h6urazwlTG3m9x8l',
'04EnjvXeoSmkbJ9ckPs7oY0mcxv7PlyN': 'aXERQP9LMfQVlEDsgGs6eEA1SWznAQ8P',
'mQbO2ge6BFRWVPYCYpU06YvNt80XLvAX': 'E2BV1NGmasN5v7eujECVPJgwflnLPm2A',
'g43oeBzJrCml7o6fa5fRL1ErCdeD8z4K': 'RX34mZ6zVH4Nr6whbxIGLv9WSbxEKo8V',
'VQrDJoP7mtdBzkxhXbSPwGB1coeElk4x': 'j2VejQx0VFKQepAF7dI0mJLKtOVJE18z',
'WxA5NzLRjCrmq0NUgaU5pdMDuZO7RJ4w': 'lyY5ADLKaIOLEgAsGQCveEMAcqnx3rY9',
'M4lpMXB71ie0PjMCjdFzVXq0SeRVqz49': 'n2zVkOqaLIv3GbLfBjcwW51LcveWOZ2e',
'dyDZGEqN8u8nkJZcJns0oxYmtP7KbGAn': 'VXOEqQW9BtEVLajfZQSLEqxgS5B7qn2D',
'E7QNjrVY5u5mGvgu67IoDgV1CjEND8QR': 'rz8AaDmdKIkLmPNhB5ILPJnjS5PnlL8d',
'a4zrqjoKlfzg0dwHEWtP31VqcLBpjm4g': 'LY9J16gwETdGWa3hjBu5o0RzuoQDjqXQ',
'dQP5BZroMsMVLO1hbmT5r2Enu86GjxA6': '7XR3oOdbPF6x3PRFLDCq9RkgsRjAo48V',
'M4lKNBO1NFe0PjMCj1tzVXq0SeRVqzA9': 'n2zoRqGLRUv3GbLfBmTwW51LcveWOZYe',
'nAZ7MZdpGCGg1pqFEbsoJOz2C60mv143': 'dYJgdqA9aT4yojETqGi7yNgoFADxqmXP',
'3y1MERYgOuE9NzbFgwhV6Wv2F0YKvbyz': '081xpZDQgC4VadLTavhWQxrku56DAgXV',
'bmQvmEXr5HWklBMCZOcpE2Z3HBYwqGyl': 'zxXPbVNyMiMAZldhr9FkOmA0fl4aKr2v',
'wA7oDNYldfr6050Hwxi52lPZiVlB86Ap': 'ZYK16aA7ni0d3l3c34uwpxD7CbReMm8Q',
'g43MbKMWmFml7o7sJoSRkXxZiXRvJ3QK': 'RX3oBJonvs4Nr6rUWBCGn3matRGqJPXV',
'mA9VdlqpLS0raGaSDvtoqNrBTzb8XY4q': '0XN4OjBD3fnW7r7IbmtJB4AyfOmlrE2r',
'mAajOwgkGt17oGoFmEuklMP9H0GnW54d': 'lXbBLPGyzikNGeGujAuAJGjZiwLRxyXR',
'vy8vjJ9kbUwrRqRu59Cj5dWZfzYErlAb': 'K8l7gpwaGcBpnAnCLNCmPZRdin3eaQX0',
'xQMWBpR8oHEZaWaSMGUb0avOHjLVYn4Y': 'm2MrN4vEaf9jB7BFy5Srb40jTrN67AYl',
'xyKEmVO3miRr6D6UVkt7oB8jtD6aJEAv': 'g2ddDebqDfqdgKgswyUKwGjbTWwzq923',
'7Qk0wa2D9FjKapacoJF27aLvUDKkLGA0': 'b2kgBEkephJaMkMTL7s1PLe4Ua6WyP2P',
'3QLg6nqmNTJ5VvVTo7f508LPidz1xwyY': 'g2L1GgpraipmAOAUqmIbBnPxHOmw4MYa',
'3y1B7zZjXTE9NZNSzZSVNPZaTNLjo6Qz': '081b5G6wzH4VagaURmcWbN5mT4JGEe2V',
'lAqnwvkw6SG6D8DSqmUg6DRLUp0w3G4x': 'O2pbP0xPDFNJjpjIEvcdryOJtpkVM4X5',
'awA7xd1N0Hr6050Hw2c52lPZiVlB864p': 'GZYKpn4aoT0d3l3c3PiwpxD7CbReMmXQ',
'jQVqPLl9YHL1WGWtR1HDgWBGT63qRNyV': '6X03ne6vrU4oWyWUN7tQVoajikxJR3Ye',
'GQRMR8mL7uZK797t7xH3eNzPIP5dOny1': 'm2vqPWGd4U31zWzSyasDRAoMT1PKRp8o',
'zydq9RdmRhXLkNkfNoTJlMzaF0lWekQB': '3X7LnvE7vH5nkEkSqLiey793Un7dLB8e',
'VQrDzwkB2IdBzjzu9MHPbEYkSB50gR4x': 'j2VebLzoKUKQeEesmVh0gM1eIp9jKz8z',
'mAa2wMamBs17oGoFmktklMP9H0GnW54d': 'lXbgP74xZTkNGeGujVUAJGjZiwLRxy8R',
'7yjB6ZLG6sW8R6RF2xcan1KGfJ5dNoyd': 'wXQkPorvPHZ45N5t4Jf6qwg5Tp4xvw29',
'a4zPpNeWGuzg0m0iX3tPeanGSkRKWXQg': 'LY9oa3QAyHdGW9Wu3Ri5JGeEik7l1N8Q',
'k2rneA2M38k25cXDwwSknTJlxPxQLZ6M': '61lyA2aEVDzklfdwmmh31saPxQx2VRjp',
'bK9Zk4OvPnvxduLgxvi8VUeojnjA02eV': 'o5jANYjbeMb4nfBaQvcLAt1jzLzYx6ze',
'5VD6EydM3R9orHmNMGInGCJwbxbQvGRw': 'w3zjmX7g4vnxzCxElvUEOiewkokXprkZ',
'70X35QbVYVYNPUmP9YfbzI06YqYQk2R1': 'vG4Aj2BMjMjoztB7zeFOnCVPJpJ8lMOa',
'26qYwQVG9p1Bks2GgBckjfDJOXOAMgG1': 'r4ev9X0mv5zqJc0yk5IBDcQOwZw8mnwQ',
'rvVKpA56MBXWlSxMw3cobT5pdkd4Dm7q': '1J7ZkY53pZ645c93owcLZuveE7E8B3rL',
'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo': 'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo',
'jdKqRGF16dKsBviMDae7IGDl7oTjEbVV': 'Q09l7vhlNxPFErIOK6BVCe7KnwUW5DVV',
'3QLkogW1OUJ5VvPsrDH56DY2u7lgZWyY': 'g2LRE1V9espmAOPhE4ubj4ZdUA57yDXa',
'wyJvWbXGBSdbkEzhv0CW8meou82aqRy8': 'M2wolPvyBIpQGkbT4juedD4ruzQGdK2y',
'7QkdZrzEkFjKap6IYDU2PB0oCNZORmA0': 'b2kN1l96qhJaMkPs9dt1lpjBfwqZoA8P',
'pvA05113MHG1w3JTYxc6DVlRCjErVz4O': 'gQXeAbblBUnDJ7vujbHvbRd1cxlz3AXO',
'mA9blJDZwT0raG1cvkuoeVjLC7ZWd54q': '0XN9jRPwMHnW7rvumgfJZOD9CJgVkWYr',
'5QwRN5qKJTvGKlDTmnf7xwNZcjRmvEy9': 'R2GP6LWBJU1QlnytwGt0B9pytWwAdDYy',
'eyn5rPPbkfw2KYxH32fG1q58CbLJzM40': 'p2gyqooZnS56JWeiDgfmOy1VugOQEBXn',
'3BABn3b5RfPJGDwilbHe7l82uBoR05Am': '7OYZG7KMVhbPdKJS3xcWEN3AuDlLNmXj',
'xA5zNGXD3HrmqMlF6OS5pdMDuZO7RJ4w': 'yY5DAm6r1IOLE3BCVMFveEMAcqnx3r29',
'g43PgW3JZfml7o6fDEURL1ErCdeD8zyK': 'RX3aQn1zrS4Nr6whDgCGLv9WSbxEKo2V',
'lAqp8WbGgiG6D8LTKJcg3O72CDdre1Qx': 'O2pnm6473HNJjpKuVosd3vVeh975yrX5',
'wyJbYEDxKSdbkJ6S6RhW8meou82aqRy8': 'M2wPm7EgRSpQGlAh70CedD4ruzQGdKYy',
'M4lgW28nLCe0PVdtaXszVXq0SeRVqzA9': 'n2zmJvg4jHv3G0ETNgiwW51LcveWOZ8e',
'5Qw3OVvp9FvGKlDTmOC7xwNZcjRmvEQ9': 'R2GzDdml9F1Qlnytw9s0B9pytWwAdD8y',
'vy8a98X7zCwrRqbHrLUjYzwDiK2b70Qb': 'K8lVwzyjZiBpnAaSGeUmnAgxuGOBxmY0',
'g4eGjJLLoiqRD3Pf9oT5O03LuNbLRDQp': '6XqD59zzpfN4EwQuaGt67qNpSyRBlnYy',
'g43OPp9boIml7o6fDOIRL1ErCdeD8z4K': 'RX33alNB4s4Nr6whDPUGLv9WSbxEKoXV',
'xA2ng9OkBcGKzDbTkKsJlx7dUK8R3dA5': 'z2aPnJvzBfObkwGC3vFaPxeBhxoMqZ8K',
'xyKEgBajZuRr6DEC0Kt7XpD1cnNW9gAv': 'g2ddlEBvRsqdgKaI4jUK9PrgfMexGZ23',
'BAogww51jIMa2JnH1BcYpXM5F658RNAL': 'rYWDmm0KptlkGv4FGJFMdZmjs9RDE6XR',
'BAokpg62VtMa2JnH1mHYpXM5F658RNAL': 'rYWryDnlNslkGv4FG4HMdZmjs9RDE62R',
'a4z1Px5e2hzg0m0iMMCPeanGSkRKWXAg': 'LY9eorNQGUdGW9WuKKf5JGeEik7l1NYQ',
'kAx69R58kF9nY5YcdecJdl2pFXP53WyX': 'gXyRxELpbfPvLeLSaRil0mp6UEzbZJ8L',
'BAoY13nwViMa2J2uo2cY6BlETgmdwryL': 'rYWwKzJmNFlkGvGtNoUM9bzwIJVzB1YR',
} }
_MCP_TO_ACCESS_KEY_TABLE = { _MCP_TO_ACCESS_KEY_TABLE = {
@@ -189,19 +258,17 @@ class AnvatoIE(InfoExtractor):
video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii') video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
anvrid = md5_text(time.time() * 1000 * random.random())[:30] anvrid = md5_text(time.time() * 1000 * random.random())[:30]
payload = { api = {
'api': { 'anvrid': anvrid,
'anvrid': anvrid, 'anvts': server_time,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY))),
'anvts': server_time,
},
} }
api['anvstk'] = md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY)))
return self._download_json( return self._download_json(
video_data_url, video_id, transform_source=strip_jsonp, video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8')) data=json.dumps({'api': api}).encode('utf-8'))
def _get_anvato_videos(self, access_key, video_id): def _get_anvato_videos(self, access_key, video_id):
video_data = self._get_video_json(access_key, video_id) video_data = self._get_video_json(access_key, video_id)
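
The refactored `_get_video_json` hunk above builds the `api` dict first and then adds the `anvstk` token, an MD5 over `access_key|anvrid|server_time|secret`. The sketch below mirrors that construction outside the extractor. `md5_text` here is a local stand-in, assumed to hash the UTF-8 encoding of its argument, and `build_api_payload` is a hypothetical name.

```python
import hashlib
import json


def md5_text(s):
    """Local stand-in (assumption): hex MD5 of the UTF-8 encoded text."""
    return hashlib.md5(str(s).encode('utf-8')).hexdigest()


def build_api_payload(access_key, anvrid, server_time, secret):
    """Hypothetical helper: return the POST body as bytes."""
    api = {
        'anvrid': anvrid,
        'anvts': server_time,
    }
    # The token binds the request id and server time to the per-key secret.
    api['anvstk'] = md5_text('%s|%s|%d|%s' % (access_key, anvrid, server_time, secret))
    return json.dumps({'api': api}).encode('utf-8')
```

In the extractor the secret comes from `_ANVACK_TABLE`, falling back to `_API_KEY`; here it is simply passed in.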
@@ -259,7 +326,7 @@ class AnvatoIE(InfoExtractor):
'description': video_data.get('def_description'), 'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','), 'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'), 'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'), 'thumbnail': video_data.get('src_image_url') or video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get( 'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')), 'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'), 'uploader': video_data.get('mcp_id'),

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
 import re
-from .common import InfoExtractor
+from .yahoo import YahooIE
 from ..compat import (
 compat_parse_qs,
 compat_urllib_parse_urlparse,
@@ -15,9 +15,9 @@ from ..utils import (
 )
-class AolIE(InfoExtractor):
+class AolIE(YahooIE):
 IE_NAME = 'aol.com'
-_VALID_URL = r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)(?P<id>[0-9a-f]+)'
+_VALID_URL = r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)(?P<id>\d{9}|[0-9a-f]{24}|[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})'
_TESTS = [{ _TESTS = [{
# video with 5min ID # video with 5min ID
@@ -76,10 +76,16 @@ class AolIE(InfoExtractor):
}, { }, {
'url': 'https://www.aol.jp/video/playlist/5a28e936a1334d000137da0c/5a28f3151e642219fde19831/', 'url': 'https://www.aol.jp/video/playlist/5a28e936a1334d000137da0c/5a28f3151e642219fde19831/',
'only_matching': True, 'only_matching': True,
}, {
# Yahoo video
'url': 'https://www.aol.com/video/play/991e6700-ac02-11ea-99ff-357400036f61/24bbc846-3e30-3c46-915e-fe8ccd7fcc46/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
if '-' in video_id:
return self._extract_yahoo_video(video_id, 'us')
response = self._download_json( response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id, 'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
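
The tightened `_VALID_URL` above accepts three ID shapes (9 digits, 24 hex characters, or a UUID), and `_real_extract` uses the presence of a hyphen to route UUIDs to the Yahoo code path. A rough sketch of that dispatch follows; the return labels `'yahoo'` and `'aol'` and the function name `backend_for` are illustrative assumptions only.

```python
import re

# Pattern copied from the new side of the diff.
VALID_URL = (
    r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)'
    r'(?P<id>\d{9}|[0-9a-f]{24}|[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})'
)


def backend_for(url):
    """Hypothetical helper: return (video_id, backend label) for an AOL URL."""
    video_id = re.match(VALID_URL, url).group('id')
    # Only the UUID form contains hyphens, so this check is enough to tell it apart.
    backend = 'yahoo' if '-' in video_id else 'aol'
    return video_id, backend
```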

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
get_element_by_id,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
mimetype2ext, mimetype2ext,
@@ -39,23 +40,15 @@ class AparatIE(InfoExtractor):
webpage = self._download_webpage(url, video_id, fatal=False) webpage = self._download_webpage(url, video_id, fatal=False)
if not webpage: if not webpage:
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage( webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id, 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id) video_id)
options = self._parse_json( options = self._parse_json(self._search_regex(
self._search_regex( r'options\s*=\s*({.+?})\s*;', webpage, 'options'), video_id)
r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
webpage, 'options', group='value'),
video_id)
player = options['plugins']['sabaPlayerPlugin']
formats = [] formats = []
for sources in player['multiSRC']: for sources in (options.get('multiSRC') or []):
for item in sources: for item in sources:
if not isinstance(item, dict): if not isinstance(item, dict):
continue continue
@@ -85,11 +78,12 @@ class AparatIE(InfoExtractor):
info = self._search_json_ld(webpage, video_id, default={}) info = self._search_json_ld(webpage, video_id, default={})
if not info.get('title'): if not info.get('title'):
info['title'] = player['title'] info['title'] = get_element_by_id('videoTitle', webpage) or \
self._html_search_meta(['og:title', 'twitter:title', 'DC.Title', 'title'], webpage, fatal=True)
return merge_dicts(info, { return merge_dicts(info, {
'id': video_id, 'id': video_id,
'thumbnail': url_or_none(options.get('poster')), 'thumbnail': url_or_none(options.get('poster')),
'duration': int_or_none(player.get('duration')), 'duration': int_or_none(options.get('duration')),
'formats': formats, 'formats': formats,
}) })
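
The Aparat change above reads the player options from an inline `options = {...};` assignment instead of a `JSON.parse(...)` string. The following sketch applies the same regular expression to a made-up page; `SAMPLE_PAGE` and `extract_options` are assumptions for illustration and do not reflect the real page markup.

```python
import json
import re

# Invented sample markup, not taken from aparat.com.
SAMPLE_PAGE = (
    '<script>var options = {"poster": "https://example.com/p.jpg", "duration": 42, '
    '"multiSRC": [[{"src": "https://example.com/v.mp4", "type": "video/mp4"}]]};</script>'
)


def extract_options(webpage):
    """Hypothetical helper: pull the options object out of the page and parse it."""
    # Non-greedy match up to the first '}' that is followed by ';'.
    m = re.search(r'options\s*=\s*({.+?})\s*;', webpage)
    if not m:
        raise ValueError('options not found')
    return json.loads(m.group(1))
```

One limitation of this approach: a `};` sequence inside a string value would end the match early, so it relies on the page not containing one.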

@@ -0,0 +1,61 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_podcast_url,
int_or_none,
parse_iso8601,
try_get,
)
class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'md5': 'df02e6acb11c10e844946a39e7222b08',
'info_dict': {
'id': '1000482637777',
'ext': 'mp3',
'title': '207 - Whitney Webb Returns',
'description': 'md5:13a73bade02d2e43737751e3987e1399',
'upload_date': '20200705',
'timestamp': 1593921600,
'duration': 6425,
'series': 'The Tim Dillon Show',
}
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/id1135137367?i=1000482637777',
'only_matching': True,
}]
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
episode = ember_data['data']['attributes']
description = episode.get('description') or {}
series = None
for inc in (ember_data.get('included') or []):
if inc.get('type') == 'media/podcast':
series = try_get(inc, lambda x: x['attributes']['name'])
return {
'id': episode_id,
'title': episode['name'],
'url': clean_podcast_url(episode['assetUrl']),
'description': description.get('standard') or description.get('short'),
'timestamp': parse_iso8601(episode.get('releaseDateTime')),
'duration': int_or_none(episode.get('durationInMilliseconds'), 1000),
'series': series,
}
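
The extractor above takes the episode ID from the `i=` query parameter and converts `durationInMilliseconds` to seconds. The sketch below checks the URL pattern against the test URLs listed in `_TESTS` and shows the duration conversion with a local stand-in. `episode_id` and `ms_to_seconds` are hypothetical names, and `ms_to_seconds` only approximates what `int_or_none(value, 1000)` is assumed to do.

```python
import re

# Pattern copied from the diff.
VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'


def episode_id(url):
    """Return the episode ID from an Apple Podcasts URL, or None if it does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


def ms_to_seconds(value):
    """Stand-in (assumption): integer seconds from milliseconds, None on bad input."""
    try:
        return int(value) // 1000
    except (TypeError, ValueError):
        return None
```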

@@ -0,0 +1,174 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_iso8601,
try_get,
)
class ArcPublishingIE(InfoExtractor):
_UUID_REGEX = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
_VALID_URL = r'arcpublishing:(?P<org>[a-z]+):(?P<id>%s)' % _UUID_REGEX
_TESTS = [{
# https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/
'url': 'arcpublishing:adn:8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'only_matching': True,
}, {
# https://www.bostonglobe.com/video/2020/12/30/metro/footage-released-showing-officer-talking-about-striking-protesters-with-car/
'url': 'arcpublishing:bostonglobe:232b7ae6-7d73-432d-bc0a-85dbf0119ab1',
'only_matching': True,
}, {
# https://www.actionnewsjax.com/video/live-stream/
'url': 'arcpublishing:cmg:cfb1cf1b-3ab5-4d1b-86c5-a5515d311f2a',
'only_matching': True,
}, {
# https://elcomercio.pe/videos/deportes/deporte-total-futbol-peruano-seleccion-peruana-la-valorizacion-de-los-peruanos-en-el-exterior-tras-un-2020-atipico-nnav-vr-video-noticia/
'url': 'arcpublishing:elcomercio:27a7e1f8-2ec7-4177-874f-a4feed2885b3',
'only_matching': True,
}, {
# https://www.clickondetroit.com/video/community/2020/05/15/events-surrounding-woodward-dream-cruise-being-canceled/
'url': 'arcpublishing:gmg:c8793fb2-8d44-4242-881e-2db31da2d9fe',
'only_matching': True,
}, {
# https://www.wabi.tv/video/2020/12/30/trenton-company-making-equipment-pfizer-covid-vaccine/
'url': 'arcpublishing:gray:0b0ba30e-032a-4598-8810-901d70e6033e',
'only_matching': True,
}, {
# https://www.lateja.cr/el-mundo/video-china-aprueba-con-condiciones-su-primera/dfcbfa57-527f-45ff-a69b-35fe71054143/video/
'url': 'arcpublishing:gruponacion:dfcbfa57-527f-45ff-a69b-35fe71054143',
'only_matching': True,
}, {
# https://www.fifthdomain.com/video/2018/03/09/is-america-vulnerable-to-a-cyber-attack/
'url': 'arcpublishing:mco:aa0ca6fe-1127-46d4-b32c-be0d6fdb8055',
'only_matching': True,
}, {
# https://www.vl.no/kultur/2020/12/09/en-melding-fra-en-lytter-endret-julelista-til-lewi-bergrud/
'url': 'arcpublishing:mentormedier:47a12084-650b-4011-bfd0-3699b6947b2d',
'only_matching': True,
}, {
# https://www.14news.com/2020/12/30/whiskey-theft-caught-camera-henderson-liquor-store/
'url': 'arcpublishing:raycom:b89f61f8-79fa-4c09-8255-e64237119bf7',
'only_matching': True,
}, {
# https://www.theglobeandmail.com/world/video-ethiopian-woman-who-became-symbol-of-integration-in-italy-killed-on/
'url': 'arcpublishing:tgam:411b34c1-8701-4036-9831-26964711664b',
'only_matching': True,
}, {
# https://www.pilotonline.com/460f2931-8130-4719-8ea1-ffcb2d7cb685-132.html
'url': 'arcpublishing:tronc:460f2931-8130-4719-8ea1-ffcb2d7cb685',
'only_matching': True,
}]
_POWA_DEFAULTS = [
(['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
([
'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
], 'video-api-cdn.%s.arcpublishing.com/api'),
]
@staticmethod
def _extract_urls(webpage):
entries = []
# https://arcpublishing.atlassian.net/wiki/spaces/POWA/overview
for powa_el in re.findall(r'(<div[^>]+class="[^"]*\bpowa\b[^"]*"[^>]+data-uuid="%s"[^>]*>)' % ArcPublishingIE._UUID_REGEX, webpage):
powa = extract_attributes(powa_el) or {}
org = powa.get('data-org')
uuid = powa.get('data-uuid')
if org and uuid:
entries.append('arcpublishing:%s:%s' % (org, uuid))
return entries
def _real_extract(self, url):
org, uuid = re.match(self._VALID_URL, url).groups()
for orgs, tmpl in self._POWA_DEFAULTS:
if org in orgs:
base_api_tmpl = tmpl
break
else:
base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
if org == 'wapo':
org = 'washpost'
video = self._download_json(
'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org),
uuid, query={'uuid': uuid})[0]
title = video['headlines']['basic']
is_live = video.get('status') == 'live'
urls = []
formats = []
for s in video.get('streams', []):
s_url = s.get('url')
if not s_url or s_url in urls:
continue
urls.append(s_url)
stream_type = s.get('stream_type')
if stream_type == 'smil':
smil_formats = self._extract_smil_formats(
s_url, uuid, fatal=False)
for f in smil_formats:
if f['url'].endswith('/cfx/st'):
f['app'] = 'cfx/st'
if not f['play_path'].startswith('mp4:'):
f['play_path'] = 'mp4:' + f['play_path']
if isinstance(f['tbr'], float):
f['vbr'] = f['tbr'] * 1000
del f['tbr']
f['format_id'] = 'rtmp-%d' % f['vbr']
formats.extend(smil_formats)
elif stream_type in ('ts', 'hls'):
m3u8_formats = self._extract_m3u8_formats(
s_url, uuid, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False)
if all([f.get('acodec') == 'none' for f in m3u8_formats]):
continue
for f in m3u8_formats:
if f.get('acodec') == 'none':
f['preference'] = -40
elif f.get('vcodec') == 'none':
f['preference'] = -50
height = f.get('height')
if not height:
continue
vbr = self._search_regex(
r'[_x]%d[_-](\d+)' % height, f['url'], 'vbr', default=None)
if vbr:
f['vbr'] = int(vbr)
formats.extend(m3u8_formats)
else:
vbr = int_or_none(s.get('bitrate'))
formats.append({
'format_id': '%s-%d' % (stream_type, vbr) if vbr else stream_type,
'vbr': vbr,
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),
'filesize': int_or_none(s.get('filesize')),
'url': s_url,
'preference': -1,
})
self._sort_formats(
formats, ('preference', 'width', 'height', 'vbr', 'filesize', 'tbr', 'ext', 'format_id'))
subtitles = {}
for subtitle in (try_get(video, lambda x: x['subtitles']['urls'], list) or []):
subtitle_url = subtitle.get('url')
if subtitle_url:
subtitles.setdefault('en', []).append({'url': subtitle_url})
return {
'id': uuid,
'title': self._live_title(title) if is_live else title,
'thumbnail': try_get(video, lambda x: x['promo_image']['url']),
'description': try_get(video, lambda x: x['subheadlines']['basic']),
'formats': formats,
'duration': int_or_none(video.get('duration'), 100),
'timestamp': parse_iso8601(video.get('created_date')),
'subtitles': subtitles,
'is_live': is_live,
}
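
`_real_extract` above picks the API host template with a `for`/`else` loop: the `else` branch runs only when no entry in `_POWA_DEFAULTS` matched. A standalone sketch of that selection follows. Indentation is lost in this view, so placing the `wapo` rename after the loop is an assumption, as is the helper name `api_endpoint`.

```python
# Table copied from the diff.
POWA_DEFAULTS = [
    (['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
    ([
        'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
        'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
        'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
    ], 'video-api-cdn.%s.arcpublishing.com/api'),
]


def api_endpoint(org):
    """Hypothetical helper: return the findByUuid endpoint for an organisation."""
    for orgs, tmpl in POWA_DEFAULTS:
        if org in orgs:
            base_api_tmpl = tmpl
            break
    else:
        # No break happened: fall back to the generic template.
        base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
    # Assumed placement: the rename applies after the template is chosen.
    if org == 'wapo':
        org = 'washpost'
    return 'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org)
```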

@@ -187,13 +187,13 @@ class ARDMediathekIE(ARDMediathekBaseIE):
if doc.tag == 'rss': if doc.tag == 'rss':
return GenericIE()._extract_rss(url, video_id, doc) return GenericIE()._extract_rss(url, video_id, doc)
title = self._html_search_regex( title = self._og_search_title(webpage, default=None) or self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', [r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms\.title" content="(.*?)"/>', r'<meta name="dcterms\.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>', r'<h4 class="headline">(.*?)</h4>',
r'<title[^>]*>(.*?)</title>'], r'<title[^>]*>(.*?)</title>'],
webpage, 'title') webpage, 'title')
description = self._html_search_meta( description = self._og_search_description(webpage, default=None) or self._html_search_meta(
'dcterms.abstract', webpage, 'description', default=None) 'dcterms.abstract', webpage, 'description', default=None)
if description is None: if description is None:
description = self._html_search_meta( description = self._html_search_meta(
@@ -249,18 +249,18 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html' _VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?:video-?)?(?P<id>[0-9]+))\.html'
_TESTS = [{ _TESTS = [{
# available till 14.02.2019 # available till 7.01.2022
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html', 'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html',
'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49', 'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1',
'info_dict': { 'info_dict': {
'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video', 'display_id': 'maischberger-die-woche',
'id': '102', 'id': '100',
'ext': 'mp4', 'ext': 'mp4',
'duration': 4435.0, 'duration': 3687.0,
'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?', 'title': 'maischberger. die woche vom 7. Januar 2021',
'upload_date': '20180214', 'upload_date': '20210107',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, { }, {
@@ -315,17 +315,17 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE): class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)' _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE', 'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f', 'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
'info_dict': { 'info_dict': {
'display_id': 'die-robuste-roswita', 'display_id': 'die-robuste-roswita',
'id': '70153354', 'id': '78566716',
'title': 'Die robuste Roswita', 'title': 'Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.', 'description': r're:^Der Mord.*totgeglaubte Ehefrau Roswita',
'duration': 5316, 'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard', 'thumbnail': 'https://img.ardmediathek.de/standard/00/78/56/67/84/575672121/16x9/960?mandant=ard',
'timestamp': 1577047500, 'timestamp': 1596658200,
'upload_date': '20191222', 'upload_date': '20200805',
'ext': 'mp4', 'ext': 'mp4',
}, },
}, { }, {
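
The updated `ARDIE._VALID_URL` in this file adds an optional `video` or `video-` prefix before the numeric ID, so that slugs such as `...-video100.html` yield a clean `display_id`. A small sketch of the match, using the pattern and test URL from the new side of the diff; the function name `parse_daserste_url` is hypothetical.

```python
import re

# Pattern copied from the new side of the diff.
VALID_URL = (
    r'(?P<mainurl>https?://(?:www\.)?daserste\.de/[^?#]+/videos(?:extern)?/'
    r'(?P<display_id>[^/?#]+)-(?:video-?)?(?P<id>[0-9]+))\.html'
)


def parse_daserste_url(url):
    """Hypothetical helper: return (display_id, id) or None if the URL does not match."""
    m = re.match(VALID_URL, url)
    if not m:
        return None
    return m.group('display_id'), m.group('id')
```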

@@ -6,13 +6,11 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
mimetype2ext,
parse_iso8601, parse_iso8601,
strip_jsonp, try_get,
) )
@@ -20,22 +18,27 @@ class ArkenaIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
video\.arkena\.com/play2/embed/player\?| video\.(?:arkena|qbrick)\.com/play2/embed/player\?|
play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+) play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)
) )
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411', 'url': 'https://video.qbrick.com/play2/embed/player?accountId=1034090&mediaId=d8ab4607-00090107-aab86310',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365', 'md5': '97f117754e5f3c020f5f26da4a44ebaf',
'info_dict': { 'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe', 'id': 'd8ab4607-00090107-aab86310',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Big Buck Bunny', 'title': 'EM_HT20_117_roslund_v2.mp4',
'description': 'Royalty free test video', 'timestamp': 1608285912,
'timestamp': 1432816365, 'upload_date': '20201218',
'upload_date': '20150528', 'duration': 1429.162667,
'is_live': False, 'subtitles': {
'sv': 'count:3',
},
}, },
}, {
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
'only_matching': True,
}, { }, {
'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893', 'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
'only_matching': True, 'only_matching': True,
@@ -72,62 +75,89 @@ class ArkenaIE(InfoExtractor):
if not video_id or not account_id: if not video_id or not account_id:
raise ExtractorError('Invalid URL', expected=True) raise ExtractorError('Invalid URL', expected=True)
playlist = self._download_json( media = self._download_json(
'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_' 'https://video.qbrick.com/api/v1/public/accounts/%s/medias/%s' % (account_id, video_id),
% (video_id, account_id), video_id, query={
video_id, transform_source=strip_jsonp)['Playlist'][0] # https://video.qbrick.com/docs/api/examples/library-api.html
'fields': 'asset/resources/*/renditions/*(height,id,language,links/*(href,mimeType),type,size,videos/*(audios/*(codec,sampleRate),bitrate,codec,duration,height,width),width),created,metadata/*(title,description),tags',
})
metadata = media.get('metadata') or {}
title = metadata['title']
media_info = playlist['MediaInfo'] duration = None
title = media_info['Title']
media_files = playlist['MediaFiles']
is_live = False
formats = [] formats = []
for kind_case, kind_formats in media_files.items(): thumbnails = []
kind = kind_case.lower() subtitles = {}
for f in kind_formats: for resource in media['asset']['resources']:
f_url = f.get('Url') for rendition in (resource.get('renditions') or []):
if not f_url: rendition_type = rendition.get('type')
continue for i, link in enumerate(rendition.get('links') or []):
is_live = f.get('Live') == 'true' href = link.get('href')
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None)) if not href:
if kind == 'm3u8' or 'm3u8' in exts: continue
formats.extend(self._extract_m3u8_formats( if rendition_type == 'image':
f_url, video_id, 'mp4', 'm3u8_native', thumbnails.append({
m3u8_id=kind, fatal=False, live=is_live)) 'filesize': int_or_none(rendition.get('size')),
elif kind == 'flash' or 'f4m' in exts: 'height': int_or_none(rendition.get('height')),
formats.extend(self._extract_f4m_formats( 'id': rendition.get('id'),
f_url, video_id, f4m_id=kind, fatal=False)) 'url': href,
elif kind == 'dash' or 'mpd' in exts: 'width': int_or_none(rendition.get('width')),
formats.extend(self._extract_mpd_formats( })
f_url, video_id, mpd_id=kind, fatal=False)) elif rendition_type == 'subtitle':
elif kind == 'silverlight': subtitles.setdefault(rendition.get('language') or 'en', []).append({
# TODO: process when ism is supported (see 'url': href,
# https://github.com/ytdl-org/youtube-dl/issues/8118) })
continue elif rendition_type == 'video':
else: f = {
tbr = float_or_none(f.get('Bitrate'), 1000) 'filesize': int_or_none(rendition.get('size')),
formats.append({ 'format_id': rendition.get('id'),
'url': f_url, 'url': href,
'format_id': '%s-%d' % (kind, tbr) if tbr else kind, }
'tbr': tbr, video = try_get(rendition, lambda x: x['videos'][i], dict)
}) if video:
if not duration:
duration = float_or_none(video.get('duration'))
f.update({
'height': int_or_none(video.get('height')),
'tbr': int_or_none(video.get('bitrate'), 1000),
'vcodec': video.get('codec'),
'width': int_or_none(video.get('width')),
})
audio = try_get(video, lambda x: x['audios'][0], dict)
if audio:
f.update({
'acodec': audio.get('codec'),
'asr': int_or_none(audio.get('sampleRate')),
})
formats.append(f)
elif rendition_type == 'index':
mime_type = link.get('mimeType')
if mime_type == 'application/smil+xml':
formats.extend(self._extract_smil_formats(
href, video_id, fatal=False))
elif mime_type == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
href, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif mime_type == 'application/hds+xml':
formats.extend(self._extract_f4m_formats(
href, video_id, f4m_id='hds', fatal=False))
elif mime_type == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
href, video_id, mpd_id='dash', fatal=False))
elif mime_type == 'application/vnd.ms-sstr+xml':
formats.extend(self._extract_ism_formats(
href, video_id, ism_id='mss', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
description = media_info.get('Description')
video_id = media_info.get('VideoId') or video_id
timestamp = parse_iso8601(media_info.get('PublishDate'))
thumbnails = [{
'url': thumbnail['Url'],
'width': int_or_none(thumbnail.get('Size')),
} for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': metadata.get('description'),
'timestamp': timestamp, 'timestamp': parse_iso8601(media.get('created')),
'is_live': is_live,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'subtitles': subtitles,
'duration': duration,
'tags': media.get('tags'),
'formats': formats, 'formats': formats,
} }
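
Review note: the rewritten Arkena extractor walks `asset/resources/*/renditions/*/links/*` and dispatches on the rendition `type`, then on the link `mimeType` for `index` renditions. Below is a minimal standalone sketch of that dispatch, not the extractor itself. The function name `classify_links`, the `MANIFEST_KINDS` table and its labels, and the shape of the sample response are assumptions for illustration, inferred from the `fields` query in the diff.

```python
# Hypothetical, standalone illustration; it does not use youtube-dl.
# Labels on the right are illustrative names for each manifest family.
MANIFEST_KINDS = {
    'application/smil+xml': 'smil',
    'application/x-mpegURL': 'hls',
    'application/hds+xml': 'hds',
    'application/dash+xml': 'dash',
    'application/vnd.ms-sstr+xml': 'mss',
}


def classify_links(media):
    """Group rendition links of a Qbrick-style media dict by purpose."""
    result = {
        'thumbnails': [],
        'subtitles': {},
        'progressive': [],
        'manifests': [],
    }
    resources = (media.get('asset') or {}).get('resources') or []
    for resource in resources:
        for rendition in resource.get('renditions') or []:
            rendition_type = rendition.get('type')
            for link in rendition.get('links') or []:
                href = link.get('href')
                if not href:
                    # Links without a URL carry nothing downloadable.
                    continue
                if rendition_type == 'image':
                    result['thumbnails'].append(href)
                elif rendition_type == 'subtitle':
                    # Same fallback language as in the diff.
                    lang = rendition.get('language') or 'en'
                    result['subtitles'].setdefault(lang, []).append(href)
                elif rendition_type == 'video':
                    result['progressive'].append(href)
                elif rendition_type == 'index':
                    kind = MANIFEST_KINDS.get(link.get('mimeType'))
                    if kind:
                        result['manifests'].append((kind, href))
    return result
```

Keeping the MIME types in one table makes it easy to check by eye that each manifest type is paired with the matching format family.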

@@ -4,23 +4,57 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
qualities, qualities,
try_get, try_get,
unified_strdate, unified_strdate,
url_or_none,
) )
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTVBaseIE(InfoExtractor): class ArteTVBaseIE(InfoExtractor):
def _extract_from_json_url(self, json_url, video_id, lang, title=None): _ARTE_LANGUAGES = 'fr|de|en|es|it|pl'
info = self._download_json(json_url, video_id) _API_BASE = 'https://api.arte.tv/api/player/v1'
class ArteTVIE(ArteTVBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?arte\.tv/(?P<lang>%(langs)s)/videos|
api\.arte\.tv/api/player/v\d+/config/(?P<lang_2>%(langs)s)
)
/(?P<id>\d{6}-\d{3}-[AF])
''' % {'langs': ArteTVBaseIE._ARTE_LANGUAGES}
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'info_dict': {
'id': '088501-000-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
},
}, {
'url': 'https://www.arte.tv/pl/videos/100103-000-A/usa-dyskryminacja-na-porodowce/',
'only_matching': True,
}, {
'url': 'https://api.arte.tv/api/player/v2/config/de/100605-013-A',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang') or mobj.group('lang_2')
info = self._download_json(
'%s/config/%s/%s' % (self._API_BASE, lang, video_id), video_id)
player_info = info['videoJsonPlayer'] player_info = info['videoJsonPlayer']
vsr = try_get(player_info, lambda x: x['VSR'], dict) vsr = try_get(player_info, lambda x: x['VSR'], dict)
@@ -37,18 +71,11 @@ class ArteTVBaseIE(InfoExtractor):
if not upload_date_str: if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0] upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = (player_info.get('VTI') or title or player_info['VID']).strip() title = (player_info.get('VTI') or player_info['VID']).strip()
subtitle = player_info.get('VSU', '').strip() subtitle = player_info.get('VSU', '').strip()
if subtitle: if subtitle:
title += ' - %s' % subtitle title += ' - %s' % subtitle
info_dict = {
'id': player_info['VID'],
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ']) qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = { LANGS = {
@@ -65,6 +92,10 @@ class ArteTVBaseIE(InfoExtractor):
formats = [] formats = []
for format_id, format_dict in vsr.items(): for format_id, format_dict in vsr.items():
f = dict(format_dict) f = dict(format_dict)
format_url = url_or_none(f.get('url'))
streamer = f.get('streamer')
if not format_url and not streamer:
continue
versionCode = f.get('versionCode') versionCode = f.get('versionCode')
l = re.escape(langcode) l = re.escape(langcode)
@@ -107,6 +138,16 @@ class ArteTVBaseIE(InfoExtractor):
else: else:
lang_pref = -1 lang_pref = -1
media_type = f.get('mediaType')
if media_type == 'hls':
m3u8_formats = self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for m3u8_format in m3u8_formats:
m3u8_format['language_preference'] = lang_pref
formats.extend(m3u8_formats)
continue
format = { format = {
'format_id': format_id, 'format_id': format_id,
'preference': -10 if f.get('videoFormat') == 'M3U8' else None, 'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
@@ -118,7 +159,7 @@ class ArteTVBaseIE(InfoExtractor):
'quality': qfunc(f.get('quality')), 'quality': qfunc(f.get('quality')),
} }
if f.get('mediaType') == 'rtmp': if media_type == 'rtmp':
format['url'] = f['streamer'] format['url'] = f['streamer']
format['play_path'] = 'mp4:' + f['url'] format['play_path'] = 'mp4:' + f['url']
format['ext'] = 'flv' format['ext'] = 'flv'
@@ -127,56 +168,50 @@ class ArteTVBaseIE(InfoExtractor):
formats.append(format) formats.append(format)
self._check_formats(formats, video_id)
self._sort_formats(formats) self._sort_formats(formats)
info_dict['formats'] = formats return {
return info_dict 'id': player_info.get('VID') or video_id,
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
'formats': formats,
}
class ArteTVPlus7IE(ArteTVBaseIE): class ArteTVEmbedIE(InfoExtractor):
IE_NAME = 'arte.tv:+7' _VALID_URL = r'https?://(?:www\.)?arte\.tv/player/v\d+/index\.php\?.*?\bjson_url=.+'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
_TESTS = [{ _TESTS = [{
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/', 'url': 'https://www.arte.tv/player/v5/index.php?json_url=https%3A%2F%2Fapi.arte.tv%2Fapi%2Fplayer%2Fv2%2Fconfig%2Fde%2F100605-013-A&lang=de&autoplay=true&mute=0100605-013-A',
'info_dict': { 'info_dict': {
'id': '088501-000-A', 'id': '100605-013-A',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive', 'title': 'United we Stream November Lockdown Edition #13',
'upload_date': '20190628', 'description': 'md5:be40b667f45189632b78c1425c7c2ce1',
'upload_date': '20201116',
}, },
}, {
'url': 'https://www.arte.tv/player/v3/index.php?json_url=https://api.arte.tv/api/player/v2/config/de/100605-013-A',
'only_matching': True,
}] }]
def _real_extract(self, url): @staticmethod
lang, video_id = re.match(self._VALID_URL, url).groups() def _extract_urls(webpage):
return self._extract_from_json_url( return [url for _, url in re.findall(
'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id), r'<(?:iframe|script)[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?arte\.tv/player/v\d+/index\.php\?.*?\bjson_url=.+?)\1',
video_id, lang) webpage)]
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
https://www\.arte\.tv
/player/v3/index\.php\?json_url=
(?P<json_url>
https?://api\.arte\.tv/api/player/v1/config/
(?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
)
'''
_TESTS = []
def _real_extract(self, url): def _real_extract(self, url):
json_url, lang, video_id = re.match(self._VALID_URL, url).groups() qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
return self._extract_from_json_url(json_url, video_id, lang) json_url = qs['json_url'][0]
video_id = ArteTVIE._match_id(json_url)
return self.url_result(
json_url, ie=ArteTVIE.ie_key(), video_id=video_id)
class ArteTVPlaylistIE(ArteTVBaseIE): class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist' _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>%s)/videos/(?P<id>RC-\d{6})' % ArteTVBaseIE._ARTE_LANGUAGES
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_TESTS = [{ _TESTS = [{
'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/', 'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': { 'info_dict': {
@@ -185,17 +220,35 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
'description': 'md5:d322c55011514b3a7241f7fb80d494c2', 'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
}, },
'playlist_mincount': 6, 'playlist_mincount': 6,
}, {
'url': 'https://www.arte.tv/pl/videos/RC-014123/arte-reportage/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
lang, playlist_id = re.match(self._VALID_URL, url).groups() lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json( collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos' '%s/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id) % (self._API_BASE, lang, playlist_id), playlist_id)
entries = []
for video in collection['videos']:
if not isinstance(video, dict):
continue
video_url = url_or_none(video.get('url')) or url_or_none(video.get('jsonUrl'))
if not video_url:
continue
video_id = video.get('programId')
entries.append({
'_type': 'url_transparent',
'url': video_url,
'id': video_id,
'title': video.get('title'),
'alt_title': video.get('subtitle'),
'thumbnail': url_or_none(try_get(video, lambda x: x['mainImage']['url'], compat_str)),
'duration': int_or_none(video.get('durationSeconds')),
'view_count': int_or_none(video.get('views')),
'ie_key': ArteTVIE.ie_key(),
})
title = collection.get('title') title = collection.get('title')
description = collection.get('shortDescription') or collection.get('teaserText') description = collection.get('shortDescription') or collection.get('teaserText')
entries = [
self._extract_from_json_url(
video['jsonUrl'], video.get('programId') or playlist_id, lang)
for video in collection['videos'] if video.get('jsonUrl')]
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(entries, playlist_id, title, description)
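
Review note: the new `ArteTVEmbedIE` no longer parses the embed URL with one large regex. It reads `json_url` from the query string and hands it to `ArteTVIE`. A rough standalone sketch of that idea follows. It uses Python 3's `urllib.parse` rather than `compat_urlparse`, and `CONFIG_URL_RE` and `resolve_embed` are hypothetical names; the pattern is a simplified stand-in that covers only the API form of `ArteTVIE._VALID_URL`.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: simplified pattern covering only api.arte.tv config URLs.
CONFIG_URL_RE = re.compile(
    r'https?://api\.arte\.tv/api/player/v\d+/config/'
    r'(?P<lang>fr|de|en|es|it|pl)/(?P<id>\d{6}-\d{3}-[AF])')


def resolve_embed(url):
    """Return (json_url, lang, video_id) for an arte.tv player embed URL."""
    qs = parse_qs(urlparse(url).query)
    # parse_qs percent-decodes values, so encoded and plain forms both work.
    json_url = qs['json_url'][0]
    mobj = CONFIG_URL_RE.match(json_url)
    if not mobj:
        raise ValueError('unsupported json_url: %s' % json_url)
    return json_url, mobj.group('lang'), mobj.group('id')
```

Parsing the query string first means the player version (`v3`, `v5`) and any extra parameters such as `autoplay` do not need to be encoded in the pattern.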

@@ -1,27 +1,91 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import functools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .kaltura import KalturaIE from .kaltura import KalturaIE
from ..utils import extract_attributes from ..utils import (
extract_attributes,
int_or_none,
OnDemandPagedList,
parse_age_limit,
strip_or_none,
try_get,
)
class AsianCrushIE(InfoExtractor): class AsianCrushBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))' _VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|(?:cocoro|retrocrush)\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE _KALTURA_KEYS = [
'video_url', 'progressive_url', 'download_url', 'thumbnail_url',
'widescreen_thumbnail_url', 'screencap_widescreen',
]
_API_SUFFIX = {'retrocrush.tv': '-ott'}
def _call_api(self, host, endpoint, video_id, query, resource):
return self._download_json(
'https://api%s.%s/%s' % (self._API_SUFFIX.get(host, ''), host, endpoint), video_id,
'Downloading %s JSON metadata' % resource, query=query,
headers=self.geo_verification_headers())['objects']
def _download_object_data(self, host, object_id, resource):
return self._call_api(
host, 'search', object_id, {'id': object_id}, resource)[0]
def _get_object_description(self, obj):
return strip_or_none(obj.get('long_description') or obj.get('short_description'))
def _parse_video_data(self, video):
title = video['name']
entry_id, partner_id = [None] * 2
for k in self._KALTURA_KEYS:
k_url = video.get(k)
if k_url:
mobj = re.search(r'/p/(\d+)/.+?/entryId/([^/]+)/', k_url)
if mobj:
partner_id, entry_id = mobj.groups()
break
meta_categories = try_get(video, lambda x: x['meta']['categories'], list) or []
categories = list(filter(None, [c.get('name') for c in meta_categories]))
show_info = video.get('show_info') or {}
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': KalturaIE.ie_key(),
'id': entry_id,
'title': title,
'description': self._get_object_description(video),
'age_limit': parse_age_limit(video.get('mpaa_rating') or video.get('tv_rating')),
'categories': categories,
'series': show_info.get('show_name'),
'season_number': int_or_none(show_info.get('season_num')),
'season_id': show_info.get('season_id'),
'episode_number': int_or_none(show_info.get('episode_num')),
}
class AsianCrushIE(AsianCrushBaseIE):
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{ _TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/', 'url': 'https://www.asiancrush.com/video/004289v/women-who-flirt',
'md5': 'c3b740e48d0ba002a42c0b72857beae6', 'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': { 'info_dict': {
'id': '1_y4tmjm5r', 'id': '1_y4tmjm5r',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Women Who Flirt', 'title': 'Women Who Flirt',
'description': 'md5:7e986615808bcfb11756eb503a751487', 'description': 'md5:b65c7e0ae03a85585476a62a186f924c',
'timestamp': 1496936429, 'timestamp': 1496936429,
'upload_date': '20170608', 'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com', 'uploader_id': 'craig@crifkin.com',
'age_limit': 13,
'categories': 'count:5',
'duration': 5812,
}, },
}, { }, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/', 'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
@@ -41,67 +105,35 @@ class AsianCrushIE(InfoExtractor):
}, { }, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/', 'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/video/true-tears/012328v-i...gave-away-my-tears',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) host, video_id = re.match(self._VALID_URL, url).groups()
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) if host == 'cocoro.tv':
webpage = self._download_webpage(url, video_id)
entry_id, partner_id, title = [None] * 3 embed_vars = self._parse_json(self._search_regex(
vars = self._parse_json(
self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars', r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False) default='{}'), video_id, fatal=False) or {}
if vars: video_id = embed_vars.get('entry_id') or video_id
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
if not entry_id: video = self._download_object_data(host, video_id, 'video')
entry_id = self._search_regex( return self._parse_video_data(video)
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
class AsianCrushPlaylistIE(InfoExtractor): class AsianCrushPlaylistIE(AsianCrushBaseIE):
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE _VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{ _TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/', 'url': 'https://www.asiancrush.com/series/006447s/fruity-samurai',
'info_dict': { 'info_dict': {
'id': '12481', 'id': '6447',
'title': 'Scholar Who Walks the Night', 'title': 'Fruity Samurai',
'description': 'md5:7addd7c5132a09fd4741152d96cce886', 'description': 'md5:7535174487e4a202d3872a7fc8f2f154',
}, },
'playlist_count': 20, 'playlist_count': 13,
}, { }, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/', 'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True, 'only_matching': True,
@@ -111,35 +143,58 @@ class AsianCrushPlaylistIE(InfoExtractor):
}, { }, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/', 'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/series/012355s/true-tears',
'only_matching': True,
}] }]
_PAGE_SIZE = 1000000000
def _fetch_page(self, domain, parent_id, page):
videos = self._call_api(
domain, 'getreferencedobjects', parent_id, {
'max': self._PAGE_SIZE,
'object_type': 'video',
'parent_id': parent_id,
'start': page * self._PAGE_SIZE,
}, 'page %d' % (page + 1))
for video in videos:
yield self._parse_video_data(video)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) host, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id) if host == 'cocoro.tv':
webpage = self._download_webpage(url, playlist_id)
entries = [] entries = []
for mobj in re.finditer( for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL, r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage): webpage):
attrs = extract_attributes(mobj.group(0)) attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix': if attrs.get('class') == 'clearfix':
entries.append(self.url_result( entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key())) mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = self._html_search_regex( title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage, r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title( 'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', 'twitter:title', webpage, 'title',
default=None) or self._search_regex( default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False) r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title: if title:
title = re.sub(r'\s*\|\s*.+?$', '', title) title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description( description = self._og_search_description(
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False) 'twitter:description', webpage, 'description', fatal=False)
else:
show = self._download_object_data(host, playlist_id, 'show')
title = show.get('name')
description = self._get_object_description(show)
entries = OnDemandPagedList(
functools.partial(self._fetch_page, host, playlist_id),
self._PAGE_SIZE)
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(entries, playlist_id, title, description)
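
Review note: `AsianCrushBaseIE` builds the API host from a per-site suffix table and recovers the Kaltura partner and entry IDs from whichever media URL is present in the API object. The standalone sketch below mirrors those two steps with the same regex and key list as the diff. The function names `api_url` and `find_kaltura_ids` are hypothetical, and the sample URL in the usage line is made up for illustration.

```python
import re

# Same keys and suffix table as in the diff.
KALTURA_KEYS = [
    'video_url', 'progressive_url', 'download_url', 'thumbnail_url',
    'widescreen_thumbnail_url', 'screencap_widescreen',
]
API_SUFFIX = {'retrocrush.tv': '-ott'}


def api_url(host, endpoint):
    """Build the API URL; only hosts listed in API_SUFFIX get a suffix."""
    return 'https://api%s.%s/%s' % (API_SUFFIX.get(host, ''), host, endpoint)


def find_kaltura_ids(video):
    """Return (partner_id, entry_id) from the first matching URL field."""
    for key in KALTURA_KEYS:
        k_url = video.get(key)
        if not k_url:
            continue
        mobj = re.search(r'/p/(\d+)/.+?/entryId/([^/]+)/', k_url)
        if mobj:
            return mobj.groups()
    # Nothing matched: the caller has to decide how to handle this.
    return None, None


# Usage with a made-up thumbnail URL:
ids = find_kaltura_ids({
    'thumbnail_url': 'https://cdn.example.invalid/p/513551/sp/0/thumbnail/entryId/1_y4tmjm5r/width/640',
})
```

One point a reviewer may want to check: when no field matches, both IDs stay `None`, and `_parse_video_data` in the diff would then format them into the `kaltura:` URL as is.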

@@ -48,6 +48,7 @@ class AWAANBaseIE(InfoExtractor):
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '), 'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live, 'is_live': is_live,
'uploader_id': video_data.get('user_id'),
} }
@@ -107,6 +108,7 @@ class AWAANLiveIE(AWAANBaseIE):
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107', 'upload_date': '20150107',
'timestamp': 1420588800, 'timestamp': 1420588800,
'uploader_id': '71',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import random import random
@@ -5,10 +6,7 @@ import re
import time import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_str
compat_str,
compat_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
float_or_none, float_or_none,
@@ -17,30 +15,32 @@ from ..utils import (
parse_filesize, parse_filesize,
str_or_none, str_or_none,
try_get, try_get,
unescapeHTML,
update_url_query, update_url_query,
unified_strdate, unified_strdate,
unified_timestamp, unified_timestamp,
url_or_none, url_or_none,
urljoin,
) )
class BandcampIE(InfoExtractor): class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<title>[^/?#&]+)' _VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song', 'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439', 'md5': 'c557841d5e50261777a6585648adf439',
'info_dict': { 'info_dict': {
'id': '1812978515', 'id': '1812978515',
'ext': 'mp3', 'ext': 'mp3',
'title': "youtube-dl \"'/\\\u00e4\u21ad - youtube-dl test song \"'/\\\u00e4\u21ad", 'title': "youtube-dl \"'/\\ä↭ - youtube-dl \"'/\\ä↭ - youtube-dl test song \"'/\\ä↭",
'duration': 9.8485, 'duration': 9.8485,
'uploader': 'youtube-dl "\'/\\ä↭',
'upload_date': '20121129',
'timestamp': 1354224127,
}, },
'_skip': 'There is a limit of 200 free downloads / month for the test song' '_skip': 'There is a limit of 200 free downloads / month for the test song'
}, { }, {
# free download # free download
'url': 'http://benprunty.bandcamp.com/track/lanius-battle', 'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
'md5': '853e35bf34aa1d6fe2615ae612564b36',
'info_dict': { 'info_dict': {
'id': '2650410135', 'id': '2650410135',
'ext': 'aiff', 'ext': 'aiff',
@@ -79,11 +79,16 @@ class BandcampIE(InfoExtractor):
}, },
}] }]
def _extract_data_attr(self, webpage, video_id, attr='tralbum', fatal=True):
return self._parse_json(self._html_search_regex(
r'data-%s=(["\'])({.+?})\1' % attr, webpage,
attr + ' data', group=2), video_id, fatal=fatal)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) title = self._match_id(url)
title = mobj.group('title')
webpage = self._download_webpage(url, title) webpage = self._download_webpage(url, title)
thumbnail = self._html_search_meta('og:image', webpage, default=None) tralbum = self._extract_data_attr(webpage, title)
thumbnail = self._og_search_thumbnail(webpage)
track_id = None track_id = None
track = None track = None
@@ -91,10 +96,7 @@ class BandcampIE(InfoExtractor):
duration = None duration = None
formats = [] formats = []
track_info = self._parse_json( track_info = try_get(tralbum, lambda x: x['trackinfo'][0], dict)
self._search_regex(
r'trackinfo\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
webpage, 'track info', default='{}'), title)
if track_info: if track_info:
file_ = track_info.get('file') file_ = track_info.get('file')
if isinstance(file_, dict): if isinstance(file_, dict):
@@ -111,37 +113,25 @@ class BandcampIE(InfoExtractor):
'abr': int_or_none(abr_str), 'abr': int_or_none(abr_str),
}) })
track = track_info.get('title') track = track_info.get('title')
track_id = str_or_none(track_info.get('track_id') or track_info.get('id')) track_id = str_or_none(
track_info.get('track_id') or track_info.get('id'))
track_number = int_or_none(track_info.get('track_num')) track_number = int_or_none(track_info.get('track_num'))
duration = float_or_none(track_info.get('duration')) duration = float_or_none(track_info.get('duration'))
def extract(key): embed = self._extract_data_attr(webpage, title, 'embed', False)
return self._search_regex( current = tralbum.get('current') or {}
r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key, artist = embed.get('artist') or current.get('artist') or tralbum.get('artist')
webpage, key, default=None, group='value')
artist = extract('artist')
album = extract('album_title')
timestamp = unified_timestamp( timestamp = unified_timestamp(
extract('publish_date') or extract('album_publish_date')) current.get('publish_date') or tralbum.get('album_publish_date'))
release_date = unified_strdate(extract('album_release_date'))
download_link = self._search_regex( download_link = tralbum.get('freeDownloadPage')
r'freeDownloadPage\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'download link', default=None, group='url')
if download_link: if download_link:
track_id = self._search_regex( track_id = compat_str(tralbum['id'])
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
webpage, 'track id')
download_webpage = self._download_webpage( download_webpage = self._download_webpage(
download_link, track_id, 'Downloading free downloads page') download_link, track_id, 'Downloading free downloads page')
blob = self._parse_json( blob = self._extract_data_attr(download_webpage, track_id, 'blob')
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
'blob', group='blob'),
track_id, transform_source=unescapeHTML)
info = try_get( info = try_get(
blob, (lambda x: x['digital_items'][0], blob, (lambda x: x['digital_items'][0],
@@ -207,20 +197,20 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': artist, 'uploader': artist,
'timestamp': timestamp, 'timestamp': timestamp,
'release_date': release_date, 'release_date': unified_strdate(tralbum.get('album_release_date')),
'duration': duration, 'duration': duration,
'track': track, 'track': track,
'track_number': track_number, 'track_number': track_number,
'track_id': track_id, 'track_id': track_id,
'artist': artist, 'artist': artist,
'album': album, 'album': embed.get('album_title'),
'formats': formats, 'formats': formats,
} }
class BandcampAlbumIE(InfoExtractor): class BandcampAlbumIE(BandcampIE):
IE_NAME = 'Bandcamp:album' IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^/?#&]+))?' _VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<id>[^/?#&]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1', 'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -230,7 +220,10 @@ class BandcampAlbumIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '1353101989', 'id': '1353101989',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Intro', 'title': 'Blazo - Intro',
'timestamp': 1311756226,
'upload_date': '20110727',
'uploader': 'Blazo',
} }
}, },
{ {
@@ -238,7 +231,10 @@ class BandcampAlbumIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '38097443', 'id': '38097443',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Kero One - Keep It Alive (Blazo remix)', 'title': 'Blazo - Kero One - Keep It Alive (Blazo remix)',
'timestamp': 1311757238,
'upload_date': '20110727',
'uploader': 'Blazo',
} }
}, },
], ],
@@ -274,6 +270,7 @@ class BandcampAlbumIE(InfoExtractor):
'title': '"Entropy" EP', 'title': '"Entropy" EP',
'uploader_id': 'jstrecords', 'uploader_id': 'jstrecords',
'id': 'entropy-ep', 'id': 'entropy-ep',
'description': 'md5:0ff22959c943622972596062f2f366a5',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
}, { }, {
@@ -283,6 +280,7 @@ class BandcampAlbumIE(InfoExtractor):
'id': 'we-are-the-plague', 'id': 'we-are-the-plague',
'title': 'WE ARE THE PLAGUE', 'title': 'WE ARE THE PLAGUE',
'uploader_id': 'insulters', 'uploader_id': 'insulters',
'description': 'md5:b3cf845ee41b2b1141dc7bde9237255f',
}, },
'playlist_count': 2, 'playlist_count': 2,
}] }]
@@ -294,41 +292,34 @@ class BandcampAlbumIE(InfoExtractor):
else super(BandcampAlbumIE, cls).suitable(url)) else super(BandcampAlbumIE, cls).suitable(url))
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) uploader_id, album_id = re.match(self._VALID_URL, url).groups()
uploader_id = mobj.group('subdomain')
album_id = mobj.group('album_id')
playlist_id = album_id or uploader_id playlist_id = album_id or uploader_id
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
track_elements = re.findall( tralbum = self._extract_data_attr(webpage, playlist_id)
r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"[^>]+itemprop="url"[^>]*>.*?)</div>', webpage) track_info = tralbum.get('trackinfo')
if not track_elements: if not track_info:
raise ExtractorError('The page doesn\'t contain any tracks') raise ExtractorError('The page doesn\'t contain any tracks')
# Only tracks with duration info have songs # Only tracks with duration info have songs
entries = [ entries = [
self.url_result( self.url_result(
compat_urlparse.urljoin(url, t_path), urljoin(url, t['title_link']), BandcampIE.ie_key(),
ie=BandcampIE.ie_key(), str_or_none(t.get('track_id') or t.get('id')), t.get('title'))
video_title=self._search_regex( for t in track_info
r'<span\b[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)', if t.get('duration')]
elem_content, 'track title', fatal=False))
for elem_content, t_path in track_elements current = tralbum.get('current') or {}
if self._html_search_meta('duration', elem_content, default=None)]
title = self._html_search_regex(
r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
webpage, 'title', fatal=False)
if title:
title = title.replace(r'\"', '"')
return { return {
'_type': 'playlist', '_type': 'playlist',
'uploader_id': uploader_id, 'uploader_id': uploader_id,
'id': playlist_id, 'id': playlist_id,
'title': title, 'title': current.get('title'),
'description': current.get('about'),
'entries': entries, 'entries': entries,
} }
class BandcampWeeklyIE(InfoExtractor): class BandcampWeeklyIE(BandcampIE):
IE_NAME = 'Bandcamp:weekly' IE_NAME = 'Bandcamp:weekly'
_VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@@ -343,29 +334,23 @@ class BandcampWeeklyIE(InfoExtractor):
'release_date': '20170404', 'release_date': '20170404',
'series': 'Bandcamp Weekly', 'series': 'Bandcamp Weekly',
'episode': 'Magic Moments', 'episode': 'Magic Moments',
'episode_number': 208,
'episode_id': '224', 'episode_id': '224',
} },
'params': {
'format': 'opus-lo',
},
}, { }, {
'url': 'https://bandcamp.com/?blah/blah@&show=228', 'url': 'https://bandcamp.com/?blah/blah@&show=228',
'only_matching': True 'only_matching': True
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) show_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, show_id)
blob = self._parse_json( blob = self._extract_data_attr(webpage, show_id, 'blob')
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', webpage,
'blob', group='blob'),
video_id, transform_source=unescapeHTML)
show = blob['bcw_show'] show = blob['bcw_data'][show_id]
# This is desired because any invalid show id redirects to `bandcamp.com`
# which happens to expose the latest Bandcamp Weekly episode.
show_id = int_or_none(show.get('show_id')) or int_or_none(video_id)
formats = [] formats = []
for format_id, format_url in show['audio_stream'].items(): for format_id, format_url in show['audio_stream'].items():
@@ -390,20 +375,8 @@ class BandcampWeeklyIE(InfoExtractor):
if subtitle: if subtitle:
title += ' - %s' % subtitle title += ' - %s' % subtitle
episode_number = None
seq = blob.get('bcw_seq')
if seq and isinstance(seq, list):
try:
episode_number = next(
int_or_none(e.get('episode_number'))
for e in seq
if isinstance(e, dict) and int_or_none(e.get('id')) == show_id)
except StopIteration:
pass
return { return {
'id': video_id, 'id': show_id,
'title': title, 'title': title,
'description': show.get('desc') or show.get('short_desc'), 'description': show.get('desc') or show.get('short_desc'),
'duration': float_or_none(show.get('audio_duration')), 'duration': float_or_none(show.get('audio_duration')),
@@ -411,7 +384,6 @@ class BandcampWeeklyIE(InfoExtractor):
'release_date': unified_strdate(show.get('published_date')), 'release_date': unified_strdate(show.get('published_date')),
'series': 'Bandcamp Weekly', 'series': 'Bandcamp Weekly',
'episode': show.get('subtitle'), 'episode': show.get('subtitle'),
'episode_number': episode_number, 'episode_id': show_id,
'episode_id': compat_str(video_id),
'formats': formats 'formats': formats
} }
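
Review note: the Bandcamp hunks above replace several per-field regexes with calls to `self._extract_data_attr(...)`, whose definition is not part of this portion of the diff. The following is only a minimal standalone sketch of the general idea, assuming the page carries HTML-escaped JSON in a `data-<name>` attribute (as the removed `data-blob` regex suggests). The function name `extract_data_attr`, its default attribute name `tralbum`, and the sample markup are hypothetical and use only the standard library.

```python
import html
import json
import re


def extract_data_attr(page, attr="tralbum"):
    """Return the JSON object stored in a data-<attr> HTML attribute, or None.

    Hypothetical helper for illustration; not the extractor's actual method.
    """
    match = re.search(
        r'data-%s=(["\'])(?P<value>\{.+?\})\1' % re.escape(attr),
        page, re.DOTALL)
    if match is None:
        return None
    # Attribute values are HTML-escaped, so unescape before parsing the JSON.
    return json.loads(html.unescape(match.group("value")))


# Made-up markup that mimics an escaped JSON attribute.
sample = (
    '<script data-tralbum="{&quot;id&quot;: 1353101989, '
    '&quot;current&quot;: {&quot;title&quot;: &quot;Intro&quot;}}"></script>'
)
data = extract_data_attr(sample)
current = data.get("current") or {}
```

Parsing one JSON blob and then reading fields with `.get(...)` mirrors what the new side of the hunks does with `tralbum.get('current') or {}`, and avoids maintaining one regex per field.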

@@ -49,22 +49,17 @@ class BBCCoUkIE(InfoExtractor):
_LOGIN_URL = 'https://account.bbc.com/signin' _LOGIN_URL = 'https://account.bbc.com/signin'
_NETRC_MACHINE = 'bbc' _NETRC_MACHINE = 'bbc'
_MEDIASELECTOR_URLS = [ _MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
_MEDIA_SETS = [
# Provides HQ HLS streams with even better quality that pc mediaset but fails # Provides HQ HLS streams with even better quality that pc mediaset but fails
# with geolocation in some cases when it's even not geo restricted at all (e.g. # with geolocation in some cases when it's even not geo restricted at all (e.g.
# http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable. # http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s', 'iptv-all',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s', 'pc',
] ]
_MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
_EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist' _EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
_NAMESPACES = (
_MEDIASELECTION_NS,
_EMP_PLAYLIST_NS,
)
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.bbc.co.uk/programmes/b039g8p7', 'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
@@ -261,8 +256,6 @@ class BBCCoUkIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
def _login(self): def _login(self):
username, password = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
@@ -307,22 +300,14 @@ class BBCCoUkIE(InfoExtractor):
def _extract_items(self, playlist): def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS) return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
def _findall_ns(self, element, xpath):
elements = []
for ns in self._NAMESPACES:
elements.extend(element.findall(xpath % ns))
return elements
def _extract_medias(self, media_selection): def _extract_medias(self, media_selection):
error = media_selection.find('./{%s}error' % self._MEDIASELECTION_NS) error = media_selection.get('result')
if error is None: if error:
media_selection.find('./{%s}error' % self._EMP_PLAYLIST_NS) raise BBCCoUkIE.MediaSelectionError(error)
if error is not None: return media_selection.get('media') or []
raise BBCCoUkIE.MediaSelectionError(error.get('id'))
return self._findall_ns(media_selection, './{%s}media')
def _extract_connections(self, media): def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection') return media.get('connection') or []
def _get_subtitles(self, media, programme_id): def _get_subtitles(self, media, programme_id):
subtitles = {} subtitles = {}
@@ -334,13 +319,13 @@ class BBCCoUkIE(InfoExtractor):
cc_url, programme_id, 'Downloading captions', fatal=False) cc_url, programme_id, 'Downloading captions', fatal=False)
if not isinstance(captions, compat_etree_Element): if not isinstance(captions, compat_etree_Element):
continue continue
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en') subtitles['en'] = [
subtitles[lang] = [
{ {
'url': connection.get('href'), 'url': connection.get('href'),
'ext': 'ttml', 'ext': 'ttml',
}, },
] ]
break
return subtitles return subtitles
def _raise_extractor_error(self, media_selection_error): def _raise_extractor_error(self, media_selection_error):
@@ -350,10 +335,10 @@ class BBCCoUkIE(InfoExtractor):
def _download_media_selector(self, programme_id): def _download_media_selector(self, programme_id):
last_exception = None last_exception = None
for mediaselector_url in self._MEDIASELECTOR_URLS: for media_set in self._MEDIA_SETS:
try: try:
return self._download_media_selector_url( return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id) self._MEDIA_SELECTOR_URL_TEMPL % (media_set, programme_id), programme_id)
except BBCCoUkIE.MediaSelectionError as e: except BBCCoUkIE.MediaSelectionError as e:
if e.id in ('notukerror', 'geolocation', 'selectionunavailable'): if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
last_exception = e last_exception = e
@@ -362,8 +347,8 @@ class BBCCoUkIE(InfoExtractor):
self._raise_extractor_error(last_exception) self._raise_extractor_error(last_exception)
def _download_media_selector_url(self, url, programme_id=None): def _download_media_selector_url(self, url, programme_id=None):
media_selection = self._download_xml( media_selection = self._download_json(
url, programme_id, 'Downloading media selection XML', url, programme_id, 'Downloading media selection JSON',
expected_status=(403, 404)) expected_status=(403, 404))
return self._process_media_selector(media_selection, programme_id) return self._process_media_selector(media_selection, programme_id)
@@ -377,7 +362,6 @@ class BBCCoUkIE(InfoExtractor):
if kind in ('video', 'audio'): if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate')) bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding') encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width')) width = int_or_none(media.get('width'))
height = int_or_none(media.get('height')) height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size')) file_size = int_or_none(media.get('media_file_size'))
@@ -392,8 +376,6 @@ class BBCCoUkIE(InfoExtractor):
supplier = connection.get('supplier') supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat') transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist # ASX playlist
if supplier == 'asx': if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)): for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
@@ -408,20 +390,11 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native', href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)) m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for f in usp_formats:
if f.get('height') and f['height'] > 720:
continue
formats.append(f)
elif transfer_format == 'hds': elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False)) href, programme_id, f4m_id=format_id, fatal=False))
else: else:
if not service and not supplier and bitrate: if not supplier and bitrate:
format_id += '-%d' % bitrate format_id += '-%d' % bitrate
fmt = { fmt = {
'format_id': format_id, 'format_id': format_id,
@@ -554,7 +527,7 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page') webpage = self._download_webpage(url, group_id, 'Downloading video page')
error = self._search_regex( error = self._search_regex(
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<', r'<div\b[^>]+\bclass=["\'](?:smp|playout)__message delta["\'][^>]*>\s*([^<]+?)\s*<',
webpage, 'error', default=None) webpage, 'error', default=None)
if error: if error:
raise ExtractorError(error, expected=True) raise ExtractorError(error, expected=True)
@@ -607,16 +580,9 @@ class BBCIE(BBCCoUkIE):
IE_DESC = 'BBC' IE_DESC = 'BBC'
_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)' _VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
_MEDIASELECTOR_URLS = [ _MEDIA_SETS = [
# Provides HQ HLS streams but fails with geolocation in some cases when it's 'mobile-tablet-main',
# even not geo restricted at all 'pc',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
# Provides more formats, namely direct mp4 links, but fails on some videos with
# notukerror for non UK (?) users (e.g.
# http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
# Provides fewer formats, but works everywhere for everybody (hopefully)
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
] ]
_TESTS = [{ _TESTS = [{
@@ -981,7 +947,7 @@ class BBCIE(BBCCoUkIE):
group_id = self._search_regex( group_id = self._search_regex(
r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX, r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX,
webpage, 'group id', default=None) webpage, 'group id', default=None)
if playlist_id: if group_id:
return self.url_result( return self.url_result(
'https://www.bbc.co.uk/programmes/%s' % group_id, 'https://www.bbc.co.uk/programmes/%s' % group_id,
ie=BBCCoUkIE.ie_key()) ie=BBCCoUkIE.ie_key())
@@ -1092,10 +1058,26 @@ class BBCIE(BBCCoUkIE):
self._search_regex( self._search_regex(
r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage, r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage,
'bbcthree config', default='{}'), 'bbcthree config', default='{}'),
playlist_id, transform_source=js_to_json, fatal=False) playlist_id, transform_source=js_to_json, fatal=False) or {}
if bbc3_config: payload = bbc3_config.get('payload') or {}
if payload:
clip = payload.get('currentClip') or {}
clip_vpid = clip.get('vpid')
clip_title = clip.get('title')
if clip_vpid and clip_title:
formats, subtitles = self._download_media_selector(clip_vpid)
self._sort_formats(formats)
return {
'id': clip_vpid,
'title': clip_title,
'thumbnail': dict_get(clip, ('poster', 'imageUrl')),
'description': clip.get('description'),
'duration': parse_duration(clip.get('duration')),
'formats': formats,
'subtitles': subtitles,
}
bbc3_playlist = try_get( bbc3_playlist = try_get(
bbc3_config, lambda x: x['payload']['content']['bbcMedia']['playlist'], payload, lambda x: x['content']['bbcMedia']['playlist'],
dict) dict)
if bbc3_playlist: if bbc3_playlist:
playlist_title = bbc3_playlist.get('title') or playlist_title playlist_title = bbc3_playlist.get('title') or playlist_title
@@ -1118,6 +1100,39 @@ class BBCIE(BBCCoUkIE):
return self.playlist_result( return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description) entries, playlist_id, playlist_title, playlist_description)
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
if initial_data:
def parse_media(media):
if not media:
return
for item in (try_get(media, lambda x: x['media']['items'], list) or []):
item_id = item.get('id')
item_title = item.get('title')
if not (item_id and item_title):
continue
formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats)
entries.append({
'id': item_id,
'title': item_title,
'thumbnail': item.get('holdingImageUrl'),
'formats': formats,
'subtitles': subtitles,
})
for resp in (initial_data.get('data') or {}).values():
name = resp.get('name')
if name == 'media-experience':
parse_media(try_get(resp, lambda x: x['data']['initialItem']['mediaItem'], dict))
elif name == 'article':
for block in (try_get(resp, lambda x: x['data']['blocks'], list) or []):
if block.get('type') != 'media':
continue
parse_media(block.get('model'))
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
def extract_all(pattern): def extract_all(pattern):
return list(filter(None, map( return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False), lambda s: self._parse_json(s, playlist_id, fatal=False),
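
Review note: the BBC hunks above move from per-URL XML media selector endpoints to one JSON URL template combined with a list of media sets that are tried in order. The sketch below shows that fallback loop in isolation, under stated assumptions: `fetch_json` is a hypothetical callable standing in for the extractor's download method, the vpid `p0000000` is a placeholder, and the canned responses are invented. Only the URL template, the `result`/`media` keys and the three retryable error ids are taken from the diff.

```python
MEDIA_SELECTOR_URL_TEMPL = (
    'https://open.live.bbc.co.uk/mediaselector/6/select/'
    'version/2.0/mediaset/%s/vpid/%s')
RETRYABLE_ERRORS = ('notukerror', 'geolocation', 'selectionunavailable')


class MediaSelectionError(Exception):
    def __init__(self, error_id):
        super().__init__(error_id)
        self.id = error_id


def extract_medias(media_selection):
    # In the JSON response, a non-empty "result" field signals an error.
    error = media_selection.get('result')
    if error:
        raise MediaSelectionError(error)
    return media_selection.get('media') or []


def select_media(fetch_json, media_sets, vpid):
    """Try each media set in order; fall through on retryable errors."""
    last_error = None
    for media_set in media_sets:
        url = MEDIA_SELECTOR_URL_TEMPL % (media_set, vpid)
        try:
            return extract_medias(fetch_json(url))
        except MediaSelectionError as e:
            if e.id in RETRYABLE_ERRORS:
                last_error = e
                continue
            raise
    if last_error is None:
        raise ValueError('no media sets given')
    raise last_error


# Invented responses: the first media set is geo-blocked, the second works.
responses = {
    MEDIA_SELECTOR_URL_TEMPL % ('iptv-all', 'p0000000'): {'result': 'geolocation'},
    MEDIA_SELECTOR_URL_TEMPL % ('pc', 'p0000000'): {
        'media': [{'kind': 'video', 'connection': []}]},
}
medias = select_media(responses.__getitem__, ['iptv-all', 'pc'], 'p0000000')
```

Keeping the media set names in a plain list is what lets a subclass such as `BBCIE` override only `_MEDIA_SETS` instead of repeating full URLs.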

@@ -1,194 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
compat_str,
float_or_none,
int_or_none,
parse_iso8601,
try_get,
urljoin,
)
class BeamProBaseIE(InfoExtractor):
_API_BASE = 'https://mixer.com/api/v1'
_RATINGS = {'family': 0, 'teen': 13, '18+': 18}
def _extract_channel_info(self, chan):
user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
return {
'uploader': chan.get('token') or try_get(
chan, lambda x: x['user']['username'], compat_str),
'uploader_id': compat_str(user_id) if user_id else None,
'age_limit': self._RATINGS.get(chan.get('audience')),
}
class BeamProLiveIE(BeamProBaseIE):
IE_NAME = 'Mixer:live'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://mixer.com/niterhayven',
'info_dict': {
'id': '261562',
'ext': 'mp4',
'title': 'Introducing The Witcher 3 // The Grind Starts Now!',
'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
'thumbnail': r're:https://.*\.jpg$',
'timestamp': 1483477281,
'upload_date': '20170103',
'uploader': 'niterhayven',
'uploader_id': '373396',
'age_limit': 18,
'is_live': True,
'view_count': int,
},
'skip': 'niterhayven is offline',
'params': {
'skip_download': True,
},
}
_MANIFEST_URL_TEMPLATE = '%s/channels/%%s/manifest.%%s' % BeamProBaseIE._API_BASE
@classmethod
def suitable(cls, url):
return False if BeamProVodIE.suitable(url) else super(BeamProLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = self._match_id(url)
chan = self._download_json(
'%s/channels/%s' % (self._API_BASE, channel_name), channel_name)
if chan.get('online') is False:
raise ExtractorError(
'{0} is offline'.format(channel_name), expected=True)
channel_id = chan['id']
def manifest_url(kind):
return self._MANIFEST_URL_TEMPLATE % (channel_id, kind)
formats = self._extract_m3u8_formats(
manifest_url('m3u8'), channel_name, ext='mp4', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_smil_formats(
manifest_url('smil'), channel_name, fatal=False))
self._sort_formats(formats)
info = {
'id': compat_str(chan.get('id') or channel_name),
'title': self._live_title(chan.get('name') or channel_name),
'description': clean_html(chan.get('description')),
'thumbnail': try_get(
chan, lambda x: x['thumbnail']['url'], compat_str),
'timestamp': parse_iso8601(chan.get('updatedAt')),
'is_live': True,
'view_count': int_or_none(chan.get('viewersTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(chan))
return info
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
'id': '2259830',
'ext': 'mp4',
'title': 'willow8714\'s Channel',
'duration': 6828.15,
'thumbnail': r're:https://.*source\.png$',
'timestamp': 1494046474,
'upload_date': '20170506',
'uploader': 'willow8714',
'uploader_id': '6085379',
'age_limit': 13,
'view_count': int,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
}, {
'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
'only_matching': True,
}]
@staticmethod
def _extract_format(vod, vod_type):
if not vod.get('baseUrl'):
return []
if vod_type == 'hls':
filename, protocol = 'manifest.m3u8', 'm3u8_native'
elif vod_type == 'raw':
filename, protocol = 'source.mp4', 'https'
else:
assert False
data = vod.get('data') if isinstance(vod.get('data'), dict) else {}
format_id = [vod_type]
if isinstance(data.get('Height'), compat_str):
format_id.append('%sp' % data['Height'])
return [{
'url': urljoin(vod['baseUrl'], filename),
'format_id': '-'.join(format_id),
'ext': 'mp4',
'protocol': protocol,
'width': int_or_none(data.get('Width')),
'height': int_or_none(data.get('Height')),
'fps': int_or_none(data.get('Fps')),
'tbr': int_or_none(data.get('Bitrate'), 1000),
}]
def _real_extract(self, url):
vod_id = self._match_id(url)
vod_info = self._download_json(
'%s/recordings/%s' % (self._API_BASE, vod_id), vod_id)
state = vod_info.get('state')
if state != 'AVAILABLE':
raise ExtractorError(
'VOD %s is not available (state: %s)' % (vod_id, state),
expected=True)
formats = []
thumbnail_url = None
for vod in vod_info['vods']:
vod_type = vod.get('format')
if vod_type in ('hls', 'raw'):
formats.extend(self._extract_format(vod, vod_type))
elif vod_type == 'thumbnail':
thumbnail_url = urljoin(vod.get('baseUrl'), 'source.png')
self._sort_formats(formats)
info = {
'id': vod_id,
'title': vod_info.get('name') or vod_id,
'duration': float_or_none(vod_info.get('duration')),
'thumbnail': thumbnail_url,
'timestamp': parse_iso8601(vod_info.get('createdAt')),
'view_count': int_or_none(vod_info.get('viewsTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(vod_info.get('channel') or {}))
return info

@@ -25,8 +25,8 @@ class BellMediaIE(InfoExtractor):
etalk| etalk|
marilyn marilyn
)\.ca| )\.ca|
much\.com (?:much|cp24)\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})''' )/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070', 'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950', 'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@@ -62,6 +62,9 @@ class BellMediaIE(InfoExtractor):
}, { }, {
'url': 'http://www.etalk.ca/video?videoid=663455', 'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}] }]
_DOMAINS = { _DOMAINS = {
'thecomedynetwork': 'comedy', 'thecomedynetwork': 'comedy',

@@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import extract_attributes
class BFMTVBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
_VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _brightcove_url_result(self, video_id, video_block):
account_id = video_block.get('accountid') or '876450612001'
player_id = video_block.get('playerid') or 'I2qBTln4u'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id),
'BrightcoveNew', video_id)
class BFMTVIE(BFMTVBaseIE):
IE_NAME = 'bfmtv'
_VALID_URL = BFMTVBaseIE._VALID_URL_TMPL % 'V'
_TESTS = [{
'url': 'https://www.bfmtv.com/politique/emmanuel-macron-l-islam-est-une-religion-qui-vit-une-crise-aujourd-hui-partout-dans-le-monde_VN-202010020146.html',
'info_dict': {
'id': '6196747868001',
'ext': 'mp4',
'title': 'Emmanuel Macron: "L\'Islam est une religion qui vit une crise aujourdhui, partout dans le monde"',
'description': 'Le Président s\'exprime sur la question du séparatisme depuis les Mureaux, dans les Yvelines.',
'uploader_id': '876450610001',
'upload_date': '20201002',
'timestamp': 1601629620,
},
}]
def _real_extract(self, url):
bfmtv_id = self._match_id(url)
webpage = self._download_webpage(url, bfmtv_id)
video_block = extract_attributes(self._search_regex(
self._VIDEO_BLOCK_REGEX, webpage, 'video block'))
return self._brightcove_url_result(video_block['videoid'], video_block)
class BFMTVLiveIE(BFMTVIE):
IE_NAME = 'bfmtv:live'
_VALID_URL = BFMTVBaseIE._VALID_URL_BASE + '(?P<id>(?:[^/]+/)?en-direct)'
_TESTS = [{
'url': 'https://www.bfmtv.com/en-direct/',
'info_dict': {
'id': '5615950982001',
'ext': 'mp4',
'title': r're:^le direct BFMTV WEB \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'uploader_id': '876450610001',
'upload_date': '20171018',
'timestamp': 1508329950,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.bfmtv.com/economie/en-direct/',
'only_matching': True,
}]
class BFMTVArticleIE(BFMTVBaseIE):
IE_NAME = 'bfmtv:article'
_VALID_URL = BFMTVBaseIE._VALID_URL_TMPL % 'A'
_TESTS = [{
'url': 'https://www.bfmtv.com/sante/covid-19-un-responsable-de-l-institut-pasteur-se-demande-quand-la-france-va-se-reconfiner_AV-202101060198.html',
'info_dict': {
'id': '202101060198',
'title': 'Covid-19: un responsable de l\'Institut Pasteur se demande "quand la France va se reconfiner"',
'description': 'md5:947974089c303d3ac6196670ae262843',
},
'playlist_count': 2,
}, {
'url': 'https://www.bfmtv.com/international/pour-bolsonaro-le-bresil-est-en-faillite-mais-il-ne-peut-rien-faire_AD-202101060232.html',
'only_matching': True,
}, {
'url': 'https://www.bfmtv.com/sante/covid-19-oui-le-vaccin-de-pfizer-distribue-en-france-a-bien-ete-teste-sur-des-personnes-agees_AN-202101060275.html',
'only_matching': True,
}]
def _real_extract(self, url):
bfmtv_id = self._match_id(url)
webpage = self._download_webpage(url, bfmtv_id)
entries = []
for video_block_el in re.findall(self._VIDEO_BLOCK_REGEX, webpage):
video_block = extract_attributes(video_block_el)
video_id = video_block.get('videoid')
if not video_id:
continue
entries.append(self._brightcove_url_result(video_id, video_block))
return self.playlist_result(
entries, bfmtv_id, self._og_search_title(webpage, fatal=False),
self._html_search_meta(['og:description', 'description'], webpage))
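
Review note: the new BFMTV extractor above finds each `video_block` element, reads its attributes, and builds a Brightcove player URL, falling back to default account and player ids. A rough standard-library sketch of that flow follows. The helper names `tag_attributes` and `brightcove_urls` and the sample markup are hypothetical, and `html.parser` is used here as a stand-in for the `extract_attributes` utility imported in the diff. The regex, URL template and default ids are copied from the added file.

```python
import re
from html.parser import HTMLParser

VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
BRIGHTCOVE_URL_TEMPLATE = (
    'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s')


class _AttrParser(HTMLParser):
    """Collects the attributes of the first start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if not self.attrs:
            self.attrs = dict(attrs)


def tag_attributes(tag_html):
    parser = _AttrParser()
    parser.feed(tag_html)
    parser.close()
    return parser.attrs


def brightcove_urls(page):
    urls = []
    for tag in re.findall(VIDEO_BLOCK_REGEX, page):
        block = tag_attributes(tag)
        video_id = block.get('videoid')
        if not video_id:
            continue  # skip blocks without a video id, as the article extractor does
        account_id = block.get('accountid') or '876450612001'
        player_id = block.get('playerid') or 'I2qBTln4u'
        urls.append(BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id))
    return urls


# Made-up page: one complete block and one block lacking a video id.
page = (
    '<div class="video_block" videoid="6196747868001" '
    'accountid="876450610001"></div>'
    '<div class="video_block"></div>'
)
urls = brightcove_urls(page)
```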

@@ -0,0 +1,30 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class BibelTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bibeltv\.de/mediathek/videos/(?:crn/)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.bibeltv.de/mediathek/videos/329703-sprachkurs-in-malaiisch',
'md5': '252f908192d611de038b8504b08bf97f',
'info_dict': {
'id': 'ref:329703',
'ext': 'mp4',
'title': 'Sprachkurs in Malaiisch',
'description': 'md5:3e9f197d29ee164714e67351cf737dfe',
'timestamp': 1608316701,
'uploader_id': '5840105145001',
'upload_date': '20201218',
}
}, {
'url': 'https://www.bibeltv.de/mediathek/videos/crn/326374',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5840105145001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url):
crn_id = self._match_id(url)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % crn_id, 'BrightcoveNew')

@@ -3,10 +3,11 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .vk import VKIE from .vk import VKIE
from ..utils import ( from ..compat import (
HEADRequest, compat_b64decode,
int_or_none, compat_urllib_parse_unquote,
) )
from ..utils import int_or_none
class BIQLEIE(InfoExtractor): class BIQLEIE(InfoExtractor):
@@ -47,9 +48,16 @@ class BIQLEIE(InfoExtractor):
if VKIE.suitable(embed_url): if VKIE.suitable(embed_url):
return self.url_result(embed_url, VKIE.ie_key(), video_id) return self.url_result(embed_url, VKIE.ie_key(), video_id)
self._request_webpage( embed_page = self._download_webpage(
HEADRequest(embed_url), video_id, headers={'Referer': url}) embed_url, video_id, headers={'Referer': url})
video_id, sig, _, access_token = self._get_cookies(embed_url)['video_ext'].value.split('%3A') video_ext = self._get_cookies(embed_url).get('video_ext')
if video_ext:
video_ext = compat_urllib_parse_unquote(video_ext.value)
if not video_ext:
video_ext = compat_b64decode(self._search_regex(
r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)',
embed_page, 'video_ext')).decode()
video_id, sig, _, access_token = video_ext.split(':')
item = self._download_json( item = self._download_json(
'https://api.vk.com/method/video.get', video_id, 'https://api.vk.com/method/video.get', video_id,
headers={'User-Agent': 'okhttp/3.4.1'}, query={ headers={'User-Agent': 'okhttp/3.4.1'}, query={
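
Review note: the BIQLE hunk above stops assuming that the `video_ext` cookie is always present. It URL-unquotes the cookie when it exists and otherwise base64-decodes a `video_ext` value found in the embed page, then splits the result on `:`. A minimal standalone sketch of that fallback, assuming plain strings as input; the function name `parse_video_ext` and the sample values are hypothetical, and only the regex and the four-field layout come from the diff.

```python
import base64
import re
from urllib.parse import unquote


def parse_video_ext(cookie_value, embed_page):
    """Return (video_id, sig, access_token) from the cookie or the page."""
    video_ext = unquote(cookie_value) if cookie_value else None
    if not video_ext:
        match = re.search(
            r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)', embed_page)
        if match is None:
            raise ValueError('video_ext not found in cookie or page')
        video_ext = base64.b64decode(match.group(1)).decode()
    # The third field is unused, matching the tuple unpacking in the diff.
    video_id, sig, _, access_token = video_ext.split(':')
    return video_id, sig, access_token


# Invented sample values for both code paths.
from_cookie = parse_video_ext('123_456%3Asig%3A0%3Atoken', '')
encoded = base64.b64encode(b'123_456:sig:0:token').decode()
from_page = parse_video_ext(None, "video_ext: '%s'" % encoded)
```

Unquoting before the split is what allows a single `split(':')` to serve both sources, instead of splitting the raw cookie on `%3A` as the removed line did.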

@@ -90,13 +90,19 @@ class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})' _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{ _TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms', 'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1', 'md5': '670b2d73f48549da032861130488c681',
'info_dict': { 'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1', 'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'ext': 'flv', 'ext': 'mp4',
'title': 'Cena vs. Rollins Would Expose the Heavyweight Division', 'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e', 'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
'upload_date': '20150723',
'timestamp': 1437679032,
}, },
'expected_warnings': [
'Unable to download f4m manifest'
]
}] }]
def _real_extract(self, url): def _real_extract(self, url):

@@ -0,0 +1,60 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
urlencode_postdata,
)
class BongaCamsIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.com)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://de.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://cn.bongacams.com/azumi-8',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
channel_id = mobj.group('id')
amf = self._download_json(
'https://%s/tools/amf.php' % host, channel_id,
data=urlencode_postdata((
('method', 'getRoomData'),
('args[]', channel_id),
('args[]', 'false'),
)), headers={'X-Requested-With': 'XMLHttpRequest'})
server_url = amf['localData']['videoServerUrl']
uploader_id = try_get(
amf, lambda x: x['performerData']['username'], compat_str) or channel_id
uploader = try_get(
amf, lambda x: x['performerData']['displayName'], compat_str)
like_count = int_or_none(try_get(
amf, lambda x: x['performerData']['loversCount']))
formats = self._extract_m3u8_formats(
'%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id),
channel_id, 'mp4', m3u8_id='hls', live=True)
self._sort_formats(formats)
return {
'id': channel_id,
'title': self._live_title(uploader or uploader_id),
'uploader': uploader,
'uploader_id': uploader_id,
'like_count': like_count,
'age_limit': 18,
'is_live': True,
'formats': formats,
}
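The playlist URL in the new extractor is assembled from two fields of the `getRoomData` response. A minimal standalone sketch of that step, assuming a response shaped like the keys the extractor reads; the helper name `build_playlist_url`, the sample payload, and the `video.example.invalid` host are hypothetical and only serve as illustration:

```python
def build_playlist_url(amf, channel_id):
    """Rebuild the HLS playlist URL the way the extractor above does."""
    # videoServerUrl is mandatory in the extractor, so a KeyError here is intended.
    server_url = amf['localData']['videoServerUrl']

    performer = amf.get('performerData')
    if not isinstance(performer, dict):
        performer = {}
    username = performer.get('username')
    # Fall back to the channel id from the URL when no usable username is present.
    uploader_id = username if isinstance(username, str) and username else channel_id

    return '%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id)


# Hypothetical response, shaped after the keys the extractor reads.
sample = {
    'localData': {'videoServerUrl': '//video.example.invalid'},
    'performerData': {'username': 'azumi-8', 'displayName': 'Azumi'},
}
print(build_playlist_url(sample, 'azumi-8'))
```

The fallback to `channel_id` mirrors the `try_get(...) or channel_id` expression in the diff: a missing, empty, or non-string username never reaches the URL.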

@@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
# try_get,
update_url_query,
)
class BoxIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P<shared_name>[^/]+)/file/(?P<id>\d+)'
_TEST = {
'url': 'https://mlssoccer.app.box.com/s/0evd2o3e08l60lr4ygukepvnkord1o1x/file/510727257538',
'md5': '1f81b2fd3960f38a40a3b8823e5fcd43',
'info_dict': {
'id': '510727257538',
'ext': 'mp4',
'title': 'Garber St. Louis will be 28th MLS team +scarving.mp4',
'uploader': 'MLS Video',
'timestamp': 1566320259,
'upload_date': '20190820',
'uploader_id': '235196876',
}
}
def _real_extract(self, url):
shared_name, file_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, file_id)
request_token = self._parse_json(self._search_regex(
r'Box\.config\s*=\s*({.+?});', webpage,
'Box config'), file_id)['requestToken']
access_token = self._download_json(
'https://app.box.com/app-api/enduserapp/elements/tokens', file_id,
'Downloading token JSON metadata',
data=json.dumps({'fileIDs': [file_id]}).encode(), headers={
'Content-Type': 'application/json',
'X-Request-Token': request_token,
'X-Box-EndUser-API': 'sharedName=' + shared_name,
})[file_id]['read']
shared_link = 'https://app.box.com/s/' + shared_name
f = self._download_json(
'https://api.box.com/2.0/files/' + file_id, file_id,
'Downloading file JSON metadata', headers={
'Authorization': 'Bearer ' + access_token,
'BoxApi': 'shared_link=' + shared_link,
'X-Rep-Hints': '[dash]', # TODO: extract `hls` formats
}, query={
'fields': 'authenticated_download_url,created_at,created_by,description,extension,is_download_available,name,representations,size'
})
title = f['name']
query = {
'access_token': access_token,
'shared_link': shared_link
}
formats = []
# for entry in (try_get(f, lambda x: x['representations']['entries'], list) or []):
# entry_url_template = try_get(
# entry, lambda x: x['content']['url_template'])
# if not entry_url_template:
# continue
# representation = entry.get('representation')
# if representation == 'dash':
# TODO: append query to every fragment URL
# formats.extend(self._extract_mpd_formats(
# entry_url_template.replace('{+asset_path}', 'manifest.mpd'),
# file_id, query=query))
authenticated_download_url = f.get('authenticated_download_url')
if authenticated_download_url and f.get('is_download_available'):
formats.append({
'ext': f.get('extension') or determine_ext(title),
'filesize': f.get('size'),
'format_id': 'download',
'url': update_url_query(authenticated_download_url, query),
})
self._sort_formats(formats)
creator = f.get('created_by') or {}
return {
'id': file_id,
'title': title,
'formats': formats,
'description': f.get('description') or None,
'uploader': creator.get('name'),
'timestamp': parse_iso8601(f.get('created_at')),
'uploader_id': creator.get('id'),
}

@@ -12,7 +12,7 @@ from ..utils import (
class BravoTVIE(AdobePassIE): class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?(?P<req_id>bravotv|oxygen)\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is', 'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'md5': 'e34684cfea2a96cd2ee1ef3a60909de9', 'md5': 'e34684cfea2a96cd2ee1ef3a60909de9',
@@ -28,10 +28,13 @@ class BravoTVIE(AdobePassIE):
}, { }, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1', 'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
settings = self._parse_json(self._search_regex( settings = self._parse_json(self._search_regex(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>({.+?})</script>', webpage, 'drupal settings'), r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>({.+?})</script>', webpage, 'drupal settings'),
@@ -53,11 +56,14 @@ class BravoTVIE(AdobePassIE):
tp_path = release_pid = tve['release_pid'] tp_path = release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth': if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('tve_adobe_auth', {}) adobe_pass = settings.get('tve_adobe_auth', {})
if site == 'bravotv':
site = 'bravo'
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'), adobe_pass.get('adobePassResourceId') or site,
tve['title'], release_pid, tve.get('rating')) tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource) url, release_pid,
adobe_pass.get('adobePassRequestorId') or site, resource)
else: else:
shared_playlist = settings['ls_playlist'] shared_playlist = settings['ls_playlist']
account_pid = shared_playlist['account_pid'] account_pid = shared_playlist['account_pid']

@@ -28,6 +28,7 @@ from ..utils import (
parse_iso8601, parse_iso8601,
smuggle_url, smuggle_url,
str_or_none, str_or_none,
try_get,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
UnsupportedError, UnsupportedError,
@@ -147,7 +148,7 @@ class BrightcoveLegacyIE(InfoExtractor):
] ]
@classmethod @classmethod
def _build_brighcove_url(cls, object_str): def _build_brightcove_url(cls, object_str):
""" """
Build a Brightcove url from a xml string containing Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object> <object class="BrightcoveExperience">{params}</object>
@@ -217,7 +218,7 @@ class BrightcoveLegacyIE(InfoExtractor):
return cls._make_brightcove_url(params) return cls._make_brightcove_url(params)
@classmethod @classmethod
def _build_brighcove_url_from_js(cls, object_js): def _build_brightcove_url_from_js(cls, object_js):
# The layout of JS is as follows: # The layout of JS is as follows:
# customBC.createVideo = function (width, height, playerID, playerKey, videoPlayer, VideoRandomID) { # customBC.createVideo = function (width, height, playerID, playerKey, videoPlayer, VideoRandomID) {
# // build Brightcove <object /> XML # // build Brightcove <object /> XML
@@ -272,12 +273,12 @@ class BrightcoveLegacyIE(InfoExtractor):
).+?>\s*</object>''', ).+?>\s*</object>''',
webpage) webpage)
if matches: if matches:
return list(filter(None, [cls._build_brighcove_url(m) for m in matches])) return list(filter(None, [cls._build_brightcove_url(m) for m in matches]))
matches = re.findall(r'(customBC\.createVideo\(.+?\);)', webpage) matches = re.findall(r'(customBC\.createVideo\(.+?\);)', webpage)
if matches: if matches:
return list(filter(None, [ return list(filter(None, [
cls._build_brighcove_url_from_js(custom_bc) cls._build_brightcove_url_from_js(custom_bc)
for custom_bc in matches])) for custom_bc in matches]))
return [src for _, src in re.findall( return [src for _, src in re.findall(
r'<iframe[^>]+src=([\'"])((?:https?:)?//link\.brightcove\.com/services/player/(?!\1).+)\1', webpage)] r'<iframe[^>]+src=([\'"])((?:https?:)?//link\.brightcove\.com/services/player/(?!\1).+)\1', webpage)]
@@ -426,7 +427,7 @@ class BrightcoveNewIE(AdobePassIE):
# [2] looks like: # [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall( for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx) r'''(?isx)
(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>) (<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*? (?:.*?
(<script[^>]+ (<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/ src=["\'](?:https?:)?//players\.brightcove\.net/
@@ -470,13 +471,18 @@ class BrightcoveNewIE(AdobePassIE):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}): def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip() title = json_data['name'].strip()
num_drm_sources = 0
formats = [] formats = []
for source in json_data.get('sources', []): sources = json_data.get('sources') or []
for source in sources:
container = source.get('container') container = source.get('container')
ext = mimetype2ext(source.get('type')) ext = mimetype2ext(source.get('type'))
src = source.get('src') src = source.get('src')
# https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object # https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if ext == 'ism' or container == 'WVM' or source.get('key_systems'): if container == 'WVM' or source.get('key_systems'):
num_drm_sources += 1
continue
elif ext == 'ism':
continue continue
elif ext == 'm3u8' or container == 'M2TS': elif ext == 'm3u8' or container == 'M2TS':
if not src: if not src:
@@ -533,20 +539,15 @@ class BrightcoveNewIE(AdobePassIE):
'format_id': build_format_id('rtmp'), 'format_id': build_format_id('rtmp'),
}) })
formats.append(f) formats.append(f)
if not formats:
# for sonyliv.com DRM protected videos
s3_source_url = json_data.get('custom_fields', {}).get('s3sourceurl')
if s3_source_url:
formats.append({
'url': s3_source_url,
'format_id': 'source',
})
errors = json_data.get('errors') if not formats:
if not formats and errors: errors = json_data.get('errors')
error = errors[0] if errors:
raise ExtractorError( error = errors[0]
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True) raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
if sources and num_drm_sources == len(sources):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats) self._sort_formats(formats)
@@ -600,24 +601,27 @@ class BrightcoveNewIE(AdobePassIE):
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x) store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
def extract_policy_key(): def extract_policy_key():
webpage = self._download_webpage( base_url = 'http://players.brightcove.net/%s/%s_%s/' % (account_id, player_id, embed)
'http://players.brightcove.net/%s/%s_%s/index.min.js' config = self._download_json(
% (account_id, player_id, embed), video_id) base_url + 'config.json', video_id, fatal=False) or {}
policy_key = try_get(
policy_key = None config, lambda x: x['video_cloud']['policy_key'])
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key: if not policy_key:
policy_key = self._search_regex( webpage = self._download_webpage(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1', base_url + 'index.min.js', video_id)
webpage, 'policy key', group='pk')
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
store_pk(policy_key) store_pk(policy_key)
return policy_key return policy_key

@@ -7,12 +7,12 @@ from .common import InfoExtractor
from .gigya import GigyaBaseIE from .gigya import GigyaBaseIE
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
extract_attributes,
ExtractorError, ExtractorError,
strip_or_none, strip_or_none,
float_or_none, float_or_none,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
parse_iso8601,
str_or_none, str_or_none,
url_or_none, url_or_none,
) )
@@ -37,6 +37,7 @@ class CanvasIE(InfoExtractor):
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e', 'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True, 'only_matching': True,
}] }]
_GEO_BYPASS = False
_HLS_ENTRY_PROTOCOLS_MAP = { _HLS_ENTRY_PROTOCOLS_MAP = {
'HLS': 'm3u8_native', 'HLS': 'm3u8_native',
'HLS_AES': 'm3u8', 'HLS_AES': 'm3u8',
@@ -47,29 +48,34 @@ class CanvasIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id') site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos data = None
data = self._download_json( if site_id != 'vrtvideo':
'https://mediazone.vrt.be/api/v1/%s/assets/%s' # Old API endpoint, serves more formats but may fail for some videos
% (site_id, video_id), video_id, 'Downloading asset JSON', data = self._download_json(
'Unable to download asset JSON', fatal=False) 'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
# New API endpoint # New API endpoint
if not data: if not data:
headers = self.geo_verification_headers()
headers.update({'Content-Type': 'application/json'})
token = self._download_json( token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id, '%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'', 'Downloading token', data=b'', headers=headers)['vrtPlayerToken']
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
data = self._download_json( data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id), '%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={ video_id, 'Downloading video JSON', query={
'vrtPlayerToken': token, 'vrtPlayerToken': token,
'client': '%s@PROD' % site_id, 'client': '%s@PROD' % site_id,
}, expected_status=400) }, expected_status=400)
message = data.get('message') if not data.get('title'):
if message and not data.get('title'): code = data.get('code')
if data.get('code') == 'AUTHENTICATION_REQUIRED': if code == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message) self.raise_login_required()
raise ExtractorError(message, expected=True) elif code == 'INVALID_LOCATION':
self.raise_geo_restricted(countries=['BE'])
raise ExtractorError(data.get('message') or code, expected=True)
title = data['title'] title = data['title']
description = data.get('description') description = data.get('description')
@@ -205,20 +211,24 @@ class CanvasEenIE(InfoExtractor):
class VrtNUIE(GigyaBaseIE): class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be' IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?vrt\.be/vrtnu/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# Available via old API endpoint # Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/', 'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1989/postbus-x-s1989a1/',
'info_dict': { 'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de', 'id': 'pbs-pub-e8713dac-899e-41de-9313-81269f4c04ac$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'mp4', 'ext': 'mp4',
'title': 'De zwarte weduwe', 'title': 'Postbus X - Aflevering 1 (Seizoen 1989)',
'description': 'md5:db1227b0f318c849ba5eab1fef895ee4', 'description': 'md5:b704f669eb9262da4c55b33d7c6ed4b7',
'duration': 1457.04, 'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'season': 'Season 1', 'series': 'Postbus X',
'season_number': 1, 'season': 'Seizoen 1989',
'season_number': 1989,
'episode': 'De zwarte weduwe',
'episode_number': 1, 'episode_number': 1,
'timestamp': 1595822400,
'upload_date': '20200727',
}, },
'skip': 'This video is only available for registered users', 'skip': 'This video is only available for registered users',
'params': { 'params': {
@@ -300,69 +310,25 @@ class VrtNUIE(GigyaBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, display_id) webpage = self._download_webpage(url, display_id)
attrs = extract_attributes(self._search_regex(
r'(<nui-media[^>]+>)', webpage, 'media element'))
video_id = attrs['videoid']
publication_id = attrs.get('publicationid')
if publication_id:
video_id = publication_id + '$' + video_id
page = (self._parse_json(self._search_regex(
r'digitalData\s*=\s*({.+?});', webpage, 'digital data',
default='{}'), video_id, fatal=False) or {}).get('page') or {}
info = self._search_json_ld(webpage, display_id, default={}) info = self._search_json_ld(webpage, display_id, default={})
# title is optional here since it may be extracted by extractor
# that is delegated from here
title = strip_or_none(self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
webpage, 'title', default=None))
description = self._html_search_regex(
r'(?ms)<div class="content__description">(.+?)</div>',
webpage, 'description', default=None)
season = self._html_search_regex(
[r'''(?xms)<div\ class="tabs__tab\ tabs__tab--active">\s*
<span>seizoen\ (.+?)</span>\s*
</div>''',
r'<option value="seizoen (\d{1,3})" data-href="[^"]+?" selected>'],
webpage, 'season', default=None)
season_number = int_or_none(season)
episode_number = int_or_none(self._html_search_regex(
r'''(?xms)<div\ class="content__episode">\s*
<abbr\ title="aflevering">afl</abbr>\s*<span>(\d+)</span>
</div>''',
webpage, 'episode_number', default=None))
release_date = parse_iso8601(self._html_search_regex(
r'(?ms)<div class="content__broadcastdate">\s*<time\ datetime="(.+?)"',
webpage, 'release_date', default=None))
# If there's a ? or a # in the URL, remove them and everything after
clean_url = urlh.geturl().split('?')[0].split('#')[0].strip('/')
securevideo_url = clean_url + '.mssecurevideo.json'
try:
video = self._download_json(securevideo_url, display_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
self.raise_login_required()
raise
# We are dealing with a '../<show>.relevant' URL
redirect_url = video.get('url')
if redirect_url:
return self.url_result(self._proto_relative_url(redirect_url, 'https:'))
# There is only one entry, but with an unknown key, so just get
# the first one
video_id = list(video.values())[0].get('videoid')
return merge_dicts(info, { return merge_dicts(info, {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id, 'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(), 'ie_key': CanvasIE.ie_key(),
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'season_number': int_or_none(page.get('episode_season')),
'description': description,
'season': season,
'season_number': season_number,
'episode_number': episode_number,
'release_date': release_date,
}) })
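The rewritten `VrtNUIE._real_extract` derives the video id from the attributes of the `<nui-media>` element instead of a separate `.mssecurevideo.json` request. A rough standalone sketch of that id composition, using the standard library's `html.parser` in place of youtube-dl's `extract_attributes`; the helper names and the sample markup are assumptions made for illustration:

```python
import re
from html.parser import HTMLParser


class _AttrCollector(HTMLParser):
    """Collect the attributes of the first start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if not self.attrs:
            self.attrs = dict(attrs)


def compose_video_id(webpage):
    # Same regex as the diff: grab the opening <nui-media ...> tag.
    element = re.search(r'(<nui-media[^>]+>)', webpage).group(1)
    parser = _AttrCollector()
    parser.feed(element)
    parser.close()
    attrs = parser.attrs

    video_id = attrs['videoid']
    publication_id = attrs.get('publicationid')
    if publication_id:
        # The publication id, when present, is prefixed with a '$' separator.
        video_id = publication_id + '$' + video_id
    return video_id


# Hypothetical markup; real pages carry more attributes.
page = '<div><nui-media videoid="vid-1" publicationid="pbs-pub-1"></nui-media></div>'
print(compose_video_id(page))
```

This matches the shape of the `id` in the updated test case, where a `pbs-pub-...` part and a `vid-...` part are joined by `$`.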

@@ -11,7 +11,47 @@ from ..utils import (
class CBSLocalIE(AnvatoIE): class CBSLocalIE(AnvatoIE):
_VALID_URL = r'https?://[a-z]+\.cbslocal\.com/(?:\d+/\d+/\d+|video)/(?P<id>[0-9a-z-]+)' _VALID_URL_BASE = r'https?://[a-z]+\.cbslocal\.com/'
_VALID_URL = _VALID_URL_BASE + r'video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
mcp_id = self._match_id(url)
return self.url_result(
'anvato:anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67:' + mcp_id, 'Anvato', mcp_id)
class CBSLocalArticleIE(AnvatoIE):
_VALID_URL = CBSLocalIE._VALID_URL_BASE + r'\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
_TESTS = [{ _TESTS = [{
# Anvato backend # Anvato backend
@@ -52,31 +92,6 @@ class CBSLocalIE(AnvatoIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):

@@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -8,8 +9,8 @@ from ..utils import (
clean_html, clean_html,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601,
parse_resolution, parse_resolution,
try_get,
url_or_none, url_or_none,
) )
@@ -24,8 +25,9 @@ class CCMAIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'L\'espot de La Marató de TV3', 'title': 'L\'espot de La Marató de TV3',
'description': 'md5:f12987f320e2f6e988e9908e4fe97765', 'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
'timestamp': 1470918540, 'timestamp': 1478608140,
'upload_date': '20160811', 'upload_date': '20161108',
'age_limit': 0,
} }
}, { }, {
'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/', 'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
@@ -35,8 +37,24 @@ class CCMAIE(InfoExtractor):
'ext': 'mp3', 'ext': 'mp3',
'title': 'El Consell de Savis analitza el derbi', 'title': 'El Consell de Savis analitza el derbi',
'description': 'md5:e2a3648145f3241cb9c6b4b624033e53', 'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
'upload_date': '20171205', 'upload_date': '20170512',
'timestamp': 1512507300, 'timestamp': 1494622500,
'vcodec': 'none',
'categories': ['Esports'],
}
}, {
'url': 'http://www.ccma.cat/tv3/alacarta/crims/crims-josep-tallada-lespereu-me-capitol-1/video/6031387/',
'md5': 'b43c3d3486f430f3032b5b160d80cbc3',
'info_dict': {
'id': '6031387',
'ext': 'mp4',
'title': 'Crims - Josep Talleda, l\'"Espereu-me" (capítol 1)',
'description': 'md5:7cbdafb640da9d0d2c0f62bad1e74e60',
'timestamp': 1582577700,
'upload_date': '20200224',
'subtitles': 'mincount:4',
'age_limit': 16,
'series': 'Crims',
} }
}] }]
@@ -72,17 +90,27 @@ class CCMAIE(InfoExtractor):
informacio = media['informacio'] informacio = media['informacio']
title = informacio['titol'] title = informacio['titol']
durada = informacio.get('durada', {}) durada = informacio.get('durada') or {}
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text')) duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc')) tematica = try_get(informacio, lambda x: x['tematica']['text'])
timestamp = None
data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
try:
timestamp = datetime.datetime.strptime(
data_utc, '%Y-%d-%mT%H:%M:%S%z').timestamp()
except TypeError:
pass
subtitles = {} subtitles = {}
subtitols = media.get('subtitols', {}) subtitols = media.get('subtitols') or []
if subtitols: if isinstance(subtitols, dict):
sub_url = subtitols.get('url') subtitols = [subtitols]
for st in subtitols:
sub_url = st.get('url')
if sub_url: if sub_url:
subtitles.setdefault( subtitles.setdefault(
subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({ st.get('iso') or st.get('text') or 'ca', []).append({
'url': sub_url, 'url': sub_url,
}) })
@@ -97,6 +125,16 @@ class CCMAIE(InfoExtractor):
'height': int_or_none(imatges.get('alcada')), 'height': int_or_none(imatges.get('alcada')),
}] }]
age_limit = None
codi_etic = try_get(informacio, lambda x: x['codi_etic']['id'])
if codi_etic:
codi_etic_s = codi_etic.split('_')
if len(codi_etic_s) == 2:
if codi_etic_s[1] == 'TP':
age_limit = 0
else:
age_limit = int_or_none(codi_etic_s[1])
return { return {
'id': media_id, 'id': media_id,
'title': title, 'title': title,
@@ -106,4 +144,9 @@ class CCMAIE(InfoExtractor):
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
'age_limit': age_limit,
'alt_title': informacio.get('titol_complet'),
'episode_number': int_or_none(informacio.get('capitol')),
'categories': [tematica] if tematica else None,
'series': informacio.get('programa'),
} }
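Two of the CCMA changes are easy to misread: the timestamp format puts the day before the month (`%Y-%d-%m`), and the age limit is taken from the second half of the `codi_etic` id. A small Python 3 sketch of both, under stated assumptions: the helper names are hypothetical, the sample date string is reconstructed so that it yields the `timestamp` of 1582577700 from the new test case, and the `codi_etic` values are made up. Note one deliberate deviation: the diff catches only `TypeError`, while this sketch also catches `ValueError` for malformed strings.

```python
import datetime


def parse_emission_timestamp(data_utc):
    """Parse a 'YYYY-DD-MMTHH:MM:SS+ZZZZ' string into a Unix timestamp."""
    try:
        return datetime.datetime.strptime(
            data_utc, '%Y-%d-%mT%H:%M:%S%z').timestamp()
    except (TypeError, ValueError):
        # TypeError: data_utc is None; ValueError: string does not match.
        return None


def parse_age_limit(codi_etic):
    """Map an id such as '<prefix>_TP' or '<prefix>_16' to an age limit."""
    if not codi_etic:
        return None
    parts = codi_etic.split('_')
    if len(parts) != 2:
        return None
    if parts[1] == 'TP':
        # 'TP' is treated as suitable for all audiences in the diff.
        return 0
    try:
        return int(parts[1])
    except ValueError:
        return None


# Assumed input: 24 February 2020, 21:55 at UTC+1, with day and month swapped.
print(parse_emission_timestamp('2020-24-02T21:55:00+0100'))
# Hypothetical ids for illustration only.
print(parse_age_limit('CODI_TP'), parse_age_limit('CODI_16'))
```

`datetime.timestamp()` and `%z` parsing are Python 3 features, so this sketch does not cover Python 2.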

@@ -5,10 +5,16 @@ import codecs
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
merge_dicts,
multipart_encode, multipart_encode,
parse_duration, parse_duration,
random_birthday, random_birthday,
@@ -90,7 +96,7 @@ class CDAIE(InfoExtractor):
raise ExtractorError('This video is only available for premium users.', expected=True) raise ExtractorError('This video is only available for premium users.', expected=True)
need_confirm_age = False need_confirm_age = False
if self._html_search_regex(r'(<form[^>]+action="/a/validatebirth")', if self._html_search_regex(r'(<form[^>]+action="[^"]*/a/validatebirth[^"]*")',
webpage, 'birthday validate form', default=None): webpage, 'birthday validate form', default=None):
webpage = self._download_age_confirm_page( webpage = self._download_age_confirm_page(
url, video_id, note='Confirming age') url, video_id, note='Confirming age')
@@ -107,8 +113,9 @@ class CDAIE(InfoExtractor):
r'Odsłony:(?:\s|&nbsp;)*([0-9]+)', webpage, r'Odsłony:(?:\s|&nbsp;)*([0-9]+)', webpage,
'view_count', default=None) 'view_count', default=None)
average_rating = self._search_regex( average_rating = self._search_regex(
r'<(?:span|meta)[^>]+itemprop=(["\'])ratingValue\1[^>]*>(?P<rating_value>[0-9.]+)', (r'<(?:span|meta)[^>]+itemprop=(["\'])ratingValue\1[^>]*>(?P<rating_value>[0-9.]+)',
webpage, 'rating', fatal=False, group='rating_value') r'<span[^>]+\bclass=["\']rating["\'][^>]*>(?P<rating_value>[0-9.]+)'), webpage, 'rating', fatal=False,
group='rating_value')
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
@@ -123,6 +130,24 @@ class CDAIE(InfoExtractor):
'age_limit': 18 if need_confirm_age else 0, 'age_limit': 18 if need_confirm_age else 0,
} }
# Source: https://www.cda.pl/js/player.js?t=1606154898
def decrypt_file(a):
for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
a = a.replace(p, '')
a = compat_urllib_parse_unquote(a)
b = []
for c in a:
f = compat_ord(c)
b.append(compat_chr(33 + (f + 14) % 94) if 33 <= f and 126 >= f else compat_chr(f))
a = ''.join(b)
a = a.replace('.cda.mp4', '')
for p in ('.2cda.pl', '.3cda.pl'):
a = a.replace(p, '.cda.pl')
if '/upstream' in a:
a = a.replace('/upstream', '.mp4/upstream')
return 'https://' + a
return 'https://' + a + '.mp4'
def extract_format(page, version): def extract_format(page, version):
json_str = self._html_search_regex( json_str = self._html_search_regex(
r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page, r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page,
@@ -141,6 +166,8 @@ class CDAIE(InfoExtractor):
video['file'] = codecs.decode(video['file'], 'rot_13') video['file'] = codecs.decode(video['file'], 'rot_13')
if video['file'].endswith('adc.mp4'): if video['file'].endswith('adc.mp4'):
video['file'] = video['file'].replace('adc.mp4', '.mp4') video['file'] = video['file'].replace('adc.mp4', '.mp4')
elif not video['file'].startswith('http'):
video['file'] = decrypt_file(video['file'])
f = { f = {
'url': video['file'], 'url': video['file'],
} }
@@ -179,4 +206,6 @@ class CDAIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
return info_dict info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts(info_dict, info)
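The character loop in `decrypt_file` is the ROT47 substitution: every printable ASCII character from code 33 to 126 is rotated by 47 positions, since `33 + (f + 14) % 94` equals `33 + (f - 33 + 47) % 94`. A Python 3 sketch of the same routine without the `compat_*` helpers; the marker list and replacements are copied from the diff, while the sample host name is hypothetical:

```python
from urllib.parse import unquote

_MARKERS = ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE')


def rot47(text):
    """Rotate printable ASCII (33..126) by 47; leave other characters alone."""
    return ''.join(
        chr(33 + (ord(c) + 14) % 94) if 33 <= ord(c) <= 126 else c
        for c in text)


def decrypt_file(a):
    # Strip the junk markers, then undo URL quoting and the ROT47 layer.
    for marker in _MARKERS:
        a = a.replace(marker, '')
    a = rot47(unquote(a))
    a = a.replace('.cda.mp4', '')
    for host in ('.2cda.pl', '.3cda.pl'):
        a = a.replace(host, '.cda.pl')
    if '/upstream' in a:
        return 'https://' + a.replace('/upstream', '.mp4/upstream')
    return 'https://' + a + '.mp4'


# ROT47 is its own inverse, so encoding a sample path with rot47() gives a
# string that decrypt_file() should turn back into a full URL.
encoded = rot47('vwm123.cda.pl/abcdef')
print(decrypt_file('_XDDD' + encoded))
```

Because ROT47 is an involution, the same `rot47` helper can be used to build test inputs as well as to decode them.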

@@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import smuggle_url from ..utils import smuggle_url
@@ -38,7 +39,7 @@ class CNBCIE(InfoExtractor):
class CNBCVideoIE(InfoExtractor): class CNBCVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cnbc\.com/video/(?:[^/]+/)+(?P<id>[^./?#&]+)' _VALID_URL = r'https?://(?:www\.)?cnbc\.com(?P<path>/video/(?:[^/]+/)+(?P<id>[^./?#&]+)\.html)'
_TEST = { _TEST = {
'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html', 'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html',
'info_dict': { 'info_dict': {
@@ -56,11 +57,15 @@ class CNBCVideoIE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) video_id = self._download_json(
video_id = self._search_regex( 'https://webql-redesign.cnbcfm.com/graphql', display_id, query={
r'content_id["\']\s*:\s*["\'](\d+)', webpage, display_id, 'query': '''{
'video id') page(path: "%s") {
vcpsId
}
}''' % path,
})['data']['page']['vcpsId']
return self.url_result( return self.url_result(
'http://video.cnbc.com/gallery/?video=%s' % video_id, 'http://video.cnbc.com/gallery/?video=%d' % video_id,
CNBCIE.ie_key()) CNBCIE.ie_key())
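The new `_VALID_URL` captures two groups, so a single `re.match(...).groups()` call yields both the GraphQL `path` argument and the display id. A small sketch of that step, using the test URL from the hunk; the module-level names `VALID_URL` and `query` are illustrative only and no network request is made.

```python
import re

VALID_URL = r'https?://(?:www\.)?cnbc\.com(?P<path>/video/(?:[^/]+/)+(?P<id>[^./?#&]+)\.html)'

url = 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html'

# groups() returns the captures in order of their opening parenthesis:
# the outer 'path' group first, then the nested 'id' group.
path, display_id = re.match(VALID_URL, url).groups()

# The query text is built by plain string interpolation, as in the hunk.
query = '''{
  page(path: "%s") {
    vcpsId
  }
}''' % path

print(path)
print(display_id)
```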


@@ -96,7 +96,10 @@ class CNNIE(TurnerBaseIE):
config['data_src'] % path, page_title, { config['data_src'] % path, page_title, {
'default': { 'default': {
'media_src': config['media_src'], 'media_src': config['media_src'],
} },
'f4m': {
'host': 'cnn-vh.akamaihd.net',
},
}) })


@@ -1,142 +1,51 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor from .mtv import MTVServicesInfoExtractor
from .common import InfoExtractor
class ComedyCentralIE(MTVServicesInfoExtractor): class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/ _VALID_URL = r'https?://(?:www\.)?cc\.com/(?:episodes|video(?:-clips)?)/(?P<id>[0-9a-z]{6})'
(video-clips|episodes|cc-studios|video-collections|shows(?=/[^/]+/(?!full-episodes)))
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/' _FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother', 'url': 'http://www.cc.com/video-clips/5ke9v2/the-daily-show-with-trevor-noah-doc-rivers-and-steve-ballmer---the-nba-player-strike',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8', 'md5': 'b8acb347177c680ff18a292aa2166f80',
'info_dict': { 'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354', 'id': '89ccc86e-1b02-4f83-b0c9-1d9592ecd025',
'ext': 'mp4', 'ext': 'mp4',
'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother', 'title': 'The Daily Show with Trevor Noah|August 28, 2020|25|25149|Doc Rivers and Steve Ballmer - The NBA Player Strike',
'description': 'After a certain point, breastfeeding becomes c**kblocking.', 'description': 'md5:5334307c433892b85f4f5e5ac9ef7498',
'timestamp': 1376798400, 'timestamp': 1598670000,
'upload_date': '20130818', 'upload_date': '20200829',
}, },
}, { }, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview', 'url': 'http://www.cc.com/episodes/pnzzci/drawn-together--american-idol--parody-clip-show-season-3-ep-314',
'only_matching': True, 'only_matching': True,
}]
class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(?:full-episodes|shows(?=/[^/]+/full-episodes))
/(?P<id>[^?]+)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.cc.com/full-episodes/pv391a/the-daily-show-with-trevor-noah-november-28--2016---ryan-speedo-green-season-22-ep-22028',
'info_dict': {
'description': 'Donald Trump is accused of exploiting his president-elect status for personal gain, Cuban leader Fidel Castro dies, and Ryan Speedo Green discusses "Sing for Your Life."',
'title': 'November 28, 2016 - Ryan Speedo Green',
},
'playlist_count': 4,
}, { }, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes', 'url': 'https://www.cc.com/video/k3sdvm/the-daily-show-with-jon-stewart-exclusive-the-fourth-estate',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
videos_info = self._get_videos_info(mgid)
return videos_info
class ToshIE(MTVServicesInfoExtractor):
IE_DESC = 'Tosh.0'
_VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
_FEED_URL = 'http://tosh.cc.com/feeds/mrss'
_TESTS = [{
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'info_dict': {
'description': 'Tosh asked fans to share their summer plans.',
'title': 'Twitter Users Share Summer Plans',
},
'playlist': [{
'md5': 'f269e88114c1805bb6d7653fecea9e06',
'info_dict': {
'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
'description': 'Tosh asked fans to share their summer plans.',
'thumbnail': r're:^https?://.*\.jpg',
# It's really reported to be published on year 2077
'upload_date': '20770610',
'timestamp': 3390510600,
'subtitles': {
'en': 'mincount:3',
},
},
}]
}, {
'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
'only_matching': True, 'only_matching': True,
}] }]
class ComedyCentralTVIE(MTVServicesInfoExtractor): class ComedyCentralTVIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/folgen/(?P<id>[0-9a-z]{6})'
_TESTS = [{ _TESTS = [{
'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4', 'url': 'https://www.comedycentral.tv/folgen/pxdpec/josh-investigates-klimawandel-staffel-1-ep-1',
'info_dict': { 'info_dict': {
'id': 'local_playlist-f99b626bdfe13568579a', 'id': '15907dc3-ec3c-11e8-a442-0e40cf2fc285',
'ext': 'flv', 'ext': 'mp4',
'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1', 'title': 'Josh Investigates',
'description': 'Steht uns das Ende der Welt bevor?',
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
'only_matching': True,
}, {
'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
'only_matching': True,
}] }]
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
_GEO_COUNTRIES = ['DE']
def _real_extract(self, url): def _get_feed_query(self, uri):
video_id = self._match_id(url) return {
'accountOverride': 'intl.mtvi.com',
webpage = self._download_webpage(url, video_id) 'arcEp': 'web.cc.tv',
'ep': 'b9032c3a',
mrss_url = self._search_regex( 'imageEp': 'web.cc.tv',
r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1', 'mgid': uri,
webpage, 'mrss url', group='url')
return self._get_videos_info_from_url(mrss_url, video_id)
class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow|theopposition)$'
_TESTS = [{
'url': ':tds',
'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}, {
'url': ':theopposition',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'theopposition': 'http://www.cc.com/shows/the-opposition-with-jordan-klepper/full-episodes',
} }
return self.url_result(shortcut_map[video_id])


@@ -10,6 +10,7 @@ import os
import random import random
import re import re
import socket import socket
import ssl
import sys import sys
import time import time
import math import math
@@ -67,6 +68,7 @@ from ..utils import (
sanitized_Request, sanitized_Request,
sanitize_filename, sanitize_filename,
str_or_none, str_or_none,
str_to_int,
strip_or_none, strip_or_none,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
@@ -334,8 +336,8 @@ class InfoExtractor(object):
object, each element of which is a valid dictionary by this specification. object, each element of which is a valid dictionary by this specification.
Additionally, playlists can have "id", "title", "description", "uploader", Additionally, playlists can have "id", "title", "description", "uploader",
"uploader_id", "uploader_url" attributes with the same semantics as videos "uploader_id", "uploader_url", "duration" attributes with the same semantics
(see above). as videos (see above).
_type "multi_video" indicates that there are multiple videos that _type "multi_video" indicates that there are multiple videos that
@@ -623,9 +625,12 @@ class InfoExtractor(object):
url_or_request = update_url_query(url_or_request, query) url_or_request = update_url_query(url_or_request, query)
if data is not None or headers: if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers) url_or_request = sanitized_Request(url_or_request, data, headers)
exceptions = [compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error]
if hasattr(ssl, 'CertificateError'):
exceptions.append(ssl.CertificateError)
try: try:
return self._downloader.urlopen(url_or_request) return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except tuple(exceptions) as err:
if isinstance(err, compat_urllib_error.HTTPError): if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status): if self.__can_accept_status_code(err, expected_status):
# Retain reference to error to prevent file object from # Retain reference to error to prevent file object from
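The hunk above builds the `except` clause from a list so that `ssl.CertificateError` is only referenced where the `ssl` module provides it. A rough sketch of the same pattern follows; it uses the Python 3 standard-library names as stand-ins for the `compat_*` aliases, and `network_exceptions` is a hypothetical helper name.

```python
import http.client
import socket
import ssl
import urllib.error


def network_exceptions():
    # Python 3 names stand in for compat_urllib_error / compat_http_client.
    exceptions = [urllib.error.URLError, http.client.HTTPException, socket.error]
    # Only add the certificate error when this ssl module defines it.
    if hasattr(ssl, 'CertificateError'):
        exceptions.append(ssl.CertificateError)
    # An except clause needs a tuple (or a single class), not a list.
    return tuple(exceptions)


caught = None
try:
    raise ssl.CertificateError('hostname mismatch')
except network_exceptions() as err:
    caught = err
```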
@@ -1232,8 +1237,16 @@ class InfoExtractor(object):
'ViewAction': 'view', 'ViewAction': 'view',
} }
def extract_interaction_type(e):
interaction_type = e.get('interactionType')
if isinstance(interaction_type, dict):
interaction_type = interaction_type.get('@type')
return str_or_none(interaction_type)
def extract_interaction_statistic(e): def extract_interaction_statistic(e):
interaction_statistic = e.get('interactionStatistic') interaction_statistic = e.get('interactionStatistic')
if isinstance(interaction_statistic, dict):
interaction_statistic = [interaction_statistic]
if not isinstance(interaction_statistic, list): if not isinstance(interaction_statistic, list):
return return
for is_e in interaction_statistic: for is_e in interaction_statistic:
@@ -1241,10 +1254,13 @@ class InfoExtractor(object):
continue continue
if is_e.get('@type') != 'InteractionCounter': if is_e.get('@type') != 'InteractionCounter':
continue continue
interaction_type = is_e.get('interactionType') interaction_type = extract_interaction_type(is_e)
if not isinstance(interaction_type, compat_str): if not interaction_type:
continue continue
interaction_count = int_or_none(is_e.get('userInteractionCount')) # For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",")
# so extracting count with more relaxed str_to_int
interaction_count = str_to_int(is_e.get('userInteractionCount'))
if interaction_count is None: if interaction_count is None:
continue continue
count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1]) count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
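The JSON-LD changes above accept three variations: a single `interactionStatistic` object instead of a list, an `interactionType` given as an object with `@type`, and a count given as a string such as `"1,234"`. The sketch below illustrates that normalization under simplifying assumptions; `relaxed_int` and `interaction_counts` are hypothetical stand-ins, not the `str_to_int` or extractor code from youtube-dl.

```python
import re


def relaxed_int(value):
    # Hypothetical stand-in for a relaxed integer parser: separators such
    # as ',' are removed before conversion.
    if isinstance(value, int):
        return value
    if not isinstance(value, str):
        return None
    digits = re.sub(r'[,.+\s]', '', value)
    return int(digits) if digits.isdigit() else None


def interaction_counts(e):
    stats = e.get('interactionStatistic')
    if isinstance(stats, dict):
        stats = [stats]  # a lone object is treated as a one-element list
    if not isinstance(stats, list):
        return {}
    counts = {}
    for item in stats:
        if not isinstance(item, dict) or item.get('@type') != 'InteractionCounter':
            continue
        interaction_type = item.get('interactionType')
        if isinstance(interaction_type, dict):
            interaction_type = interaction_type.get('@type')
        if not isinstance(interaction_type, str):
            continue
        count = relaxed_int(item.get('userInteractionCount'))
        if count is None:
            continue
        # Keep only the last path component, e.g. 'WatchAction'.
        counts[interaction_type.split('/')[-1]] = count
    return counts
```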
@@ -1264,6 +1280,7 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')), 'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')), 'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')), 'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')), 'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')), 'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')), 'width': int_or_none(e.get('width')),
@@ -1447,9 +1464,10 @@ class InfoExtractor(object):
try: try:
self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers) self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True return True
except ExtractorError: except ExtractorError as e:
self.to_screen( self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item)) '%s: %s URL is invalid, skipping: %s'
% (video_id, item, error_to_compat_str(e.cause)))
return False return False
def http_scheme(self): def http_scheme(self):
@@ -1654,7 +1672,7 @@ class InfoExtractor(object):
# just the media without qualities renditions. # just the media without qualities renditions.
# Fortunately, master playlist can be easily distinguished from media # Fortunately, master playlist can be easily distinguished from media
# playlist based on particular tags availability. As of [1, 4.3.3, 4.3.4] # playlist based on particular tags availability. As of [1, 4.3.3, 4.3.4]
# master playlist tags MUST NOT appear in a media playist and vice versa. # master playlist tags MUST NOT appear in a media playlist and vice versa.
# As of [1, 4.3.3.1] #EXT-X-TARGETDURATION tag is REQUIRED for every # As of [1, 4.3.3.1] #EXT-X-TARGETDURATION tag is REQUIRED for every
# media playlist and MUST NOT appear in master playlist thus we can # media playlist and MUST NOT appear in master playlist thus we can
# clearly detect media playlist with this criterion. # clearly detect media playlist with this criterion.
@@ -2046,7 +2064,7 @@ class InfoExtractor(object):
}) })
return entries return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}, data=None, headers={}, query={}): def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
res = self._download_xml_handle( res = self._download_xml_handle(
mpd_url, video_id, mpd_url, video_id,
note=note or 'Downloading MPD manifest', note=note or 'Downloading MPD manifest',
@@ -2060,10 +2078,9 @@ class InfoExtractor(object):
mpd_base_url = base_url(urlh.geturl()) mpd_base_url = base_url(urlh.geturl())
return self._parse_mpd_formats( return self._parse_mpd_formats(
mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url, mpd_doc, mpd_id, mpd_base_url, mpd_url)
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None): def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', mpd_url=None):
""" """
Parse formats from MPD manifest. Parse formats from MPD manifest.
References: References:
@@ -2341,15 +2358,7 @@ class InfoExtractor(object):
else: else:
# Assuming direct URL to unfragmented media. # Assuming direct URL to unfragmented media.
f['url'] = base_url f['url'] = base_url
formats.append(f)
# According to [1, 5.3.5.2, Table 7, page 35] @id of Representation
# is not necessarily unique within a Period thus formats with
# the same `format_id` are quite possible. There are numerous examples
# of such manifests (see https://github.com/ytdl-org/youtube-dl/issues/15111,
# https://github.com/ytdl-org/youtube-dl/issues/13919)
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
else: else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type) self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats return formats
@@ -2503,16 +2512,18 @@ class InfoExtractor(object):
# amp-video and amp-audio are very similar to their HTML5 counterparts # amp-video and amp-audio are very similar to their HTML5 counterparts
# so we wll include them right here (see # so we wll include them right here (see
# https://www.ampproject.org/docs/reference/components/amp-video) # https://www.ampproject.org/docs/reference/components/amp-video)
media_tags = [(media_tag, media_type, '') # For dl8-* tags see https://delight-vr.com/documentation/dl8-video/
for media_tag, media_type _MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video|audio)'
in re.findall(r'(?s)(<(?:amp-)?(video|audio)[^>]*/>)', webpage)] media_tags = [(media_tag, media_tag_name, media_type, '')
for media_tag, media_tag_name, media_type
in re.findall(r'(?s)(<(%s)[^>]*/>)' % _MEDIA_TAG_NAME_RE, webpage)]
media_tags.extend(re.findall( media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'. # We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see # Allowing more characters may end up in significant slow down (see
# https://github.com/ytdl-org/youtube-dl/issues/11979, example URL: # https://github.com/ytdl-org/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml). # http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage)) r'(?s)(<(?P<tag>%s)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>' % _MEDIA_TAG_NAME_RE, webpage))
for media_tag, media_type, media_content in media_tags: for media_tag, _, media_type, media_content in media_tags:
media_info = { media_info = {
'formats': [], 'formats': [],
'subtitles': {}, 'subtitles': {},
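With the shared tag-name pattern, each match now carries one extra capture (the full tag name), which is why the loop above unpacks four values. A short sketch of how the paired-tag expression behaves on a made-up page fragment; the uppercase constant names are illustrative.

```python
import re

MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video|audio)'
PAIRED_TAG_RE = (
    r'(?s)(<(?P<tag>%s)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>' % MEDIA_TAG_NAME_RE)

# Made-up markup with one dl8-* tag and one amp-* tag.
webpage = (
    '<dl8-video title="demo"><source src="a.mp4"></dl8-video>'
    '<amp-audio src="b.mp3"></amp-audio>')

# Each tuple is (opening tag, tag name, media type, inner content).
matches = re.findall(PAIRED_TAG_RE, webpage)
for media_tag, tag_name, media_type, media_content in matches:
    print(tag_name, media_type)
```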
@@ -2585,7 +2596,15 @@ class InfoExtractor(object):
return entries return entries
def _extract_akamai_formats(self, manifest_url, video_id, hosts={}): def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
signed = 'hdnea=' in manifest_url
if not signed:
# https://learn.akamai.com/en-us/webhelp/media-services-on-demand/stream-packaging-user-guide/GUID-BE6C0F73-1E06-483B-B0EA-57984B91B7F9.html
manifest_url = re.sub(
r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?',
'', manifest_url).strip('?')
formats = [] formats = []
hdcore_sign = 'hdcore=3.7.0' hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://[^/]+)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m') f4m_url = re.sub(r'(https?://[^/]+)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
hds_host = hosts.get('hds') hds_host = hosts.get('hds')
@@ -2598,13 +2617,38 @@ class InfoExtractor(object):
for entry in f4m_formats: for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign}) entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats) formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://[^/]+)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8') m3u8_url = re.sub(r'(https?://[^/]+)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
hls_host = hosts.get('hls') hls_host = hosts.get('hls')
if hls_host: if hls_host:
m3u8_url = re.sub(r'(https?://)[^/]+', r'\1' + hls_host, m3u8_url) m3u8_url = re.sub(r'(https?://)[^/]+', r'\1' + hls_host, m3u8_url)
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
http_host = hosts.get('http')
if http_host and m3u8_formats and not signed:
REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'
qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',')
qualities_length = len(qualities)
if len(m3u8_formats) in (qualities_length, qualities_length + 1):
i = 0
for f in m3u8_formats:
if f['vcodec'] != 'none':
for protocol in ('http', 'https'):
http_f = f.copy()
del http_f['manifest_url']
http_url = re.sub(
REPL_REGEX, protocol + r'://%s/\g<1>%s\3' % (http_host, qualities[i]), f['url'])
http_f.update({
'format_id': http_f['format_id'].replace('hls-', protocol + '-'),
'url': http_url,
'protocol': protocol,
})
formats.append(http_f)
i += 1
return formats return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]): def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
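For unsigned manifests, the Akamai hunk above strips bitrate-filter parameters from the URL before deriving the HDS and HLS variants. The sketch below isolates that clean-up step; `strip_akamai_filters` is a hypothetical helper name and the URLs in the usage lines are invented.

```python
import re


def strip_akamai_filters(manifest_url):
    # Signed URLs (carrying an hdnea token) are left untouched, as in the hunk.
    if 'hdnea=' in manifest_url:
        return manifest_url
    # Remove b=..., __a__=off / attributes=off and __b__=... parameters,
    # then drop a dangling '?' if no query string is left.
    return re.sub(
        r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?',
        '', manifest_url).strip('?')


unsigned = 'https://example-vh.akamaihd.net/i/clip/,300,600,.mp4.csmil/master.m3u8?b=300-600&__b__=500'
signed = 'https://example-vh.akamaihd.net/i/clip/master.m3u8?hdnea=token&b=300'
print(strip_akamai_filters(unsigned))
print(strip_akamai_filters(signed))
```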


@@ -16,6 +16,8 @@ from ..utils import (
mimetype2ext, mimetype2ext,
orderedSet, orderedSet,
parse_iso8601, parse_iso8601,
strip_or_none,
try_get,
) )
@@ -82,6 +84,7 @@ class CondeNastIE(InfoExtractor):
'uploader': 'gq', 'uploader': 'gq',
'upload_date': '20170321', 'upload_date': '20170321',
'timestamp': 1490126427, 'timestamp': 1490126427,
'description': 'How much grimmer would things be if these people were competent?',
}, },
}, { }, {
# JS embed # JS embed
@@ -93,7 +96,7 @@ class CondeNastIE(InfoExtractor):
'title': '3D printed TSA Travel Sentry keys really do open TSA locks', 'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
'uploader': 'arstechnica', 'uploader': 'arstechnica',
'upload_date': '20150916', 'upload_date': '20150916',
'timestamp': 1442434955, 'timestamp': 1442434920,
} }
}, { }, {
'url': 'https://player.cnevids.com/inline/video/59138decb57ac36b83000005.js?target=js-cne-player', 'url': 'https://player.cnevids.com/inline/video/59138decb57ac36b83000005.js?target=js-cne-player',
@@ -196,6 +199,13 @@ class CondeNastIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
for t, caption in video_info.get('captions', {}).items():
caption_url = caption.get('src')
if not (t in ('vtt', 'srt', 'tml') and caption_url):
continue
subtitles.setdefault('en', []).append({'url': caption_url})
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
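The caption loop added in this hunk keeps only entries whose key is one of `vtt`, `srt` or `tml` and that have a `src`, and files them all under `en`. A standalone sketch of that filter, with an invented `video_info` payload; `build_subtitles` is a hypothetical wrapper around the same loop.

```python
def build_subtitles(video_info):
    subtitles = {}
    for kind, caption in video_info.get('captions', {}).items():
        caption_url = caption.get('src')
        # Skip unknown caption kinds and entries without a URL.
        if not (kind in ('vtt', 'srt', 'tml') and caption_url):
            continue
        subtitles.setdefault('en', []).append({'url': caption_url})
    return subtitles


# Invented payload: one usable track, one unknown kind, one without 'src'.
video_info = {
    'captions': {
        'vtt': {'src': 'https://example.com/a.vtt'},
        'json': {'src': 'https://example.com/a.json'},
        'srt': {},
    },
}
print(build_subtitles(video_info))
```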
@@ -208,6 +218,7 @@ class CondeNastIE(InfoExtractor):
'season': video_info.get('season_title'), 'season': video_info.get('season_title'),
'timestamp': parse_iso8601(video_info.get('premiere_date')), 'timestamp': parse_iso8601(video_info.get('premiere_date')),
'categories': video_info.get('categories'), 'categories': video_info.get('categories'),
'subtitles': subtitles,
} }
def _real_extract(self, url): def _real_extract(self, url):
@@ -225,8 +236,16 @@ class CondeNastIE(InfoExtractor):
if url_type == 'series': if url_type == 'series':
return self._extract_series(url, webpage) return self._extract_series(url, webpage)
else: else:
params = self._extract_video_params(webpage, display_id) video = try_get(self._parse_json(self._search_regex(
info = self._search_json_ld( r'__PRELOADED_STATE__\s*=\s*({.+?});', webpage,
webpage, display_id, fatal=False) 'preload state', '{}'), display_id),
lambda x: x['transformed']['video'])
if video:
params = {'videoId': video['id']}
info = {'description': strip_or_none(video.get('description'))}
else:
params = self._extract_video_params(webpage, display_id)
info = self._search_json_ld(
webpage, display_id, fatal=False)
info.update(self._extract_video(params)) info.update(self._extract_video(params))
return info return info


@@ -8,9 +8,14 @@ from ..utils import (
ExtractorError, ExtractorError,
extract_attributes, extract_attributes,
find_xpath_attr, find_xpath_attr,
get_element_by_attribute,
get_element_by_class, get_element_by_class,
int_or_none, int_or_none,
js_to_json,
merge_dicts,
parse_iso8601,
smuggle_url, smuggle_url,
str_to_int,
unescapeHTML, unescapeHTML,
) )
from .senateisvp import SenateISVPIE from .senateisvp import SenateISVPIE
@@ -98,6 +103,48 @@ class CSpanIE(InfoExtractor):
bc_attr['data-bcid']) bc_attr['data-bcid'])
return self.url_result(smuggle_url(bc_url, {'source_url': url})) return self.url_result(smuggle_url(bc_url, {'source_url': url}))
def add_referer(formats):
for f in formats:
f.setdefault('http_headers', {})['Referer'] = url
# As of 01.12.2020 this path looks to cover all cases making the rest
# of the code unnecessary
jwsetup = self._parse_json(
self._search_regex(
r'(?s)jwsetup\s*=\s*({.+?})\s*;', webpage, 'jwsetup',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if jwsetup:
info = self._parse_jwplayer_data(
jwsetup, video_id, require_title=False, m3u8_id='hls',
base_url=url)
add_referer(info['formats'])
for subtitles in info['subtitles'].values():
for subtitle in subtitles:
ext = determine_ext(subtitle['url'])
if ext == 'php':
ext = 'vtt'
subtitle['ext'] = ext
ld_info = self._search_json_ld(webpage, video_id, default={})
title = get_element_by_class('video-page-title', webpage) or \
self._og_search_title(webpage)
description = get_element_by_attribute('itemprop', 'description', webpage) or \
self._html_search_meta(['og:description', 'description'], webpage)
return merge_dicts(info, ld_info, {
'title': title,
'thumbnail': get_element_by_attribute('itemprop', 'thumbnailUrl', webpage),
'description': description,
'timestamp': parse_iso8601(get_element_by_attribute('itemprop', 'uploadDate', webpage)),
'location': get_element_by_attribute('itemprop', 'contentLocation', webpage),
'duration': int_or_none(self._search_regex(
r'jwsetup\.seclength\s*=\s*(\d+);',
webpage, 'duration', fatal=False)),
'view_count': str_to_int(self._search_regex(
r"<span[^>]+class='views'[^>]*>([\d,]+)\s+Views</span>",
webpage, 'views', fatal=False)),
})
# Obsolete
# We first look for clipid, because clipprog always appears before # We first look for clipid, because clipprog always appears before
patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')] patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
results = list(filter(None, (re.search(p, webpage) for p in patterns))) results = list(filter(None, (re.search(p, webpage) for p in patterns)))
@@ -165,6 +212,7 @@ class CSpanIE(InfoExtractor):
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
path, video_id, 'mp4', entry_protocol='m3u8_native', path, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }] m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }]
add_referer(formats)
self._sort_formats(formats) self._sort_formats(formats)
entries.append({ entries.append({
'id': '%s_%d' % (video_id, partnum + 1), 'id': '%s_%d' % (video_id, partnum + 1),


@@ -0,0 +1,52 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/(?P<id>(?:show|movie)s/[^/]+/[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ctv.ca/shows/your-morning/wednesday-december-23-2020-s5e88',
'info_dict': {
'id': '2102249',
'ext': 'flv',
'title': 'Wednesday, December 23, 2020',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'Your Morning delivers original perspectives and unique insights into the headlines of the day.',
'timestamp': 1608732000,
'upload_date': '20201223',
'series': 'Your Morning',
'season': '2020-2021',
'season_number': 5,
'episode_number': 88,
'tags': ['Your Morning'],
'categories': ['Talk Show'],
'duration': 7467.126,
},
}, {
'url': 'https://www.ctv.ca/movies/adam-sandlers-eight-crazy-nights/adam-sandlers-eight-crazy-nights',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
content = self._download_json(
'https://www.ctv.ca/space-graphql/graphql', display_id, query={
'query': '''{
resolvedPath(path: "/%s") {
lastSegment {
content {
... on AxisContent {
axisId
videoPlayerDestCode
}
}
}
}
}''' % display_id,
})['data']['resolvedPath']['lastSegment']['content']
video_id = content['axisId']
return self.url_result(
'9c9media:%s:%s' % (content['videoPlayerDestCode'], video_id),
'NineCNineMedia', video_id)


@@ -7,7 +7,7 @@ from .dplay import DPlayIE
class DiscoveryNetworksDeIE(DPlayIE): class DiscoveryNetworksDeIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show)/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show|sendungen)/(?P<programme>[^/]+)/(?:video/)?(?P<alternate_id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.tlc.de/programme/breaking-amish/video/die-welt-da-drauen/DCB331270001100', 'url': 'https://www.tlc.de/programme/breaking-amish/video/die-welt-da-drauen/DCB331270001100',
@@ -29,6 +29,9 @@ class DiscoveryNetworksDeIE(DPlayIE):
}, { }, {
'url': 'https://www.dplay.co.uk/show/ghost-adventures/video/hotel-leger-103620/EHD_280313B', 'url': 'https://www.dplay.co.uk/show/ghost-adventures/video/hotel-leger-103620/EHD_280313B',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://tlc.de/sendungen/breaking-amish/die-welt-da-drauen/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@@ -17,7 +17,12 @@ from ..utils import (
class DPlayIE(InfoExtractor): class DPlayIE(InfoExtractor):
_VALID_URL = r'''(?x)https?:// _VALID_URL = r'''(?x)https?://
(?P<domain> (?P<domain>
(?:www\.)?(?P<host>dplay\.(?P<country>dk|fi|jp|se|no))| (?:www\.)?(?P<host>d
(?:
play\.(?P<country>dk|fi|jp|se|no)|
iscoveryplus\.(?P<plus_country>dk|es|fi|it|se|no)
)
)|
(?P<subdomain_country>es|it)\.dplay\.com (?P<subdomain_country>es|it)\.dplay\.com
)/[^/]+/(?P<id>[^/]+/[^/?#]+)''' )/[^/]+/(?P<id>[^/]+/[^/?#]+)'''
@@ -126,6 +131,24 @@ class DPlayIE(InfoExtractor):
}, { }, {
'url': 'https://www.dplay.jp/video/gold-rush/24086', 'url': 'https://www.dplay.jp/video/gold-rush/24086',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.discoveryplus.se/videos/nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.dk/videoer/ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.no/videoer/i-kongens-klr/sesong-1-episode-7',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.it/videos/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.es/videos/la-fiebre-del-oro/temporada-8-episodio-1',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.fi/videot/shifting-gears-with-aaron-kaufman/episode-16',
'only_matching': True,
}] }]
def _get_disco_api_info(self, url, display_id, disco_host, realm, country): def _get_disco_api_info(self, url, display_id, disco_host, realm, country):
@@ -241,7 +264,7 @@ class DPlayIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') display_id = mobj.group('id')
domain = mobj.group('domain').lstrip('www.') domain = mobj.group('domain').lstrip('www.')
country = mobj.group('country') or mobj.group('subdomain_country') country = mobj.group('country') or mobj.group('subdomain_country') or mobj.group('plus_country')
host = 'disco-api.' + domain if domain.startswith('dplay.') else 'eu2-prod.disco-api.com' host = 'disco-api.' + domain if domain[0] == 'd' else 'eu2-prod.disco-api.com'
return self._get_disco_api_info( return self._get_disco_api_info(
url, display_id, host, 'dplay' + country, country) url, display_id, host, 'dplay' + country, country)
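To see how the extended pattern and the new host rule interact, the sketch below runs the `_VALID_URL` from the earlier hunk against one of the added discoveryplus test URLs and derives `country` and `host` the same way. It is only an illustration of the matching step; no API request is made and the module-level names are not part of the extractor.

```python
import re

VALID_URL = r'''(?x)https?://
    (?P<domain>
        (?:www\.)?(?P<host>d
            (?:
                play\.(?P<country>dk|fi|jp|se|no)|
                iscoveryplus\.(?P<plus_country>dk|es|fi|it|se|no)
            )
        )|
        (?P<subdomain_country>es|it)\.dplay\.com
    )/[^/]+/(?P<id>[^/]+/[^/?#]+)'''

url = ('https://www.discoveryplus.se/videos/'
       'nugammalt-77-handelser-som-format-sverige/'
       'nugammalt-77-handelser-som-format-sverige-101')

mobj = re.match(VALID_URL, url)
display_id = mobj.group('id')
domain = mobj.group('domain').lstrip('www.')
country = (mobj.group('country') or mobj.group('subdomain_country')
           or mobj.group('plus_country'))
# dplay.* and discoveryplus.* both start with 'd'; the es./it. subdomains do not.
host = 'disco-api.' + domain if domain[0] == 'd' else 'eu2-prod.disco-api.com'
print(host, country, display_id)
```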


@@ -29,7 +29,7 @@ class DRTVIE(InfoExtractor):
https?:// https?://
(?: (?:
(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*| (?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*|
(?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode)/ (?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode|program)/
) )
(?P<id>[\da-z_-]+) (?P<id>[\da-z_-]+)
''' '''
@@ -111,6 +111,9 @@ class DRTVIE(InfoExtractor):
}, { }, {
'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769', 'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.dr.dk/drtv/program/jagten_220924',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@@ -12,7 +12,14 @@ from ..utils import (
) )
class EggheadCourseIE(InfoExtractor): class EggheadBaseIE(InfoExtractor):
def _call_api(self, path, video_id, resource, fatal=True):
return self._download_json(
'https://app.egghead.io/api/v1/' + path,
video_id, 'Downloading %s JSON' % resource, fatal=fatal)
class EggheadCourseIE(EggheadBaseIE):
IE_DESC = 'egghead.io course' IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course' IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)' _VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
@@ -28,10 +35,9 @@ class EggheadCourseIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
series_path = 'series/' + playlist_id
lessons = self._download_json( lessons = self._call_api(
'https://egghead.io/api/v1/series/%s/lessons' % playlist_id, series_path + '/lessons', playlist_id, 'course lessons')
playlist_id, 'Downloading course lessons JSON')
entries = [] entries = []
for lesson in lessons: for lesson in lessons:
@@ -44,9 +50,8 @@ class EggheadCourseIE(InfoExtractor):
entries.append(self.url_result( entries.append(self.url_result(
lesson_url, ie=EggheadLessonIE.ie_key(), video_id=lesson_id)) lesson_url, ie=EggheadLessonIE.ie_key(), video_id=lesson_id))
course = self._download_json( course = self._call_api(
'https://egghead.io/api/v1/series/%s' % playlist_id, series_path, playlist_id, 'course', False) or {}
playlist_id, 'Downloading course JSON', fatal=False) or {}
playlist_id = course.get('id') playlist_id = course.get('id')
if playlist_id: if playlist_id:
@@ -57,7 +62,7 @@ class EggheadCourseIE(InfoExtractor):
course.get('description')) course.get('description'))
class EggheadLessonIE(InfoExtractor): class EggheadLessonIE(EggheadBaseIE):
IE_DESC = 'egghead.io lesson' IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson' IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)' _VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
@@ -74,7 +79,7 @@ class EggheadLessonIE(InfoExtractor):
'upload_date': '20161209', 'upload_date': '20161209',
'duration': 304, 'duration': 304,
'view_count': 0, 'view_count': 0,
'tags': ['javascript', 'free'], 'tags': 'count:2',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@@ -88,8 +93,8 @@ class EggheadLessonIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
lesson = self._download_json( lesson = self._call_api(
'https://egghead.io/api/v1/lessons/%s' % display_id, display_id) 'lessons/' + display_id, display_id, 'lesson')
lesson_id = compat_str(lesson['id']) lesson_id = compat_str(lesson['id'])
title = lesson['title'] title = lesson['title']
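
Note on the `EggheadBaseIE._call_api` refactoring above: a standalone sketch of the URL and progress note the helper would pass to `_download_json` for each of the three call sites. `API_BASE` and `build_api_request` are hypothetical names for illustration; the real helper performs the download itself, which is omitted here.

```python
# Base URL taken from the new _call_api body in the diff.
API_BASE = 'https://app.egghead.io/api/v1/'


def build_api_request(path, resource):
    """Mirror the string handling of _call_api: return (url, note)."""
    return API_BASE + path, 'Downloading %s JSON' % resource


playlist_id = 'some-course'  # hypothetical course slug
series_path = 'series/' + playlist_id

# The three call sites touched by the diff.
print(build_api_request(series_path + '/lessons', 'course lessons'))
print(build_api_request(series_path, 'course'))
print(build_api_request('lessons/' + 'some-lesson', 'lesson'))
```

Centralising the base URL means a later API domain change touches one line instead of three.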


@@ -16,7 +16,7 @@ from ..utils import (
class EpornerIE(InfoExtractor): class EpornerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eporner\.com/(?:hd-porn|embed)/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?' _VALID_URL = r'https?://(?:www\.)?eporner\.com/(?:(?:hd-porn|embed)/|video-)(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/', 'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
'md5': '39d486f046212d8e1b911c52ab4691f8', 'md5': '39d486f046212d8e1b911c52ab4691f8',
@@ -43,7 +43,10 @@ class EpornerIE(InfoExtractor):
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0', 'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0', 'url': 'http://www.eporner.com/embed/3YRUtzMcWn0',
'only_matching': True,
}, {
'url': 'https://www.eporner.com/video-FJsA19J3Y3H/one-of-the-greats/',
'only_matching': True, 'only_matching': True,
}] }]
@@ -57,7 +60,7 @@ class EpornerIE(InfoExtractor):
video_id = self._match_id(urlh.geturl()) video_id = self._match_id(urlh.geturl())
hash = self._search_regex( hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash') r'hash\s*[:=]\s*["\']([\da-f]{32})', webpage, 'hash')
title = self._og_search_title(webpage, default=None) or self._html_search_regex( title = self._og_search_title(webpage, default=None) or self._html_search_regex(
r'<title>(.+?) - EPORNER', webpage, 'title') r'<title>(.+?) - EPORNER', webpage, 'title')
@@ -115,8 +118,8 @@ class EpornerIE(InfoExtractor):
duration = parse_duration(self._html_search_meta( duration = parse_duration(self._html_search_meta(
'duration', webpage, default=None)) 'duration', webpage, default=None))
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'id="cinemaviews">\s*([0-9,]+)\s*<small>views', r'id=["\']cinemaviews1["\'][^>]*>\s*([0-9,]+)',
webpage, 'view count', fatal=False)) webpage, 'view count', default=None))
return merge_dicts(json_ld, { return merge_dicts(json_ld, {
'id': video_id, 'id': video_id,
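
Note on the regex changes above: a minimal sketch exercising the new `_VALID_URL` and the relaxed `hash` pattern outside the extractor. Both patterns are copied from the right-hand side of the diff; the helper names `match_ids` and `find_hash` and the sample page snippets are hypothetical.

```python
import re

# New _VALID_URL: accepts /hd-porn/<id>, /embed/<id> and /video-<id>.
VALID_URL = r'https?://(?:www\.)?eporner\.com/(?:(?:hd-porn|embed)/|video-)(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'

# New hash pattern: the separator may be ':' or '='.
HASH_RE = r'hash\s*[:=]\s*["\']([\da-f]{32})'


def match_ids(url):
    """Return (id, display_id) for a matching URL, else None."""
    mobj = re.match(VALID_URL, url)
    return (mobj.group('id'), mobj.group('display_id')) if mobj else None


def find_hash(webpage):
    """Return the 32-character hex hash found in the page text, else None."""
    mobj = re.search(HASH_RE, webpage)
    return mobj.group(1) if mobj else None


print(match_ids('https://www.eporner.com/video-FJsA19J3Y3H/one-of-the-greats/'))
print(match_ids('http://www.eporner.com/embed/3YRUtzMcWn0'))
print(find_hash("hash = '0123456789abcdef0123456789abcdef'"))  # made-up snippet
```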


@@ -60,7 +60,7 @@ class EuropaIE(InfoExtractor):
title = get_item('title', preferred_langs) or video_id title = get_item('title', preferred_langs) or video_id
description = get_item('description', preferred_langs) description = get_item('description', preferred_langs)
thumbnmail = xpath_text(playlist, './info/thumburl', 'thumbnail') thumbnail = xpath_text(playlist, './info/thumburl', 'thumbnail')
upload_date = unified_strdate(xpath_text(playlist, './info/date', 'upload date')) upload_date = unified_strdate(xpath_text(playlist, './info/date', 'upload date'))
duration = parse_duration(xpath_text(playlist, './info/duration', 'duration')) duration = parse_duration(xpath_text(playlist, './info/duration', 'duration'))
view_count = int_or_none(xpath_text(playlist, './info/views', 'views')) view_count = int_or_none(xpath_text(playlist, './info/views', 'views'))
@@ -85,7 +85,7 @@ class EuropaIE(InfoExtractor):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnmail, 'thumbnail': thumbnail,
'upload_date': upload_date, 'upload_date': upload_date,
'duration': duration, 'duration': duration,
'view_count': view_count, 'view_count': view_count,


@@ -1,77 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
)
class EveryonesMixtapeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?everyonesmixtape\.com/#/mix/(?P<id>[0-9a-zA-Z]+)(?:/(?P<songnr>[0-9]))?$'
_TESTS = [{
'url': 'http://everyonesmixtape.com/#/mix/m7m0jJAbMQi/5',
'info_dict': {
'id': '5bfseWNmlds',
'ext': 'mp4',
'title': "Passion Pit - \"Sleepyhead\" (Official Music Video)",
'uploader': 'FKR.TV',
'uploader_id': 'frenchkissrecords',
'description': "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
'upload_date': '20081015'
},
'params': {
'skip_download': True, # This is simply YouTube
}
}, {
'url': 'http://everyonesmixtape.com/#/mix/m7m0jJAbMQi',
'info_dict': {
'id': 'm7m0jJAbMQi',
'title': 'Driving',
},
'playlist_count': 24
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
pllist_url = 'http://everyonesmixtape.com/mixtape.php?a=getMixes&u=-1&linked=%s&explore=' % playlist_id
pllist_req = sanitized_Request(pllist_url)
pllist_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist_list = self._download_json(
pllist_req, playlist_id, note='Downloading playlist metadata')
try:
playlist_no = next(playlist['id']
for playlist in playlist_list
if playlist['code'] == playlist_id)
except StopIteration:
raise ExtractorError('Playlist id not found')
pl_url = 'http://everyonesmixtape.com/mixtape.php?a=getMix&id=%s&userId=null&code=' % playlist_no
pl_req = sanitized_Request(pl_url)
pl_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist = self._download_json(
pl_req, playlist_id, note='Downloading playlist info')
entries = [{
'_type': 'url',
'url': t['url'],
'title': t['title'],
} for t in playlist['tracks']]
if mobj.group('songnr'):
songnr = int(mobj.group('songnr')) - 1
return entries[songnr]
playlist_title = playlist['mixData']['name']
return {
'_type': 'playlist',
'id': playlist_id,
'title': playlist_title,
'entries': entries,
}


@@ -15,7 +15,7 @@ from ..utils import (
class ExpressenIE(InfoExtractor): class ExpressenIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:www\.)?expressen\.se/ (?:www\.)?(?:expressen|di)\.se/
(?:(?:tvspelare/video|videoplayer/embed)/)? (?:(?:tvspelare/video|videoplayer/embed)/)?
tv/(?:[^/]+/)* tv/(?:[^/]+/)*
(?P<id>[^/?#&]+) (?P<id>[^/?#&]+)
@@ -42,13 +42,16 @@ class ExpressenIE(InfoExtractor):
}, { }, {
'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di', 'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.di.se/videoplayer/embed/tv/ditv/borsmorgon/implantica-rusar-70--under-borspremiaren-hor-styrelsemedlemmen/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return [ return [
mobj.group('url') for mobj in re.finditer( mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?expressen\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1', r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?(?:expressen|di)\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
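
Note on the `_extract_urls` change above: a minimal sketch of the widened iframe pattern run against a made-up HTML fragment. The regex is copied from the right-hand side of the diff; the standalone function name `extract_urls` and the sample markup are hypothetical.

```python
import re

# Embed pattern after the change: the host may be expressen.se or di.se.
EMBED_RE = r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?(?:expressen|di)\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1'


def extract_urls(webpage):
    """Collect every embedded player URL found in iframe src attributes."""
    return [mobj.group('url') for mobj in re.finditer(EMBED_RE, webpage)]


# Made-up page fragment with one di.se and one expressen.se embed.
page = (
    '<iframe src="https://www.di.se/videoplayer/embed/tv/ditv/clip-a/?embed=true"></iframe>'
    "<iframe width='640' src='//www.expressen.se/tvspelare/video/tv/nyheter/clip-b/'></iframe>"
)
print(extract_urls(page))
```

The backreference `\1` makes the lazy `.+?` stop at the same quote character that opened the attribute, so single- and double-quoted `src` values are both handled.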


@@ -30,14 +30,22 @@ from .adobetv import (
from .adultswim import AdultSwimIE from .adultswim import AdultSwimIE
from .aenetworks import ( from .aenetworks import (
AENetworksIE, AENetworksIE,
AENetworksCollectionIE,
AENetworksShowIE,
HistoryTopicIE, HistoryTopicIE,
HistoryPlayerIE,
BiographyIE,
) )
from .afreecatv import AfreecaTVIE from .afreecatv import AfreecaTVIE
from .airmozilla import AirMozillaIE from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE from .alphaporno import AlphaPornoIE
from .amara import AmaraIE
from .amcnetworks import AMCNetworksIE from .amcnetworks import AMCNetworksIE
from .americastestkitchen import AmericasTestKitchenIE from .americastestkitchen import (
AmericasTestKitchenIE,
AmericasTestKitchenSeasonIE,
)
from .animeondemand import AnimeOnDemandIE from .animeondemand import AnimeOnDemandIE
from .anvato import AnvatoIE from .anvato import AnvatoIE
from .aol import AolIE from .aol import AolIE
@@ -50,7 +58,9 @@ from .appletrailers import (
AppleTrailersIE, AppleTrailersIE,
AppleTrailersSectionIE, AppleTrailersSectionIE,
) )
from .applepodcasts import ApplePodcastsIE
from .archiveorg import ArchiveOrgIE from .archiveorg import ArchiveOrgIE
from .arcpublishing import ArcPublishingIE
from .arkena import ArkenaIE from .arkena import ArkenaIE
from .ard import ( from .ard import (
ARDBetaMediathekIE, ARDBetaMediathekIE,
@@ -58,7 +68,7 @@ from .ard import (
ARDMediathekIE, ARDMediathekIE,
) )
from .arte import ( from .arte import (
ArteTVPlus7IE, ArteTVIE,
ArteTVEmbedIE, ArteTVEmbedIE,
ArteTVPlaylistIE, ArteTVPlaylistIE,
) )
@@ -88,16 +98,18 @@ from .bbc import (
BBCCoUkPlaylistIE, BBCCoUkPlaylistIE,
BBCIE, BBCIE,
) )
from .beampro import (
BeamProLiveIE,
BeamProVodIE,
)
from .beeg import BeegIE from .beeg import BeegIE
from .behindkink import BehindKinkIE from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE from .bellmedia import BellMediaIE
from .beatport import BeatportIE from .beatport import BeatportIE
from .bet import BetIE from .bet import BetIE
from .bfi import BFIPlayerIE from .bfi import BFIPlayerIE
from .bfmtv import (
BFMTVIE,
BFMTVLiveIE,
BFMTVArticleIE,
)
from .bibeltv import BibelTVIE
from .bigflix import BigflixIE from .bigflix import BigflixIE
from .bild import BildIE from .bild import BildIE
from .bilibili import ( from .bilibili import (
@@ -120,7 +132,9 @@ from .bleacherreport import (
from .blinkx import BlinkxIE from .blinkx import BlinkxIE
from .bloomberg import BloombergIE from .bloomberg import BloombergIE
from .bokecc import BokeCCIE from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE
from .bostonglobe import BostonGlobeIE from .bostonglobe import BostonGlobeIE
from .box import BoxIE
from .bpb import BpbIE from .bpb import BpbIE
from .br import ( from .br import (
BRIE, BRIE,
@@ -163,7 +177,10 @@ from .cbc import (
CBCOlympicsIE, CBCOlympicsIE,
) )
from .cbs import CBSIE from .cbs import CBSIE
from .cbslocal import CBSLocalIE from .cbslocal import (
CBSLocalIE,
CBSLocalArticleIE,
)
from .cbsinteractive import CBSInteractiveIE from .cbsinteractive import CBSInteractiveIE
from .cbsnews import ( from .cbsnews import (
CBSNewsEmbedIE, CBSNewsEmbedIE,
@@ -218,11 +235,8 @@ from .cnn import (
) )
from .coub import CoubIE from .coub import CoubIE
from .comedycentral import ( from .comedycentral import (
ComedyCentralFullEpisodesIE,
ComedyCentralIE, ComedyCentralIE,
ComedyCentralShortnameIE,
ComedyCentralTVIE, ComedyCentralTVIE,
ToshIE,
) )
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import ( from .commonprotocols import (
@@ -241,6 +255,7 @@ from .crunchyroll import (
) )
from .cspan import CSpanIE from .cspan import CSpanIE
from .ctsnews import CtsNewsIE from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import ( from .curiositystream import (
@@ -327,7 +342,6 @@ from .espn import (
) )
from .esri import EsriVideoIE from .esri import EsriVideoIE
from .europa import EuropaIE from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .expotv import ExpoTVIE from .expotv import ExpoTVIE
from .expressen import ExpressenIE from .expressen import ExpressenIE
from .extremetube import ExtremeTubeIE from .extremetube import ExtremeTubeIE
@@ -391,10 +405,10 @@ from .frontendmasters import (
FrontendMastersLessonIE, FrontendMastersLessonIE,
FrontendMastersCourseIE FrontendMastersCourseIE
) )
from .fujitv import FujiTVFODPlus7IE
from .funimation import FunimationIE from .funimation import FunimationIE
from .funk import FunkIE from .funk import FunkIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE from .gaia import GaiaIE
from .gameinformer import GameInformerIE from .gameinformer import GameInformerIE
from .gamespot import GameSpotIE from .gamespot import GameSpotIE
@@ -415,7 +429,10 @@ from .go import GoIE
from .godtube import GodTubeIE from .godtube import GodTubeIE
from .golem import GolemIE from .golem import GolemIE
from .googledrive import GoogleDriveIE from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE from .googlepodcasts import (
GooglePodcastsIE,
GooglePodcastsFeedIE,
)
from .googlesearch import GoogleSearchIE from .googlesearch import GoogleSearchIE
from .goshgay import GoshgayIE from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE from .gputechconf import GPUTechConfIE
@@ -456,6 +473,10 @@ from .ign import (
OneUPIE, OneUPIE,
PCMagIE, PCMagIE,
) )
from .iheart import (
IHeartRadioIE,
IHeartRadioPodcastIE,
)
from .imdb import ( from .imdb import (
ImdbIE, ImdbIE,
ImdbListIE ImdbListIE
@@ -500,13 +521,15 @@ from .joj import JojIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .kakao import KakaoIE from .kakao import KakaoIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE from .kankan import KankanIE
from .karaoketv import KaraoketvIE from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE from .keezmovies import KeezMoviesIE
from .ketnet import KetnetIE from .ketnet import KetnetIE
from .khanacademy import KhanAcademyIE from .khanacademy import (
KhanAcademyIE,
KhanAcademyUnitIE,
)
from .kickstarter import KickStarterIE from .kickstarter import KickStarterIE
from .kinja import KinjaEmbedIE from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE from .kinopoisk import KinoPoiskIE
@@ -529,6 +552,10 @@ from .laola1tv import (
EHFTVIE, EHFTVIE,
ITTFIE, ITTFIE,
) )
from .lbry import (
LBRYIE,
LBRYChannelIE,
)
from .lci import LCIIE from .lci import LCIIE
from .lcp import ( from .lcp import (
LcpPlayIE, LcpPlayIE,
@@ -603,6 +630,7 @@ from .markiza import (
from .massengeschmacktv import MassengeschmackTVIE from .massengeschmacktv import MassengeschmackTVIE
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
from .medaltv import MedalTVIE
from .mediaset import MediasetIE from .mediaset import MediasetIE
from .mediasite import ( from .mediasite import (
MediasiteIE, MediasiteIE,
@@ -623,6 +651,11 @@ from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE, MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE, MicrosoftVirtualAcademyCourseIE,
) )
from .minds import (
MindsIE,
MindsChannelIE,
MindsGroupIE,
)
from .ministrygrid import MinistryGridIE from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE from .minoto import MinotoIE
from .miomio import MioMioIE from .miomio import MioMioIE
@@ -673,9 +706,15 @@ from .nationalgeographic import (
NationalGeographicTVIE, NationalGeographicTVIE,
) )
from .naver import NaverIE from .naver import NaverIE
from .nba import NBAIE from .nba import (
NBAWatchEmbedIE,
NBAWatchIE,
NBAWatchCollectionIE,
NBAEmbedIE,
NBAIE,
NBAChannelIE,
)
from .nbc import ( from .nbc import (
CSNNEIE,
NBCIE, NBCIE,
NBCNewsIE, NBCNewsIE,
NBCOlympicsIE, NBCOlympicsIE,
@@ -718,8 +757,14 @@ from .nexx import (
NexxIE, NexxIE,
NexxEmbedIE, NexxEmbedIE,
) )
from .nfl import NFLIE from .nfl import (
from .nhk import NhkVodIE NFLIE,
NFLArticleIE,
)
from .nhk import (
NhkVodIE,
NhkVodProgramIE,
)
from .nhl import NHLIE from .nhl import NHLIE
from .nick import ( from .nick import (
NickIE, NickIE,
@@ -735,7 +780,6 @@ from .ninenow import NineNowIE
from .nintendo import NintendoIE from .nintendo import NintendoIE
from .njpwworld import NJPWWorldIE from .njpwworld import NJPWWorldIE
from .nobelprize import NobelPrizeIE from .nobelprize import NobelPrizeIE
from .noco import NocoIE
from .nonktube import NonkTubeIE from .nonktube import NonkTubeIE
from .noovo import NoovoIE from .noovo import NoovoIE
from .normalboots import NormalbootsIE from .normalboots import NormalbootsIE
@@ -768,6 +812,7 @@ from .nrk import (
NRKSkoleIE, NRKSkoleIE,
NRKTVIE, NRKTVIE,
NRKTVDirekteIE, NRKTVDirekteIE,
NRKRadioPodkastIE,
NRKTVEpisodeIE, NRKTVEpisodeIE,
NRKTVEpisodesIE, NRKTVEpisodesIE,
NRKTVSeasonIE, NRKTVSeasonIE,
@@ -780,6 +825,7 @@ from .ntvru import NTVRuIE
from .nytimes import ( from .nytimes import (
NYTimesIE, NYTimesIE,
NYTimesArticleIE, NYTimesArticleIE,
NYTimesCookingIE,
) )
from .nuvid import NuvidIE from .nuvid import NuvidIE
from .nzz import NZZIE from .nzz import NZZIE
@@ -842,6 +888,10 @@ from .picarto import (
) )
from .piksel import PikselIE from .piksel import PikselIE
from .pinkbike import PinkbikeIE from .pinkbike import PinkbikeIE
from .pinterest import (
PinterestIE,
PinterestCollectionIE,
)
from .pladform import PladformIE from .pladform import PladformIE
from .platzi import ( from .platzi import (
PlatziIE, PlatziIE,
@@ -918,7 +968,9 @@ from .rbmaradio import RBMARadioIE
from .rds import RDSIE from .rds import RDSIE
from .redbulltv import ( from .redbulltv import (
RedBullTVIE, RedBullTVIE,
RedBullEmbedIE,
RedBullTVRrnContentIE, RedBullTVRrnContentIE,
RedBullIE,
) )
from .reddit import ( from .reddit import (
RedditIE, RedditIE,
@@ -955,6 +1007,7 @@ from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETe
from .rtvnh import RTVNHIE from .rtvnh import RTVNHIE
from .rtvs import RTVSIE from .rtvs import RTVSIE
from .ruhd import RUHDIE from .ruhd import RUHDIE
from .rumble import RumbleEmbedIE
from .rutube import ( from .rutube import (
RutubeIE, RutubeIE,
RutubeChannelIE, RutubeChannelIE,
@@ -1005,6 +1058,16 @@ from .shared import (
from .showroomlive import ShowRoomLiveIE from .showroomlive import ShowRoomLiveIE
from .sina import SinaIE from .sina import SinaIE
from .sixplay import SixPlayIE from .sixplay import SixPlayIE
from .skyit import (
SkyItPlayerIE,
SkyItVideoIE,
SkyItVideoLiveIE,
SkyItIE,
SkyItAcademyIE,
SkyItArteIE,
CieloTVItIE,
TV8ItIE,
)
from .skylinewebcams import SkylineWebcamsIE from .skylinewebcams import SkylineWebcamsIE
from .skynewsarabia import ( from .skynewsarabia import (
SkyNewsArabiaIE, SkyNewsArabiaIE,
@@ -1013,16 +1076,11 @@ from .skynewsarabia import (
from .sky import ( from .sky import (
SkyNewsIE, SkyNewsIE,
SkySportsIE, SkySportsIE,
SkySportsNewsIE,
) )
from .slideshare import SlideshareIE from .slideshare import SlideshareIE
from .slideslive import SlidesLiveIE from .slideslive import SlidesLiveIE
from .slutload import SlutloadIE from .slutload import SlutloadIE
from .smotri import (
SmotriIE,
SmotriCommunityIE,
SmotriUserIE,
SmotriBroadcastIE,
)
from .snotr import SnotrIE from .snotr import SnotrIE
from .sohu import SohuIE from .sohu import SohuIE
from .sonyliv import SonyLIVIE from .sonyliv import SonyLIVIE
@@ -1051,16 +1109,28 @@ from .spankbang import (
SpankBangPlaylistIE, SpankBangPlaylistIE,
) )
from .spankwire import SpankwireIE from .spankwire import SpankwireIE
from .spiegel import SpiegelIE, SpiegelArticleIE from .spiegel import SpiegelIE
from .spiegeltv import SpiegeltvIE
from .spike import ( from .spike import (
BellatorIE, BellatorIE,
ParamountNetworkIE, ParamountNetworkIE,
) )
from .stitcher import StitcherIE from .stitcher import (
StitcherIE,
StitcherShowIE,
)
from .sport5 import Sport5IE from .sport5 import Sport5IE
from .sportbox import SportBoxIE from .sportbox import SportBoxIE
from .sportdeutschland import SportDeutschlandIE from .sportdeutschland import SportDeutschlandIE
from .spotify import (
SpotifyIE,
SpotifyShowIE,
)
from .spreaker import (
SpreakerIE,
SpreakerPageIE,
SpreakerShowIE,
SpreakerShowPageIE,
)
from .springboardplatform import SpringboardPlatformIE from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE from .sprout import SproutIE
from .srgssr import ( from .srgssr import (
@@ -1095,7 +1165,6 @@ from .tagesschau import (
TagesschauIE, TagesschauIE,
) )
from .tass import TassIE from .tass import TassIE
from .tastytrade import TastyTradeIE
from .tbs import TBSIE from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE from .tdslifeway import TDSLifewayIE
from .teachable import ( from .teachable import (
@@ -1122,6 +1191,7 @@ from .telequebec import (
TeleQuebecSquatIE, TeleQuebecSquatIE,
TeleQuebecEmissionIE, TeleQuebecEmissionIE,
TeleQuebecLiveIE, TeleQuebecLiveIE,
TeleQuebecVideoIE,
) )
from .teletask import TeleTaskIE from .teletask import TeleTaskIE
from .telewebion import TelewebionIE from .telewebion import TelewebionIE
@@ -1158,13 +1228,20 @@ from .tnaflix import (
EMPFlixIE, EMPFlixIE,
MovieFapIE, MovieFapIE,
) )
from .toggle import ToggleIE from .toggle import (
ToggleIE,
MeWatchIE,
)
from .tonline import TOnlineIE from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE from .toongoggles import ToonGogglesIE
from .toutv import TouTvIE from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE from .trilulilu import TriluliluIE
from .trovo import (
TrovoIE,
TrovoVodIE,
)
from .trunews import TruNewsIE from .trunews import TruNewsIE
from .trutv import TruTVIE from .trutv import TruTVIE
from .tube8 import Tube8IE from .tube8 import Tube8IE
@@ -1183,6 +1260,7 @@ from .tv2 import (
TV2IE, TV2IE,
TV2ArticleIE, TV2ArticleIE,
KatsomoIE, KatsomoIE,
MTVUutisetArticleIE,
) )
from .tv2dk import ( from .tv2dk import (
TV2DKIE, TV2DKIE,
@@ -1191,7 +1269,14 @@ from .tv2dk import (
from .tv2hu import TV2HuIE from .tv2hu import TV2HuIE
from .tv4 import TV4IE from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE from .tv5mondeplus import TV5MondePlusIE
from .tva import TVAIE from .tv5unis import (
TV5UnisVideoIE,
TV5UnisIE,
)
from .tva import (
TVAIE,
QubIE,
)
from .tvanouvelles import ( from .tvanouvelles import (
TVANouvellesIE, TVANouvellesIE,
TVANouvellesArticleIE, TVANouvellesArticleIE,
@@ -1200,6 +1285,7 @@ from .tvc import (
TVCIE, TVCIE,
TVCArticleIE, TVCArticleIE,
) )
from .tver import TVerIE
from .tvigle import TvigleIE from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvn24 import TVN24IE from .tvn24 import TVN24IE
@@ -1229,14 +1315,11 @@ from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE from .twitcasting import TwitCastingIE
from .twitch import ( from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE, TwitchVodIE,
TwitchProfileIE, TwitchCollectionIE,
TwitchAllVideosIE, TwitchVideosIE,
TwitchUploadsIE, TwitchVideosClipsIE,
TwitchPastBroadcastsIE, TwitchVideosCollectionsIE,
TwitchHighlightsIE,
TwitchStreamIE, TwitchStreamIE,
TwitchClipsIE, TwitchClipsIE,
) )
@@ -1316,7 +1399,6 @@ from .vidme import (
VidmeUserIE, VidmeUserIE,
VidmeUserLikesIE, VidmeUserLikesIE,
) )
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE from .vier import VierIE, VierVideosIE
from .viewlift import ( from .viewlift import (
ViewLiftIE, ViewLiftIE,
@@ -1357,8 +1439,8 @@ from .vk import (
) )
from .vlive import ( from .vlive import (
VLiveIE, VLiveIE,
VLivePostIE,
VLiveChannelIE, VLiveChannelIE,
VLivePlaylistIE
) )
from .vodlocker import VodlockerIE from .vodlocker import VodlockerIE
from .vodpl import VODPlIE from .vodpl import VODPlIE
@@ -1376,10 +1458,14 @@ from .vrv import (
VRVSeriesIE, VRVSeriesIE,
) )
from .vshare import VShareIE from .vshare import VShareIE
from .vtm import VTMIE
from .medialaan import MedialaanIE from .medialaan import MedialaanIE
from .vube import VubeIE from .vube import VubeIE
from .vuclip import VuClipIE from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE from .vvvvid import (
VVVVIDIE,
VVVVIDShowIE,
)
from .vyborymos import VyboryMosIE from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE from .vzaar import VzaarIE
from .wakanim import WakanimIE from .wakanim import WakanimIE
@@ -1410,7 +1496,10 @@ from .weibo import (
WeiboMobileIE WeiboMobileIE
) )
from .weiqitv import WeiqiTVIE from .weiqitv import WeiqiTVIE
from .wistia import WistiaIE from .wistia import (
WistiaIE,
WistiaPlaylistIE,
)
from .worldstarhiphop import WorldStarHipHopIE from .worldstarhiphop import WorldStarHipHopIE
from .wsj import ( from .wsj import (
WSJIE, WSJIE,
@@ -1454,6 +1543,8 @@ from .yandexmusic import (
YandexMusicTrackIE, YandexMusicTrackIE,
YandexMusicAlbumIE, YandexMusicAlbumIE,
YandexMusicPlaylistIE, YandexMusicPlaylistIE,
YandexMusicArtistTracksIE,
YandexMusicArtistAlbumsIE,
) )
from .yandexvideo import YandexVideoIE from .yandexvideo import YandexVideoIE
from .yapfiles import YapFilesIE from .yapfiles import YapFilesIE
@@ -1475,25 +1566,22 @@ from .yourporn import YourPornIE
from .yourupload import YourUploadIE from .yourupload import YourUploadIE
from .youtube import ( from .youtube import (
YoutubeIE, YoutubeIE,
YoutubeChannelIE,
YoutubeFavouritesIE, YoutubeFavouritesIE,
YoutubeHistoryIE, YoutubeHistoryIE,
YoutubeLiveIE, YoutubeTabIE,
YoutubePlaylistIE, YoutubePlaylistIE,
YoutubePlaylistsIE,
YoutubeRecommendedIE, YoutubeRecommendedIE,
YoutubeSearchDateIE, YoutubeSearchDateIE,
YoutubeSearchIE, YoutubeSearchIE,
YoutubeSearchURLIE, #YoutubeSearchURLIE,
YoutubeShowIE,
YoutubeSubscriptionsIE, YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE, YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE, YoutubeTruncatedURLIE,
YoutubeUserIE, YoutubeYtBeIE,
YoutubeYtUserIE,
YoutubeWatchLaterIE, YoutubeWatchLaterIE,
) )
from .zapiks import ZapiksIE from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import ( from .zattoo import (
BBVTVIE, BBVTVIE,
EinsUndEinsTVIE, EinsUndEinsTVIE,


@@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
import socket import socket
@@ -8,6 +9,7 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_http_client, compat_http_client,
compat_str,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, compat_urllib_parse_unquote_plus,
@@ -16,14 +18,17 @@ from ..utils import (
clean_html, clean_html,
error_to_compat_str, error_to_compat_str,
ExtractorError, ExtractorError,
float_or_none,
get_element_by_id, get_element_by_id,
int_or_none, int_or_none,
js_to_json, js_to_json,
limit_length, limit_length,
parse_count, parse_count,
qualities,
sanitized_Request, sanitized_Request,
try_get, try_get,
urlencode_postdata, urlencode_postdata,
urljoin,
) )
@@ -39,11 +44,13 @@ class FacebookIE(InfoExtractor):
photo\.php| photo\.php|
video\.php| video\.php|
video/embed| video/embed|
story\.php story\.php|
watch(?:/live)?/?
)\?(?:.*?)(?:v|video_id|story_fbid)=| )\?(?:.*?)(?:v|video_id|story_fbid)=|
[^/]+/videos/(?:[^/]+/)?| [^/]+/videos/(?:[^/]+/)?|
[^/]+/posts/| [^/]+/posts/|
groups/[^/]+/permalink/ groups/[^/]+/permalink/|
watchparty/
)| )|
facebook: facebook:
) )
@@ -54,8 +61,6 @@ class FacebookIE(InfoExtractor):
_NETRC_MACHINE = 'facebook' _NETRC_MACHINE = 'facebook'
IE_NAME = 'facebook' IE_NAME = 'facebook'
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s' _VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary' _VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary'
@@ -72,6 +77,7 @@ class FacebookIE(InfoExtractor):
}, },
'skip': 'Requires logging in', 'skip': 'Requires logging in',
}, { }, {
# data.video
'url': 'https://www.facebook.com/video.php?v=274175099429670', 'url': 'https://www.facebook.com/video.php?v=274175099429670',
'info_dict': { 'info_dict': {
'id': '274175099429670', 'id': '274175099429670',
@@ -133,6 +139,7 @@ class FacebookIE(InfoExtractor):
}, },
}, { }, {
# have 1080P, but only up to 720p in swf params # have 1080P, but only up to 720p in swf params
# data.video.story.attachments[].media
'url': 'https://www.facebook.com/cnn/videos/10155529876156509/', 'url': 'https://www.facebook.com/cnn/videos/10155529876156509/',
'md5': '9571fae53d4165bbbadb17a94651dcdc', 'md5': '9571fae53d4165bbbadb17a94651dcdc',
'info_dict': { 'info_dict': {
@@ -147,6 +154,7 @@ class FacebookIE(InfoExtractor):
}, },
}, { }, {
# bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall # bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://www.facebook.com/yaroslav.korpan/videos/1417995061575415/', 'url': 'https://www.facebook.com/yaroslav.korpan/videos/1417995061575415/',
'info_dict': { 'info_dict': {
'id': '1417995061575415', 'id': '1417995061575415',
@@ -174,6 +182,7 @@ class FacebookIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://www.facebook.com/groups/1024490957622648/permalink/1396382447100162/', 'url': 'https://www.facebook.com/groups/1024490957622648/permalink/1396382447100162/',
'info_dict': { 'info_dict': {
'id': '1396382447100162', 'id': '1396382447100162',
@@ -193,18 +202,23 @@ class FacebookIE(InfoExtractor):
'url': 'https://www.facebook.com/amogood/videos/1618742068337349/?fref=nf', 'url': 'https://www.facebook.com/amogood/videos/1618742068337349/?fref=nf',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.mediaset.currMedia.edges
'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater', 'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video.story.attachments[].media
'url': 'facebook:544765982287235', 'url': 'facebook:544765982287235',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/', 'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video.creation_story.attachments[].media
'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/', 'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video
'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670', 'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670',
'only_matching': True, 'only_matching': True,
}, { }, {
@@ -212,6 +226,7 @@ class FacebookIE(InfoExtractor):
'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/', 'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video
'url': 'https://www.facebook.com/WatchESLOne/videos/359649331226507/', 'url': 'https://www.facebook.com/WatchESLOne/videos/359649331226507/',
'info_dict': { 'info_dict': {
'id': '359649331226507', 'id': '359649331226507',
@@ -222,7 +237,64 @@ class FacebookIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.all_subattachments.nodes[].media
'url': 'https://www.facebook.com/100033620354545/videos/106560053808006/',
'info_dict': {
'id': '106560053808006',
},
'playlist_count': 2,
}, {
# data.video.story.attachments[].media
'url': 'https://www.facebook.com/watch/?v=647537299265662',
'only_matching': True,
}, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.all_subattachments.nodes[].media
'url': 'https://www.facebook.com/PankajShahLondon/posts/10157667649866271',
'info_dict': {
'id': '10157667649866271',
},
'playlist_count': 3,
}, {
# data.nodes[].comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://m.facebook.com/Alliance.Police.Department/posts/4048563708499330',
'info_dict': {
'id': '117576630041613',
'ext': 'mp4',
# TODO: title can be extracted from video page
'title': 'Facebook video #117576630041613',
'uploader_id': '189393014416438',
'upload_date': '20201123',
'timestamp': 1606162592,
},
'skip': 'Requires logging in',
}, {
# node.comet_sections.content.story.attached_story.attachments.style_type_renderer.attachment.media
'url': 'https://www.facebook.com/groups/ateistiskselskab/permalink/10154930137678856/',
'info_dict': {
'id': '211567722618337',
'ext': 'mp4',
'title': 'Facebook video #211567722618337',
'uploader_id': '127875227654254',
'upload_date': '20161122',
'timestamp': 1479793574,
},
}, {
# data.video.creation_story.attachments[].media
'url': 'https://www.facebook.com/watch/live/?v=1823658634322275',
'only_matching': True,
}, {
'url': 'https://www.facebook.com/watchparty/211641140192478',
'info_dict': {
'id': '211641140192478',
},
'playlist_count': 1,
'skip': 'Requires logging in',
}] }]
_SUPPORTED_PAGLETS_REGEX = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_[0-9a-f]+)'
_api_config = {
'graphURI': '/api/graphql/'
}
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
@@ -305,23 +377,24 @@ class FacebookIE(InfoExtractor):
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
def _extract_from_url(self, url, video_id, fatal_if_no_video=True): def _extract_from_url(self, url, video_id):
req = sanitized_Request(url) webpage = self._download_webpage(
req.add_header('User-Agent', self._CHROME_USER_AGENT) url.replace('://m.facebook.com/', '://www.facebook.com/'), video_id)
webpage = self._download_webpage(req, video_id)
video_data = None video_data = None
def extract_video_data(instances): def extract_video_data(instances):
video_data = []
for item in instances: for item in instances:
if item[1][0] == 'VideoConfig': if try_get(item, lambda x: x[1][0]) == 'VideoConfig':
video_item = item[2][0] video_item = item[2][0]
if video_item.get('video_id'): if video_item.get('video_id'):
return video_item['videoData'] video_data.append(video_item['videoData'])
return video_data
server_js_data = self._parse_json(self._search_regex( server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})(?:\);|,")', webpage, [r'handleServerJS\(({.+})(?:\);|,")', r'\bs\.handle\(({.+?})\);'],
'server js data', default='{}'), video_id, fatal=False) webpage, 'server js data', default='{}'), video_id, fatal=False)
if server_js_data: if server_js_data:
video_data = extract_video_data(server_js_data.get('instances', [])) video_data = extract_video_data(server_js_data.get('instances', []))
@@ -331,17 +404,118 @@ class FacebookIE(InfoExtractor):
return extract_video_data(try_get( return extract_video_data(try_get(
js_data, lambda x: x['jsmods']['instances'], list) or []) js_data, lambda x: x['jsmods']['instances'], list) or [])
def extract_dash_manifest(video, formats):
dash_manifest = video.get('dash_manifest')
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
def process_formats(formats):
# Downloads with browser's User-Agent are rate limited. Working around
# with non-browser User-Agent.
for f in formats:
f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
self._sort_formats(formats)
def extract_relay_data(_filter):
return self._parse_json(self._search_regex(
r'handleWithCustomApplyEach\([^,]+,\s*({.*?%s.*?})\);' % _filter,
webpage, 'relay data', default='{}'), video_id, fatal=False) or {}
def extract_relay_prefetched_data(_filter):
relay_data = extract_relay_data(_filter)
for require in (relay_data.get('require') or []):
if require[0] == 'RelayPrefetchedStreamCache':
return try_get(require, lambda x: x[3][1]['__bbox']['result']['data'], dict) or {}
if not video_data: if not video_data:
server_js_data = self._parse_json( server_js_data = self._parse_json(self._search_regex([
self._search_regex( r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+' + self._SUPPORTED_PAGLETS_REGEX,
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_\d+)', r'bigPipe\.onPageletArrive\(({.*?id\s*:\s*"%s".*?})\);' % self._SUPPORTED_PAGLETS_REGEX
webpage, 'js data', default='{}'), ], webpage, 'js data', default='{}'), video_id, js_to_json, False)
video_id, transform_source=js_to_json, fatal=False)
video_data = extract_from_jsmods_instances(server_js_data) video_data = extract_from_jsmods_instances(server_js_data)
if not video_data: if not video_data:
if not fatal_if_no_video: data = extract_relay_prefetched_data(
return webpage, False r'"(?:dash_manifest|playable_url(?:_quality_hd)?)"\s*:\s*"[^"]+"')
if data:
entries = []
def parse_graphql_video(video):
formats = []
q = qualities(['sd', 'hd'])
for (suffix, format_id) in [('', 'sd'), ('_quality_hd', 'hd')]:
playable_url = video.get('playable_url' + suffix)
if not playable_url:
continue
formats.append({
'format_id': format_id,
'quality': q(format_id),
'url': playable_url,
})
extract_dash_manifest(video, formats)
process_formats(formats)
v_id = video.get('videoId') or video.get('id') or video_id
info = {
'id': v_id,
'formats': formats,
'thumbnail': try_get(video, lambda x: x['thumbnailImage']['uri']),
'uploader_id': try_get(video, lambda x: x['owner']['id']),
'timestamp': int_or_none(video.get('publish_time')),
'duration': float_or_none(video.get('playable_duration_in_ms'), 1000),
}
description = try_get(video, lambda x: x['savable_description']['text'])
title = video.get('name')
if title:
info.update({
'title': title,
'description': description,
})
else:
info['title'] = description or 'Facebook video #%s' % v_id
entries.append(info)
def parse_attachment(attachment, key='media'):
media = attachment.get(key) or {}
if media.get('__typename') == 'Video':
return parse_graphql_video(media)
nodes = data.get('nodes') or []
node = data.get('node') or {}
if not nodes and node:
nodes.append(node)
for node in nodes:
story = try_get(node, lambda x: x['comet_sections']['content']['story'], dict) or {}
attachments = try_get(story, [
lambda x: x['attached_story']['attachments'],
lambda x: x['attachments']
], list) or []
for attachment in attachments:
attachment = try_get(attachment, lambda x: x['style_type_renderer']['attachment'], dict)
ns = try_get(attachment, lambda x: x['all_subattachments']['nodes'], list) or []
for n in ns:
parse_attachment(n)
parse_attachment(attachment)
edges = try_get(data, lambda x: x['mediaset']['currMedia']['edges'], list) or []
for edge in edges:
parse_attachment(edge, key='node')
video = data.get('video') or {}
if video:
attachments = try_get(video, [
lambda x: x['story']['attachments'],
lambda x: x['creation_story']['attachments']
], list) or []
for attachment in attachments:
parse_attachment(attachment)
if not entries:
parse_graphql_video(video)
return self.playlist_result(entries, video_id)
if not video_data:
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage) m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None: if m_msg is not None:
raise ExtractorError( raise ExtractorError(
@@ -350,6 +524,43 @@ class FacebookIE(InfoExtractor):
elif '>You must log in to continue' in webpage: elif '>You must log in to continue' in webpage:
self.raise_login_required() self.raise_login_required()
if not video_data and '/watchparty/' in url:
post_data = {
'doc_id': 3731964053542869,
'variables': json.dumps({
'livingRoomID': video_id,
}),
}
prefetched_data = extract_relay_prefetched_data(r'"login_data"\s*:\s*{')
if prefetched_data:
lsd = try_get(prefetched_data, lambda x: x['login_data']['lsd'], dict)
if lsd:
post_data[lsd['name']] = lsd['value']
relay_data = extract_relay_data(r'\[\s*"RelayAPIConfigDefaults"\s*,')
for define in (relay_data.get('define') or []):
if define[0] == 'RelayAPIConfigDefaults':
self._api_config = define[2]
living_room = self._download_json(
urljoin(url, self._api_config['graphURI']), video_id,
data=urlencode_postdata(post_data))['data']['living_room']
entries = []
for edge in (try_get(living_room, lambda x: x['recap']['watched_content']['edges']) or []):
video = try_get(edge, lambda x: x['node']['video']) or {}
v_id = video.get('id')
if not v_id:
continue
v_id = compat_str(v_id)
entries.append(self.url_result(
self._VIDEO_PAGE_TEMPLATE % v_id,
self.ie_key(), v_id, video.get('name')))
return self.playlist_result(entries, video_id)
if not video_data:
# Video info not in first request, do a secondary request using # Video info not in first request, do a secondary request using
# tahoe player specific URL # tahoe player specific URL
tahoe_data = self._download_webpage( tahoe_data = self._download_webpage(
@@ -379,8 +590,19 @@ class FacebookIE(InfoExtractor):
if not video_data: if not video_data:
raise ExtractorError('Cannot parse data') raise ExtractorError('Cannot parse data')
subtitles = {} if len(video_data) > 1:
entries = []
for v in video_data:
video_url = v[0].get('video_url')
if not video_url:
continue
entries.append(self.url_result(urljoin(
url, video_url), self.ie_key(), v[0].get('video_id')))
return self.playlist_result(entries, video_id)
video_data = video_data[0]
formats = [] formats = []
subtitles = {}
for f in video_data: for f in video_data:
format_id = f['stream_type'] format_id = f['stream_type']
if f and isinstance(f, dict): if f and isinstance(f, dict):
@@ -399,22 +621,14 @@ class FacebookIE(InfoExtractor):
'url': src, 'url': src,
'preference': preference, 'preference': preference,
}) })
dash_manifest = f[0].get('dash_manifest') extract_dash_manifest(f[0], formats)
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
subtitles_src = f[0].get('subtitles_src') subtitles_src = f[0].get('subtitles_src')
if subtitles_src: if subtitles_src:
subtitles.setdefault('en', []).append({'url': subtitles_src}) subtitles.setdefault('en', []).append({'url': subtitles_src})
if not formats: if not formats:
raise ExtractorError('Cannot find video formats') raise ExtractorError('Cannot find video formats')
# Downloads with browser's User-Agent are rate limited. Working around process_formats(formats)
# with non-browser User-Agent.
for f in formats:
f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
self._sort_formats(formats)
video_title = self._html_search_regex( video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage,
@@ -454,35 +668,13 @@ class FacebookIE(InfoExtractor):
'subtitles': subtitles, 'subtitles': subtitles,
} }
return webpage, info_dict return info_dict
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
real_url = self._VIDEO_PAGE_TEMPLATE % video_id if url.startswith('facebook:') else url real_url = self._VIDEO_PAGE_TEMPLATE % video_id if url.startswith('facebook:') else url
webpage, info_dict = self._extract_from_url(real_url, video_id, fatal_if_no_video=False) return self._extract_from_url(real_url, video_id)
if info_dict:
return info_dict
if '/posts/' in url:
video_id_json = self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
default='')
if video_id_json:
entries = [
self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
for vid in self._parse_json(video_id_json, video_id)]
return self.playlist_result(entries, video_id)
# Single Video?
video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
else:
_, info_dict = self._extract_from_url(
self._VIDEO_PAGE_TEMPLATE % video_id,
video_id, fatal_if_no_video=True)
return info_dict
class FacebookPluginsVideoIE(InfoExtractor): class FacebookPluginsVideoIE(InfoExtractor):
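
The GraphQL branch added to `_extract_from_url` above builds its progressive formats from the `playable_url` and `playable_url_quality_hd` keys of a video dict. The following is a minimal standalone sketch of that step only, assuming a plain `dict` input; the function name `collect_playable_formats` is hypothetical, and the integer rank stands in for the value the diff obtains from `qualities(['sd', 'hd'])`.

```python
def collect_playable_formats(video):
    """Build a list of format dicts from a GraphQL-style video dict.

    Hypothetical helper for illustration; it mirrors the loop in the diff
    but does not depend on youtube-dl.
    """
    formats = []
    # rank 0 for 'sd', rank 1 for 'hd', so that 'hd' sorts as the better quality
    for rank, (suffix, format_id) in enumerate([('', 'sd'), ('_quality_hd', 'hd')]):
        playable_url = video.get('playable_url' + suffix)
        if not playable_url:
            # Missing or empty URL: skip this quality level
            continue
        formats.append({
            'format_id': format_id,
            'quality': rank,
            'url': playable_url,
        })
    return formats
```

A video dict carrying only `playable_url` would yield a single `sd` entry, and an empty dict would yield an empty list, which is the case where the diff relies on `dash_manifest` instead.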


@@ -11,7 +11,7 @@ from ..utils import (
class FranceCultureIE(InfoExtractor): class FranceCultureIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emissions/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emissions/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
'url': 'http://www.franceculture.fr/emissions/carnet-nomade/rendez-vous-au-pays-des-geeks', 'url': 'http://www.franceculture.fr/emissions/carnet-nomade/rendez-vous-au-pays-des-geeks',
'info_dict': { 'info_dict': {
'id': 'rendez-vous-au-pays-des-geeks', 'id': 'rendez-vous-au-pays-des-geeks',
@@ -20,10 +20,14 @@ class FranceCultureIE(InfoExtractor):
'title': 'Rendez-vous au pays des geeks', 'title': 'Rendez-vous au pays des geeks',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20140301', 'upload_date': '20140301',
'timestamp': 1393642916, 'timestamp': 1393700400,
'vcodec': 'none', 'vcodec': 'none',
} }
} }, {
# no thumbnail
'url': 'https://www.franceculture.fr/emissions/la-recherche-montre-en-main/la-recherche-montre-en-main-du-mercredi-10-octobre-2018',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@@ -36,19 +40,19 @@ class FranceCultureIE(InfoExtractor):
</h1>| </h1>|
<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*> <div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
).*? ).*?
(<button[^>]+data-asset-source="[^"]+"[^>]+>) (<button[^>]+data-(?:url|asset-source)="[^"]+"[^>]+>)
''', ''',
webpage, 'video data')) webpage, 'video data'))
video_url = video_data['data-asset-source'] video_url = video_data.get('data-url') or video_data['data-asset-source']
title = video_data.get('data-asset-title') or self._og_search_title(webpage) title = video_data.get('data-asset-title') or video_data.get('data-diffusion-title') or self._og_search_title(webpage)
description = self._html_search_regex( description = self._html_search_regex(
r'(?s)<div[^>]+class="intro"[^>]*>.*?<h2>(.+?)</h2>', r'(?s)<div[^>]+class="intro"[^>]*>.*?<h2>(.+?)</h2>',
webpage, 'description', default=None) webpage, 'description', default=None)
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'(?s)<figure[^>]+itemtype="https://schema.org/ImageObject"[^>]*>.*?<img[^>]+(?:data-dejavu-)?src="([^"]+)"', r'(?s)<figure[^>]+itemtype="https://schema.org/ImageObject"[^>]*>.*?<img[^>]+(?:data-dejavu-)?src="([^"]+)"',
webpage, 'thumbnail', fatal=False) webpage, 'thumbnail', default=None)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'(?s)<span class="author">(.*?)</span>', r'(?s)<span class="author">(.*?)</span>',
webpage, 'uploader', default=None) webpage, 'uploader', default=None)
@@ -64,6 +68,6 @@ class FranceCultureIE(InfoExtractor):
'ext': ext, 'ext': ext,
'vcodec': 'none' if ext == 'mp3' else None, 'vcodec': 'none' if ext == 'mp3' else None,
'uploader': uploader, 'uploader': uploader,
'timestamp': int_or_none(video_data.get('data-asset-created-date')), 'timestamp': int_or_none(video_data.get('data-start-time')) or int_or_none(video_data.get('data-asset-created-date')),
'duration': int_or_none(video_data.get('data-duration')), 'duration': int_or_none(video_data.get('data-duration')),
} }
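
The `timestamp` change above prefers `data-start-time` and falls back to `data-asset-created-date`. A small sketch of that fallback, assuming the attributes arrive as a dict of strings; `pick_timestamp` is a hypothetical name, and `int_or_none` here is a simplified stand-in for the youtube-dl utility of the same name.

```python
def int_or_none(value):
    """Simplified stand-in: return int(value), or None if it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def pick_timestamp(attrs):
    """Prefer 'data-start-time'; fall back to 'data-asset-created-date'."""
    return (int_or_none(attrs.get('data-start-time'))
            or int_or_none(attrs.get('data-asset-created-date')))
```

Because the two values are chained with `or`, a `data-start-time` of `0` also falls through to the second attribute, just like a missing one.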


@@ -16,6 +16,7 @@ class FranceInterIE(InfoExtractor):
'ext': 'mp3', 'ext': 'mp3',
'title': 'Affaire Cahuzac : le contentieux du compte en Suisse', 'title': 'Affaire Cahuzac : le contentieux du compte en Suisse',
'description': 'md5:401969c5d318c061f86bda1fa359292b', 'description': 'md5:401969c5d318c061f86bda1fa359292b',
'thumbnail': r're:^https?://.*\.jpg',
'upload_date': '20160907', 'upload_date': '20160907',
}, },
} }
@@ -31,6 +32,7 @@ class FranceInterIE(InfoExtractor):
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
description = self._og_search_description(webpage) description = self._og_search_description(webpage)
thumbnail = self._html_search_meta(['og:image', 'twitter:image'], webpage)
upload_date_str = self._search_regex( upload_date_str = self._search_regex(
r'class=["\']\s*cover-emission-period\s*["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<', r'class=["\']\s*cover-emission-period\s*["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
@@ -48,6 +50,7 @@ class FranceInterIE(InfoExtractor):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date, 'upload_date': upload_date,
'formats': [{ 'formats': [{
'url': video_url, 'url': video_url,


@@ -17,6 +17,7 @@ from ..utils import (
parse_duration, parse_duration,
try_get, try_get,
url_or_none, url_or_none,
urljoin,
) )
from .dailymotion import DailymotionIE from .dailymotion import DailymotionIE
@@ -128,18 +129,38 @@ class FranceTVIE(InfoExtractor):
is_live = None is_live = None
formats = [] videos = []
for video in info['videos']:
if video['statut'] != 'ONLINE': for video in (info.get('videos') or []):
if video.get('statut') != 'ONLINE':
continue continue
video_url = video['url'] if not video.get('url'):
continue
videos.append(video)
if not videos:
for device_type in ['desktop', 'mobile']:
fallback_info = self._download_json(
'https://player.webservices.francetelevisions.fr/v1/videos/%s' % video_id,
video_id, 'Downloading fallback %s video JSON' % device_type, query={
'device_type': device_type,
'browser': 'chrome',
}, fatal=False)
if fallback_info and fallback_info.get('video'):
videos.append(fallback_info['video'])
formats = []
for video in videos:
video_url = video.get('url')
if not video_url: if not video_url:
continue continue
if is_live is None: if is_live is None:
is_live = (try_get( is_live = (try_get(
video, lambda x: x['plages_ouverture'][0]['direct'], video, lambda x: x['plages_ouverture'][0]['direct'], bool) is True
bool) is True) or '/live.francetv.fr/' in video_url or video.get('is_live') is True
format_id = video['format'] or '/live.francetv.fr/' in video_url)
format_id = video.get('format')
ext = determine_ext(video_url) ext = determine_ext(video_url)
if ext == 'f4m': if ext == 'f4m':
if georestricted: if georestricted:
@@ -154,6 +175,9 @@ class FranceTVIE(InfoExtractor):
sign(video_url, format_id), video_id, 'mp4', sign(video_url, format_id), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id=format_id, entry_protocol='m3u8_native', m3u8_id=format_id,
fatal=False)) fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
sign(video_url, format_id), video_id, mpd_id=format_id, fatal=False))
elif video_url.startswith('rtmp'): elif video_url.startswith('rtmp'):
formats.append({ formats.append({
'url': video_url, 'url': video_url,
@@ -166,6 +190,7 @@ class FranceTVIE(InfoExtractor):
'url': video_url, 'url': video_url,
'format_id': format_id, 'format_id': format_id,
}) })
self._sort_formats(formats) self._sort_formats(formats)
title = info['titre'] title = info['titre']
@@ -185,10 +210,10 @@ class FranceTVIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': self._live_title(title) if is_live else title, 'title': self._live_title(title) if is_live else title,
'description': clean_html(info['synopsis']), 'description': clean_html(info.get('synopsis')),
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', info['image']), 'thumbnail': urljoin('https://sivideo.webservices.francetelevisions.fr', info.get('image')),
'duration': int_or_none(info.get('real_duration')) or parse_duration(info['duree']), 'duration': int_or_none(info.get('real_duration')) or parse_duration(info.get('duree')),
'timestamp': int_or_none(info['diffusion']['timestamp']), 'timestamp': int_or_none(try_get(info, lambda x: x['diffusion']['timestamp'])),
'is_live': is_live, 'is_live': is_live,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,


@@ -0,0 +1,35 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class FujiTVFODPlus7IE(InfoExtractor):
_VALID_URL = r'https?://i\.fod\.fujitv\.co\.jp/plus7/web/[0-9a-z]{4}/(?P<id>[0-9a-z]+)'
_BASE_URL = 'http://i.fod.fujitv.co.jp/'
_BITRATE_MAP = {
300: (320, 180),
800: (640, 360),
1200: (1280, 720),
2000: (1280, 720),
}
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh:
f.update({
'width': wh[0],
'height': wh[1],
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
'formats': formats,
'thumbnail': self._BASE_URL + 'pc/image/wbtn/wbtn_%s.jpg' % video_id,
}
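
The new extractor above annotates each HLS format with a width and height looked up from its total bitrate. A standalone sketch of that lookup, assuming formats are plain dicts with an optional `tbr` key; the function name `annotate_resolution` is hypothetical, and the map values are copied from `_BITRATE_MAP` in the diff.

```python
# Bitrate (kbit/s) -> (width, height), copied from _BITRATE_MAP in the diff
BITRATE_MAP = {
    300: (320, 180),
    800: (640, 360),
    1200: (1280, 720),
    2000: (1280, 720),
}


def annotate_resolution(formats, bitrate_map=BITRATE_MAP):
    """Add 'width' and 'height' in place to formats whose 'tbr' is in the map."""
    for f in formats:
        wh = bitrate_map.get(f.get('tbr'))
        if wh:
            f.update({'width': wh[0], 'height': wh[1]})
    return formats
```

Formats with an unknown or missing `tbr` are left untouched, so the lookup only ever adds information.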


@@ -1,77 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
extract_attributes,
int_or_none,
parse_age_limit,
smuggle_url,
update_url_query,
)
class FXNetworksIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.fxnetworks.com/video/1032565827847',
'md5': '8d99b97b4aa7a202f55b6ed47ea7e703',
'info_dict': {
'id': 'dRzwHC_MMqIv',
'ext': 'mp4',
'title': 'First Look: Better Things - Season 2',
'description': 'Because real life is like a fart. Watch this FIRST LOOK to see what inspired the new season of Better Things.',
'age_limit': 14,
'uploader': 'NEWA-FNG-FX',
'upload_date': '20170825',
'timestamp': 1503686274,
'episode_number': 0,
'season_number': 2,
'series': 'Better Things',
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.simpsonsworld.com/video/716094019682',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if 'The content you are trying to access is not available in your region.' in webpage:
self.raise_geo_restricted()
video_data = extract_attributes(self._search_regex(
r'(<a.+?rel="https?://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
player_type = self._search_regex(r'playerType\s*=\s*[\'"]([^\'"]+)', webpage, 'player type', default=None)
release_url = video_data['rel']
title = video_data['data-title']
rating = video_data.get('data-rating')
query = {
'mbr': 'true',
}
if player_type == 'movies':
query.update({
'manifest': 'm3u',
})
else:
query.update({
'switch': 'http',
})
if video_data.get('data-req-auth') == '1':
resource = self._get_mvpd_resource(
video_data['data-channel'], title,
video_data.get('data-guid'), rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fx', resource)
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'series': video_data.get('data-show-title'),
'episode_number': int_or_none(video_data.get('data-episode')),
'season_number': int_or_none(video_data.get('data-season')),
'thumbnail': video_data.get('data-large-thumb'),
'age_limit': parse_age_limit(rating),
'ie_key': 'ThePlatform',
}


@@ -1,16 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .once import OnceIE from .once import OnceIE
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_urllib_parse_unquote,
)
from ..utils import (
unescapeHTML,
url_basename,
dict_get,
)
class GameSpotIE(OnceIE): class GameSpotIE(OnceIE):
@@ -24,17 +15,16 @@ class GameSpotIE(OnceIE):
'title': 'Arma 3 - Community Guide: SITREP I', 'title': 'Arma 3 - Community Guide: SITREP I',
'description': 'Check out this video where some of the basics of Arma 3 is explained.', 'description': 'Check out this video where some of the basics of Arma 3 is explained.',
}, },
'skip': 'manifest URL give HTTP Error 404: Not Found',
}, { }, {
'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/', 'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/',
'md5': '173ea87ad762cf5d3bf6163dceb255a6',
'info_dict': { 'info_dict': {
'id': 'gs-2300-6424837', 'id': 'gs-2300-6424837',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Now Playing - The Witcher 3: Wild Hunt', 'title': 'Now Playing - The Witcher 3: Wild Hunt',
'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.', 'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.',
}, },
'params': {
'skip_download': True, # m3u8 downloads
},
}, { }, {
'url': 'https://www.gamespot.com/videos/embed/6439218/', 'url': 'https://www.gamespot.com/videos/embed/6439218/',
'only_matching': True, 'only_matching': True,
@@ -49,90 +39,40 @@ class GameSpotIE(OnceIE):
def _real_extract(self, url): def _real_extract(self, url):
page_id = self._match_id(url) page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id)
data_video_json = self._search_regex( data_video = self._parse_json(self._html_search_regex(
r'data-video=["\'](.*?)["\']', webpage, 'data video') r'data-video=(["\'])({.*?})\1', webpage,
data_video = self._parse_json(unescapeHTML(data_video_json), page_id) 'video data', group=2), page_id)
title = compat_urllib_parse_unquote(data_video['title'])
streams = data_video['videoStreams'] streams = data_video['videoStreams']
manifest_url = None
formats = [] formats = []
f4m_url = streams.get('f4m_stream')
if f4m_url: m3u8_url = streams.get('adaptive_stream')
manifest_url = f4m_url
formats.extend(self._extract_f4m_formats(
f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
m3u8_url = dict_get(streams, ('m3u8_stream', 'adaptive_stream'))
if m3u8_url: if m3u8_url:
manifest_url = m3u8_url
m3u8_formats = self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
m3u8_url, page_id, 'mp4', 'm3u8_native', m3u8_url, page_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False) m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats) for f in m3u8_formats:
progressive_url = dict_get( formats.append(f)
streams, ('progressive_hd', 'progressive_high', 'progressive_low', 'other_lr')) http_f = f.copy()
if progressive_url and manifest_url: del http_f['manifest_url']
qualities_basename = self._search_regex( http_f.update({
r'/([^/]+)\.csmil/', 'format_id': f['format_id'].replace('hls-', 'http-'),
manifest_url, 'qualities basename', default=None) 'protocol': 'http',
if qualities_basename: 'url': f['url'].replace('.m3u8', '.mp4'),
QUALITIES_RE = r'((,\d+)+,?)' })
qualities = self._search_regex( formats.append(http_f)
QUALITIES_RE, qualities_basename,
'qualities', default=None)
if qualities:
qualities = list(map(lambda q: int(q), qualities.strip(',').split(',')))
qualities.sort()
http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
http_url_basename = url_basename(progressive_url)
if m3u8_formats:
self._sort_formats(m3u8_formats)
m3u8_formats = list(filter(
lambda f: f.get('vcodec') != 'none', m3u8_formats))
if len(qualities) == len(m3u8_formats):
for q, m3u8_format in zip(qualities, m3u8_formats):
f = m3u8_format.copy()
f.update({
'url': progressive_url.replace(
http_url_basename, http_template % q),
'format_id': f['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(f)
else:
for q in qualities:
formats.append({
'url': progressive_url.replace(
http_url_basename, http_template % q),
'ext': 'mp4',
'format_id': 'http-%d' % q,
'tbr': q,
})
onceux_json = self._search_regex( mpd_url = streams.get('adaptive_dash')
r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None) if mpd_url:
if onceux_json: formats.extend(self._extract_mpd_formats(
onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri') mpd_url, page_id, mpd_id='dash', fatal=False))
if onceux_url:
formats.extend(self._extract_once_formats(re.sub(
r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url),
http_formats_preference=-1))
if not formats:
for quality in ['sd', 'hd']:
# It's actually a link to a flv file
flv_url = streams.get('f4m_{0}'.format(quality))
if flv_url is not None:
formats.append({
'url': flv_url,
'ext': 'flv',
'format_id': quality,
})
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': data_video['guid'], 'id': data_video.get('guid') or page_id,
'display_id': page_id, 'display_id': page_id,
'title': compat_urllib_parse_unquote(data_video['title']), 'title': title,
'formats': formats, 'formats': formats,
'description': self._html_search_meta('description', webpage), 'description': self._html_search_meta('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
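
The rewritten GameSpot extractor above derives a progressive HTTP format from every HLS format by rewriting the format ID, protocol, and URL extension. A standalone sketch of that transformation, assuming each format is a plain dict with `format_id` and `url` keys; `add_http_variants` is a hypothetical name. Unlike the diff, which uses `del http_f['manifest_url']`, this sketch uses `pop` with a default so a format without that key does not raise `KeyError`.

```python
def add_http_variants(m3u8_formats):
    """Return the HLS formats interleaved with derived HTTP (.mp4) variants."""
    formats = []
    for f in m3u8_formats:
        formats.append(f)
        http_f = f.copy()
        # The derived format is a direct file, so it has no manifest
        http_f.pop('manifest_url', None)
        http_f.update({
            'format_id': f['format_id'].replace('hls-', 'http-'),
            'protocol': 'http',
            'url': f['url'].replace('.m3u8', '.mp4'),
        })
        formats.append(http_f)
    return formats
```

Whether a matching `.mp4` actually exists next to each `.m3u8` is an assumption the extractor makes about the site; the sketch only shows the string rewriting.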


@@ -20,19 +20,24 @@ from ..utils import (
ExtractorError, ExtractorError,
float_or_none, float_or_none,
HEADRequest, HEADRequest,
int_or_none,
is_html, is_html,
js_to_json, js_to_json,
KNOWN_EXTENSIONS, KNOWN_EXTENSIONS,
merge_dicts, merge_dicts,
mimetype2ext, mimetype2ext,
orderedSet, orderedSet,
parse_duration,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
unescapeHTML, unescapeHTML,
unified_strdate, unified_timestamp,
unsmuggle_url, unsmuggle_url,
UnsupportedError, UnsupportedError,
url_or_none,
xpath_attr,
xpath_text, xpath_text,
xpath_with_ns,
) )
from .commonprotocols import RtmpIE from .commonprotocols import RtmpIE
from .brightcove import ( from .brightcove import (
@@ -48,7 +53,6 @@ from .ooyala import OoyalaIE
from .rutv import RUTVIE from .rutv import RUTVIE
from .tvc import TVCIE from .tvc import TVCIE
from .sportbox import SportBoxIE from .sportbox import SportBoxIE
from .smotri import SmotriIE
from .myvi import MyviIE from .myvi import MyviIE
from .condenast import CondeNastIE from .condenast import CondeNastIE
from .udn import UDNEmbedIE from .udn import UDNEmbedIE
@@ -63,7 +67,10 @@ from .tube8 import Tube8IE
from .mofosex import MofosexEmbedIE from .mofosex import MofosexEmbedIE
from .spankwire import SpankwireIE from .spankwire import SpankwireIE
from .youporn import YouPornIE from .youporn import YouPornIE
from .vimeo import VimeoIE from .vimeo import (
VimeoIE,
VHXEmbedIE,
)
from .dailymotion import DailymotionIE from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
@@ -91,6 +98,7 @@ from .piksel import PikselIE
from .videa import VideaIE from .videa import VideaIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .ustream import UstreamIE from .ustream import UstreamIE
from .arte import ArteTVEmbedIE
from .videopress import VideoPressIE from .videopress import VideoPressIE
from .rutube import RutubeIE from .rutube import RutubeIE
from .limelight import LimelightBaseIE from .limelight import LimelightBaseIE
@@ -119,6 +127,8 @@ from .expressen import ExpressenIE
from .zype import ZypeIE from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE from .odnoklassniki import OdnoklassnikiIE
from .kinja import KinjaEmbedIE from .kinja import KinjaEmbedIE
from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@@ -197,11 +207,48 @@ class GenericIE(InfoExtractor):
{ {
'url': 'http://podcastfeeds.nbcnews.com/audio/podcast/MSNBC-MADDOW-NETCAST-M4V.xml', 'url': 'http://podcastfeeds.nbcnews.com/audio/podcast/MSNBC-MADDOW-NETCAST-M4V.xml',
'info_dict': { 'info_dict': {
'id': 'pdv_maddow_netcast_m4v-02-27-2015-201624', 'id': 'http://podcastfeeds.nbcnews.com/nbcnews/video/podcast/MSNBC-MADDOW-NETCAST-M4V.xml',
'ext': 'm4v', 'title': 'MSNBC Rachel Maddow (video)',
'upload_date': '20150228', 'description': 're:.*her unique approach to storytelling.*',
'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624', },
} 'playlist': [{
'info_dict': {
'ext': 'mov',
'id': 'pdv_maddow_netcast_mov-12-04-2020-224335',
'title': 're:MSNBC Rachel Maddow',
'description': 're:.*her unique approach to storytelling.*',
'timestamp': int,
'upload_date': compat_str,
'duration': float,
},
}],
},
# RSS feed with item with description and thumbnails
{
'url': 'https://anchor.fm/s/dd00e14/podcast/rss',
'info_dict': {
'id': 'https://anchor.fm/s/dd00e14/podcast/rss',
'title': 're:.*100% Hydrogen.*',
'description': 're:.*In this episode.*',
},
'playlist': [{
'info_dict': {
'ext': 'm4a',
'id': 'c1c879525ce2cb640b344507e682c36d',
'title': 're:Hydrogen!',
'description': 're:.*In this episode we are going.*',
'timestamp': 1567977776,
'upload_date': '20190908',
'duration': 459,
'thumbnail': r're:^https?://.*\.jpg$',
'episode_number': 1,
'season_number': 1,
'age_limit': 0,
},
}],
'params': {
'skip_download': True,
},
}, },
# RSS feed with enclosures and unsupported link URLs # RSS feed with enclosures and unsupported link URLs
{ {
@@ -841,7 +888,7 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
} }
}, },
# MTVSercices embed # MTVServices embed
{ {
'url': 'http://www.vulture.com/2016/06/new-key-peele-sketches-released.html', 'url': 'http://www.vulture.com/2016/06/new-key-peele-sketches-released.html',
'md5': 'ca1aef97695ef2c1d6973256a57e5252', 'md5': 'ca1aef97695ef2c1d6973256a57e5252',
@@ -1982,22 +2029,6 @@ class GenericIE(InfoExtractor):
}, },
'add_ie': [SpringboardPlatformIE.ie_key()], 'add_ie': [SpringboardPlatformIE.ie_key()],
}, },
{
'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
'info_dict': {
'id': 'uPDB5I9wfp8',
'ext': 'webm',
'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
'upload_date': '20160219',
'uploader': 'Pocoyo - Português (BR)',
'uploader_id': 'PocoyoBrazil',
},
'add_ie': [YoutubeIE.ie_key()],
'params': {
'skip_download': True,
},
},
{ {
'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html', 'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html',
'info_dict': { 'info_dict': {
@@ -2102,23 +2133,23 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
{ # {
# Zype embed # # Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites', # 'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
'info_dict': { # 'info_dict': {
'id': '5b400b834b32992a310622b9', # 'id': '5b400b834b32992a310622b9',
'ext': 'mp4', # 'ext': 'mp4',
'title': 'Smoky Barbecue Favorites', # 'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g', # 'thumbnail': r're:^https?://.*\.jpe?g',
'description': 'md5:5ff01e76316bd8d46508af26dc86023b', # 'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
'upload_date': '20170909', # 'upload_date': '20170909',
'timestamp': 1504915200, # 'timestamp': 1504915200,
}, # },
'add_ie': [ZypeIE.ie_key()], # 'add_ie': [ZypeIE.ie_key()],
'params': { # 'params': {
'skip_download': True, # 'skip_download': True,
}, # },
}, # },
{ {
# videojs embed # videojs embed
'url': 'https://video.sibnet.ru/shell.php?videoid=3422904', 'url': 'https://video.sibnet.ru/shell.php?videoid=3422904',
@@ -2167,7 +2198,46 @@ class GenericIE(InfoExtractor):
# 'params': { # 'params': {
# 'force_generic_extractor': True, # 'force_generic_extractor': True,
# }, # },
# } # },
{
# VHX Embed
'url': 'https://demo.vhx.tv/category-c/videos/file-example-mp4-480-1-5mg-copy',
'info_dict': {
'id': '858208',
'ext': 'mp4',
'title': 'Untitled',
'uploader_id': 'user80538407',
'uploader': 'OTT Videos',
},
},
{
# ArcPublishing PoWa video player
'url': 'https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/',
'md5': 'b03b2fac8680e1e5a7cc81a5c27e71b3',
'info_dict': {
'id': '8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'ext': 'mp4',
'title': 'Senate candidates wave to voters on Anchorage streets',
'description': 'md5:91f51a6511f090617353dc720318b20e',
'timestamp': 1604378735,
'upload_date': '20201103',
'duration': 1581,
},
},
{
# MyChannels SDK embed
# https://www.24kitchen.nl/populair/deskundige-dit-waarom-sommigen-gevoelig-zijn-voor-voedselallergieen
'url': 'https://www.demorgen.be/nieuws/burgemeester-rotterdam-richt-zich-in-videoboodschap-tot-relschoppers-voelt-het-goed~b0bcfd741/',
'md5': '90c0699c37006ef18e198c032d81739c',
'info_dict': {
'id': '194165',
'ext': 'mp4',
'title': 'Burgemeester Aboutaleb spreekt relschoppers toe',
'timestamp': 1611740340,
'upload_date': '20210127',
'duration': 159,
},
},
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@@ -2179,6 +2249,10 @@ class GenericIE(InfoExtractor):
playlist_desc_el = doc.find('./channel/description') playlist_desc_el = doc.find('./channel/description')
playlist_desc = None if playlist_desc_el is None else playlist_desc_el.text playlist_desc = None if playlist_desc_el is None else playlist_desc_el.text
NS_MAP = {
'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
}
entries = [] entries = []
for it in doc.findall('./channel/item'): for it in doc.findall('./channel/item'):
next_url = None next_url = None
@@ -2194,10 +2268,33 @@ class GenericIE(InfoExtractor):
if not next_url: if not next_url:
continue continue
def itunes(key):
return xpath_text(
it, xpath_with_ns('./itunes:%s' % key, NS_MAP),
default=None)
duration = itunes('duration')
explicit = (itunes('explicit') or '').lower()
if explicit in ('true', 'yes'):
age_limit = 18
elif explicit in ('false', 'no'):
age_limit = 0
else:
age_limit = None
entries.append({ entries.append({
'_type': 'url_transparent', '_type': 'url_transparent',
'url': next_url, 'url': next_url,
'title': it.find('title').text, 'title': it.find('title').text,
'description': xpath_text(it, 'description', default=None),
'timestamp': unified_timestamp(
xpath_text(it, 'pubDate', default=None)),
'duration': int_or_none(duration) or parse_duration(duration),
'thumbnail': url_or_none(xpath_attr(it, xpath_with_ns('./itunes:image', NS_MAP), 'href')),
'episode': itunes('title'),
'episode_number': int_or_none(itunes('episode')),
'season_number': int_or_none(itunes('season')),
'age_limit': age_limit,
}) })
return { return {
@@ -2317,7 +2414,7 @@ class GenericIE(InfoExtractor):
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
'title': self._generic_title(url), 'title': self._generic_title(url),
'upload_date': unified_strdate(head_response.headers.get('Last-Modified')) 'timestamp': unified_timestamp(head_response.headers.get('Last-Modified'))
} }
# Check for direct link to a video # Check for direct link to a video
@@ -2380,6 +2477,9 @@ class GenericIE(InfoExtractor):
webpage = self._webpage_read_content( webpage = self._webpage_read_content(
full_response, url, video_id, prefix=first_bytes) full_response, url, video_id, prefix=first_bytes)
if '<title>DPG Media Privacy Gate</title>' in webpage:
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id) self.report_extraction(video_id)
# Is it an RSS feed, a SMIL file, an XSPF playlist or an MPD manifest? # Is it an RSS feed, a SMIL file, an XSPF playlist or an MPD manifest?
@@ -2423,7 +2523,9 @@ class GenericIE(InfoExtractor):
# Sometimes embedded video player is hidden behind percent encoding # Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/ytdl-org/youtube-dl/issues/2448) # (e.g. https://github.com/ytdl-org/youtube-dl/issues/2448)
# Unescaping the whole page allows handling those cases in a generic way # Unescaping the whole page allows handling those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage) # FIXME: unescaping the whole page may break URLs, commenting out for now.
# There should probably be a second run of the generic extractor on the unescaped webpage.
# webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor, # Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/youtube-dl/issues/21294 # see https://github.com/ytdl-org/youtube-dl/issues/21294
@@ -2505,6 +2607,15 @@ class GenericIE(InfoExtractor):
if tp_urls: if tp_urls:
return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform') return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
arc_urls = ArcPublishingIE._extract_urls(webpage)
if arc_urls:
return self.playlist_from_matches(arc_urls, video_id, video_title, ie=ArcPublishingIE.ie_key())
mychannels_urls = MedialaanIE._extract_urls(webpage)
if mychannels_urls:
return self.playlist_from_matches(
mychannels_urls, video_id, video_title, ie=MedialaanIE.ie_key())
# Look for embedded rtl.nl player # Look for embedded rtl.nl player
matches = re.findall( matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"', r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@@ -2516,6 +2627,10 @@ class GenericIE(InfoExtractor):
if vimeo_urls: if vimeo_urls:
return self.playlist_from_matches(vimeo_urls, video_id, video_title, ie=VimeoIE.ie_key()) return self.playlist_from_matches(vimeo_urls, video_id, video_title, ie=VimeoIE.ie_key())
vhx_url = VHXEmbedIE._extract_url(webpage)
if vhx_url:
return self.url_result(vhx_url, VHXEmbedIE.ie_key())
vid_me_embed_url = self._search_regex( vid_me_embed_url = self._search_regex(
r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]', r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
webpage, 'vid.me embed', default=None) webpage, 'vid.me embed', default=None)
@@ -2760,11 +2875,9 @@ class GenericIE(InfoExtractor):
return self.url_result(ustream_url, UstreamIE.ie_key()) return self.url_result(ustream_url, UstreamIE.ie_key())
# Look for embedded arte.tv player # Look for embedded arte.tv player
mobj = re.search( arte_urls = ArteTVEmbedIE._extract_urls(webpage)
r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"', if arte_urls:
webpage) return self.playlist_from_matches(arte_urls, video_id, video_title)
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
# Look for embedded francetv player # Look for embedded francetv player
mobj = re.search( mobj = re.search(
@@ -2773,11 +2886,6 @@ class GenericIE(InfoExtractor):
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url')) return self.url_result(mobj.group('url'))
# Look for embedded smotri.com player
smotri_url = SmotriIE._extract_url(webpage)
if smotri_url:
return self.url_result(smotri_url, 'Smotri')
# Look for embedded Myvi.ru player # Look for embedded Myvi.ru player
myvi_url = MyviIE._extract_url(webpage) myvi_url = MyviIE._extract_url(webpage)
if myvi_url: if myvi_url:
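
The RSS hunk above reads `itunes:duration` and `itunes:explicit` from each feed item and turns them into `duration` and `age_limit`. The following is a minimal standalone sketch of that mapping using only the standard library. It does not use youtube-dl's `xpath_text`, `int_or_none` or `parse_duration`; the helper names `explicit_to_age_limit`, `duration_to_seconds` and `parse_item`, and the sample feed, are hypothetical and exist only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same namespace URI as NS_MAP in the diff.
ITUNES_NS = {'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd'}


def explicit_to_age_limit(value):
    """Map an <itunes:explicit> value to 18, 0 or None (unknown)."""
    value = (value or '').strip().lower()
    if value in ('true', 'yes'):
        return 18
    if value in ('false', 'no'):
        return 0
    return None


def duration_to_seconds(value):
    """Accept plain seconds ("459") or colon-separated "MM:SS" / "HH:MM:SS"."""
    if not value:
        return None
    value = value.strip()
    if re.fullmatch(r'\d+', value):
        return int(value)
    if re.fullmatch(r'\d+(?::\d+){1,2}', value):
        seconds = 0
        for part in value.split(':'):
            seconds = seconds * 60 + int(part)
        return seconds
    return None  # simplified: other formats are not handled in this sketch


def parse_item(item):
    """Extract a few fields from one <item> element."""
    def itunes(key):
        el = item.find('itunes:%s' % key, ITUNES_NS)
        return el.text if el is not None else None

    return {
        'title': item.findtext('title'),
        'duration': duration_to_seconds(itunes('duration')),
        'age_limit': explicit_to_age_limit(itunes('explicit')),
    }


# Hypothetical feed fragment, not taken from any real podcast.
SAMPLE = '''<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
<channel><item>
<title>Hydrogen!</title>
<itunes:duration>07:39</itunes:duration>
<itunes:explicit>No</itunes:explicit>
</item></channel></rss>'''

item = ET.fromstring(SAMPLE).find('./channel/item')
info = parse_item(item)
print(info)
```

An unrecognised `explicit` value yields `None` rather than a guess, which mirrors the three-way branch in the diff.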

View File

@@ -38,13 +38,17 @@ class GoIE(AdobePassIE):
'disneynow': { 'disneynow': {
'brand': '011', 'brand': '011',
'resource_id': 'Disney', 'resource_id': 'Disney',
} },
'fxnow.fxnetworks': {
'brand': '025',
'requestor_id': 'dtci',
},
} }
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:(?P<sub_domain>%s)\.)?go| (?:(?P<sub_domain>%s)\.)?go|
(?P<sub_domain_2>abc|freeform|disneynow) (?P<sub_domain_2>abc|freeform|disneynow|fxnow\.fxnetworks)
)\.com/ )\.com/
(?: (?:
(?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)| (?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
@@ -99,6 +103,19 @@ class GoIE(AdobePassIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://fxnow.fxnetworks.com/shows/better-things/video/vdka12782841',
'info_dict': {
'id': 'VDKA12782841',
'ext': 'mp4',
'title': 'First Look: Better Things - Season 2',
'description': 'md5:fa73584a95761c605d9d54904e35b407',
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding', 'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True, 'only_matching': True,

View File

@@ -3,11 +3,14 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_parse_qs
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
get_element_by_class,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
try_get,
update_url_query, update_url_query,
) )
@@ -38,21 +41,10 @@ class GoogleDriveIE(InfoExtractor):
# video can't be watched anonymously due to view count limit reached, # video can't be watched anonymously due to view count limit reached,
# but can be downloaded (see https://github.com/ytdl-org/youtube-dl/issues/14046) # but can be downloaded (see https://github.com/ytdl-org/youtube-dl/issues/14046)
'url': 'https://drive.google.com/file/d/0B-vUyvmDLdWDcEt4WjBqcmI2XzQ/view', 'url': 'https://drive.google.com/file/d/0B-vUyvmDLdWDcEt4WjBqcmI2XzQ/view',
'md5': 'bfbd670d03a470bb1e6d4a257adec12e', 'only_matching': True,
'info_dict': {
'id': '0B-vUyvmDLdWDcEt4WjBqcmI2XzQ',
'ext': 'mp4',
'title': 'Annabelle Creation (2017)- Z.V1 [TH].MP4',
}
}, { }, {
# video id is longer than 28 characters # video id is longer than 28 characters
'url': 'https://drive.google.com/file/d/1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ/edit', 'url': 'https://drive.google.com/file/d/1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ/edit',
'info_dict': {
'id': '1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ',
'ext': 'mp4',
'title': 'Andreea Banica feat Smiley - Hooky Song (Official Video).mp4',
'duration': 189,
},
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://drive.google.com/open?id=0B2fjwgkl1A_CX083Tkowdmt6d28', 'url': 'https://drive.google.com/open?id=0B2fjwgkl1A_CX083Tkowdmt6d28',
@@ -171,23 +163,21 @@ class GoogleDriveIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( video_info = compat_parse_qs(self._download_webpage(
'http://docs.google.com/file/d/%s' % video_id, video_id) 'https://drive.google.com/get_video_info',
video_id, query={'docid': video_id}))
title = self._search_regex( def get_value(key):
r'"title"\s*,\s*"([^"]+)', webpage, 'title', return try_get(video_info, lambda x: x[key][0])
default=None) or self._og_search_title(webpage)
duration = int_or_none(self._search_regex( reason = get_value('reason')
r'"length_seconds"\s*,\s*"([^"]+)', webpage, 'length seconds', title = get_value('title')
default=None)) if not title and reason:
raise ExtractorError(reason, expected=True)
formats = [] formats = []
fmt_stream_map = self._search_regex( fmt_stream_map = (get_value('fmt_stream_map') or '').split(',')
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, fmt_list = (get_value('fmt_list') or '').split(',')
'fmt stream map', default='').split(',')
fmt_list = self._search_regex(
r'"fmt_list"\s*,\s*"([^"]+)', webpage,
'fmt_list', default='').split(',')
if fmt_stream_map and fmt_list: if fmt_stream_map and fmt_list:
resolutions = {} resolutions = {}
for fmt in fmt_list: for fmt in fmt_list:
@@ -220,19 +210,27 @@ class GoogleDriveIE(InfoExtractor):
'id': video_id, 'id': video_id,
'export': 'download', 'export': 'download',
}) })
urlh = self._request_webpage(
source_url, video_id, note='Requesting source file', def request_source_file(source_url, kind):
errnote='Unable to request source file', fatal=False) return self._request_webpage(
source_url, video_id, note='Requesting %s file' % kind,
errnote='Unable to request %s file' % kind, fatal=False)
urlh = request_source_file(source_url, 'source')
if urlh: if urlh:
def add_source_format(src_url): def add_source_format(urlh):
formats.append({ formats.append({
'url': src_url, # Use redirect URLs as download URLs in order to calculate
# correct cookies in _calc_cookies.
# Using original URLs may result in redirect loop due to
# google.com's cookies mistakenly used for googleusercontent.com
# redirect URLs (see #23919).
'url': urlh.geturl(),
'ext': determine_ext(title, 'mp4').lower(), 'ext': determine_ext(title, 'mp4').lower(),
'format_id': 'source', 'format_id': 'source',
'quality': 1, 'quality': 1,
}) })
if urlh.headers.get('Content-Disposition'): if urlh.headers.get('Content-Disposition'):
add_source_format(source_url) add_source_format(urlh)
else: else:
confirmation_webpage = self._webpage_read_content( confirmation_webpage = self._webpage_read_content(
urlh, url, video_id, note='Downloading confirmation page', urlh, url, video_id, note='Downloading confirmation page',
@@ -240,25 +238,28 @@ class GoogleDriveIE(InfoExtractor):
if confirmation_webpage: if confirmation_webpage:
confirm = self._search_regex( confirm = self._search_regex(
r'confirm=([^&"\']+)', confirmation_webpage, r'confirm=([^&"\']+)', confirmation_webpage,
'confirmation code', fatal=False) 'confirmation code', default=None)
if confirm: if confirm:
add_source_format(update_url_query(source_url, { confirmed_source_url = update_url_query(source_url, {
'confirm': confirm, 'confirm': confirm,
})) })
urlh = request_source_file(confirmed_source_url, 'confirmed source')
if urlh and urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
else:
self.report_warning(
get_element_by_class('uc-error-subcaption', confirmation_webpage)
or get_element_by_class('uc-error-caption', confirmation_webpage)
or 'unable to extract confirmation code')
if not formats: if not formats and reason:
reason = self._search_regex( raise ExtractorError(reason, expected=True)
r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
if reason:
raise ExtractorError(reason, expected=True)
self._sort_formats(formats) self._sort_formats(formats)
hl = self._search_regex( hl = get_value('hl')
r'"hl"\s*,\s*"([^"]+)', webpage, 'hl', default=None)
subtitles_id = None subtitles_id = None
ttsurl = self._search_regex( ttsurl = get_value('ttsurl')
r'"ttsurl"\s*,\s*"([^"]+)', webpage, 'ttsurl', default=None)
if ttsurl: if ttsurl:
# the video Id for subtitles will be the last value in the ttsurl # the video Id for subtitles will be the last value in the ttsurl
# query string # query string
@@ -268,8 +269,8 @@ class GoogleDriveIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None), 'thumbnail': 'https://drive.google.com/thumbnail?id=' + video_id,
'duration': duration, 'duration': int_or_none(get_value('length_seconds')),
'formats': formats, 'formats': formats,
'subtitles': self.extract_subtitles(video_id, subtitles_id, hl), 'subtitles': self.extract_subtitles(video_id, subtitles_id, hl),
'automatic_captions': self.extract_automatic_captions( 'automatic_captions': self.extract_automatic_captions(
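
The Google Drive hunks above replace regex scraping of the HTML page with a query-string response that is parsed once and read through a small `get_value` helper. The sketch below shows the same access pattern with the standard library's `urllib.parse.parse_qs` instead of `compat_parse_qs` and `try_get`. The response body in `SAMPLE` and the `make_getter` wrapper are hypothetical; the real endpoint's fields may differ.

```python
from urllib.parse import parse_qs


def make_getter(raw_response):
    """Parse a query-string body once and return a first-value accessor."""
    video_info = parse_qs(raw_response)

    def get_value(key):
        # parse_qs maps each key to a list of values; take the first, if any.
        values = video_info.get(key)
        return values[0] if values else None

    return get_value


# Hypothetical response body, for illustration only.
SAMPLE = ('status=ok&title=clip.mp4&length_seconds=189'
          '&fmt_list=18%2F640x360%2C22%2F1280x720')

get_value = make_getter(SAMPLE)
title = get_value('title')
duration = int(get_value('length_seconds'))
fmt_list = (get_value('fmt_list') or '').split(',')
# A missing key falls back to '' and ''.split(',') is [''], a one-element list.
fmt_stream_map = (get_value('fmt_stream_map') or '').split(',')

print(title, duration, fmt_list, fmt_stream_map)
```

Because `''.split(',')` returns `['']`, which is truthy, a check such as `if fmt_stream_map and fmt_list:` does not by itself distinguish a missing field from a present one. Whether that matters depends on how the loop body treats empty entries.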

View File

@@ -1,73 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import codecs
from .common import InfoExtractor
from ..utils import unified_strdate
class GooglePlusIE(InfoExtractor):
IE_DESC = 'Google Plus'
_VALID_URL = r'https?://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
IE_NAME = 'plus.google'
_TEST = {
'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',
'info_dict': {
'id': 'ZButuJc6CtH',
'ext': 'flv',
'title': '嘆きの天使 降臨',
'upload_date': '20120613',
'uploader': '井上ヨシマサ',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# Step 1, Retrieve post webpage to extract further information
webpage = self._download_webpage(url, video_id, 'Downloading entry webpage')
title = self._og_search_description(webpage).splitlines()[0]
upload_date = unified_strdate(self._html_search_regex(
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, 'upload date', fatal=False, flags=re.VERBOSE))
uploader = self._html_search_regex(
r'rel="author".*?>(.*?)</a>', webpage, 'uploader', fatal=False)
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(
r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, 'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page
webpage = self._download_webpage(video_page, video_id, 'Downloading video page')
def unicode_escape(s):
decoder = codecs.getdecoder('unicode_escape')
return re.sub(
r'\\u[0-9a-fA-F]{4,}',
lambda m: decoder(m.group(0))[0],
s)
# Extract video links all sizes
formats = [{
'url': unicode_escape(video_url),
'ext': 'flv',
'width': int(width),
'height': int(height),
} for width, height, video_url in re.findall(
r'\d+,(\d+),(\d+),"(https?://[^.]+\.googleusercontent\.com.*?)"', webpage)]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'uploader': uploader,
'upload_date': upload_date,
'formats': formats,
}

View File

@@ -0,0 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_podcast_url,
int_or_none,
try_get,
urlencode_postdata,
)
class GooglePodcastsBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://podcasts\.google\.com/feed/'
def _batch_execute(self, func_id, video_id, params):
return json.loads(self._download_json(
'https://podcasts.google.com/_/PodcastsUi/data/batchexecute',
video_id, data=urlencode_postdata({
'f.req': json.dumps([[[func_id, json.dumps(params), None, '1']]]),
}), transform_source=lambda x: self._search_regex(r'(?s)(\[.+\])', x, 'data'))[0][2])
def _extract_episode(self, episode):
return {
'id': episode[4][3],
'title': episode[8],
'url': clean_podcast_url(episode[13]),
'thumbnail': episode[2],
'description': episode[9],
'creator': try_get(episode, lambda x: x[14]),
'timestamp': int_or_none(episode[11]),
'duration': int_or_none(episode[12]),
'series': episode[1],
}
class GooglePodcastsIE(GooglePodcastsBaseIE):
IE_NAME = 'google:podcasts'
_VALID_URL = GooglePodcastsBaseIE._VALID_URL_BASE + r'(?P<feed_url>[^/]+)/episode/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5ucHIub3JnLzM0NDA5ODUzOS9wb2RjYXN0LnhtbA/episode/MzBlNWRlN2UtOWE4Yy00ODcwLTk2M2MtM2JlMmUyNmViOTRh',
'md5': 'fa56b2ee8bd0703e27e42d4b104c4766',
'info_dict': {
'id': '30e5de7e-9a8c-4870-963c-3be2e26eb94a',
'ext': 'mp3',
'title': 'WWDTM New Year 2021',
'description': 'We say goodbye to 2020 with Christine Baranksi, Doug Jones, Jonna Mendez, and Kellee Edwards.',
'upload_date': '20210102',
'timestamp': 1609606800,
'duration': 2901,
'series': "Wait Wait... Don't Tell Me!",
}
}
def _real_extract(self, url):
b64_feed_url, b64_guid = re.match(self._VALID_URL, url).groups()
episode = self._batch_execute(
'oNjqVe', b64_guid, [b64_feed_url, b64_guid])[1]
return self._extract_episode(episode)
class GooglePodcastsFeedIE(GooglePodcastsBaseIE):
IE_NAME = 'google:podcasts:feed'
_VALID_URL = GooglePodcastsBaseIE._VALID_URL_BASE + r'(?P<id>[^/?&#]+)/?(?:[?#&]|$)'
_TEST = {
'url': 'https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5ucHIub3JnLzM0NDA5ODUzOS9wb2RjYXN0LnhtbA',
'info_dict': {
'title': "Wait Wait... Don't Tell Me!",
'description': "NPR's weekly current events quiz. Have a laugh and test your news knowledge while figuring out what's real and what we've made up.",
},
'playlist_mincount': 20,
}
def _real_extract(self, url):
b64_feed_url = self._match_id(url)
data = self._batch_execute('ncqJEe', b64_feed_url, [b64_feed_url])
entries = []
for episode in (try_get(data, lambda x: x[1][0]) or []):
entries.append(self._extract_episode(episode))
feed = try_get(data, lambda x: x[3]) or []
return self.playlist_result(
entries, playlist_title=try_get(feed, lambda x: x[0]),
playlist_description=try_get(feed, lambda x: x[2]))
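
`_batch_execute` in the new file above nests JSON twice: the parameters are serialised into a string inside the `f.req` envelope, and the useful part of the reply is again a JSON string at index `[0][2]` of the outer array. Below is a minimal sketch of both directions using only the standard library. The helper names `build_request_body` and `parse_response`, the sample parameters, and the sample reply with its leading junk prefix are assumptions made for illustration, not a description of the service's actual wire format.

```python
import json
import re
from urllib.parse import parse_qs, urlencode


def build_request_body(func_id, params):
    """Wrap params (JSON-encoded as a string) in the f.req envelope."""
    f_req = json.dumps([[[func_id, json.dumps(params), None, '1']]])
    return urlencode({'f.req': f_req}).encode('ascii')


def parse_response(text):
    """Take the span from the first '[' to the last ']', then decode twice."""
    outer = json.loads(re.search(r'(?s)(\[.+\])', text).group(1))
    return json.loads(outer[0][2])


body = build_request_body('oNjqVe', ['feed-id', 'episode-id'])
# Round trip: decode the form body again to inspect the envelope.
envelope = json.loads(parse_qs(body.decode('ascii'))['f.req'][0])
inner_params = json.loads(envelope[0][0][1])

# Hypothetical reply: a non-JSON prefix followed by the outer array.
inner = json.dumps([None, ['episode data']])
SAMPLE_RESPONSE = ")]}'\n\n" + json.dumps([['wrb.fr', 'oNjqVe', inner, None]])
result = parse_response(SAMPLE_RESPONSE)

print(inner_params, result)
```

The greedy regex only works when the first `[` in the reply is the start of the outer array, which holds for the sample above but is an assumption about real replies.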

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import hashlib import hashlib
import hmac import hmac
import json
import re import re
import time import time
import uuid import uuid
@@ -25,43 +26,50 @@ from ..utils import (
class HotStarBaseIE(InfoExtractor): class HotStarBaseIE(InfoExtractor):
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee' _AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
def _call_api_impl(self, path, video_id, query): def _call_api_impl(self, path, video_id, headers, query, data=None):
st = int(time.time()) st = int(time.time())
exp = st + 6000 exp = st + 6000
auth = 'st=%d~exp=%d~acl=/*' % (st, exp) auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest() auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
response = self._download_json( h = {'hotstarauth': auth}
'https://api.hotstar.com/' + path, video_id, headers={ h.update(headers)
'hotstarauth': auth, return self._download_json(
'x-country-code': 'IN', 'https://api.hotstar.com/' + path,
'x-platform-code': 'JIO', video_id, headers=h, query=query, data=data)
}, query=query)
def _call_api(self, path, video_id, query_name='contentId'):
response = self._call_api_impl(path, video_id, {
'x-country-code': 'IN',
'x-platform-code': 'JIO',
}, {
query_name: video_id,
'tas': 10000,
})
if response['statusCode'] != 'OK': if response['statusCode'] != 'OK':
raise ExtractorError( raise ExtractorError(
response['body']['message'], expected=True) response['body']['message'], expected=True)
return response['body']['results'] return response['body']['results']
def _call_api(self, path, video_id, query_name='contentId'): def _call_api_v2(self, path, video_id, headers, query=None, data=None):
return self._call_api_impl(path, video_id, { h = {'X-Request-Id': compat_str(uuid.uuid4())}
query_name: video_id, h.update(headers)
'tas': 10000, try:
}) return self._call_api_impl(
path, video_id, h, query, data)
def _call_api_v2(self, path, video_id): except ExtractorError as e:
return self._call_api_impl( if isinstance(e.cause, compat_HTTPError):
'%s/in/contents/%s' % (path, video_id), video_id, { if e.cause.code == 402:
'desiredConfig': 'encryption:plain;ladder:phone,tv;package:hls,dash', self.raise_login_required()
'client': 'mweb', message = self._parse_json(e.cause.read().decode(), video_id)['message']
'clientVersion': '6.18.0', if message in ('Content not available in region', 'Country is not supported'):
'deviceId': compat_str(uuid.uuid4()), self.raise_geo_restricted(message)
'osName': 'Windows', raise ExtractorError(message)
'osVersion': '10', raise e
})
class HotStarIE(HotStarBaseIE): class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar' IE_NAME = 'hotstar'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})' _VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+[/-])?(?P<id>\d{10})'
_TESTS = [{ _TESTS = [{
# contentData # contentData
'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273', 'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
@@ -92,8 +100,13 @@ class HotStarIE(HotStarBaseIE):
# only available via api v2 # only available via api v2
'url': 'https://www.hotstar.com/tv/ek-bhram-sarvagun-sampanna/s-2116/janhvi-targets-suman/1000234847', 'url': 'https://www.hotstar.com/tv/ek-bhram-sarvagun-sampanna/s-2116/janhvi-targets-suman/1000234847',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.hotstar.com/in/tv/start-music/1260005217/cooks-vs-comalis/1100039717',
'only_matching': True,
}] }]
_GEO_BYPASS = False _GEO_BYPASS = False
_DEVICE_ID = None
_USER_TOKEN = None
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@@ -121,7 +134,30 @@ class HotStarIE(HotStarBaseIE):
headers = {'Referer': url} headers = {'Referer': url}
formats = [] formats = []
geo_restricted = False geo_restricted = False
playback_sets = self._call_api_v2('h/v2/play', video_id)['playBackSets']
if not self._USER_TOKEN:
self._DEVICE_ID = compat_str(uuid.uuid4())
self._USER_TOKEN = self._call_api_v2('um/v3/users', video_id, {
'X-HS-Platform': 'PCTV',
'Content-Type': 'application/json',
}, data=json.dumps({
'device_ids': [{
'id': self._DEVICE_ID,
'type': 'device_id',
}],
}).encode())['user_identity']
playback_sets = self._call_api_v2(
'play/v2/playback/content/' + video_id, video_id, {
'X-HS-Platform': 'web',
'X-HS-AppVersion': '6.99.1',
'X-HS-UserToken': self._USER_TOKEN,
}, query={
'device-id': self._DEVICE_ID,
'desired-config': 'encryption:plain',
'os-name': 'Windows',
'os-version': '10',
})['data']['playBackSets']
for playback_set in playback_sets: for playback_set in playback_sets:
if not isinstance(playback_set, dict): if not isinstance(playback_set, dict):
continue continue
@@ -163,19 +199,22 @@ class HotStarIE(HotStarBaseIE):
for f in formats: for f in formats:
f.setdefault('http_headers', {}).update(headers) f.setdefault('http_headers', {}).update(headers)
image = try_get(video_data, lambda x: x['image']['h'], compat_str)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'thumbnail': 'https://img1.hotstarext.com/image/upload/' + image if image else None,
'description': video_data.get('description'), 'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')), 'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')),
'formats': formats, 'formats': formats,
'channel': video_data.get('channelName'), 'channel': video_data.get('channelName'),
'channel_id': video_data.get('channelId'), 'channel_id': str_or_none(video_data.get('channelId')),
'series': video_data.get('showName'), 'series': video_data.get('showName'),
'season': video_data.get('seasonName'), 'season': video_data.get('seasonName'),
'season_number': int_or_none(video_data.get('seasonNo')), 'season_number': int_or_none(video_data.get('seasonNo')),
'season_id': video_data.get('seasonId'), 'season_id': str_or_none(video_data.get('seasonId')),
'episode': title, 'episode': title,
'episode_number': int_or_none(video_data.get('episodeNo')), 'episode_number': int_or_none(video_data.get('episodeNo')),
} }
@@ -183,7 +222,7 @@ class HotStarIE(HotStarBaseIE):
class HotStarPlaylistIE(HotStarBaseIE): class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist' IE_NAME = 'hotstar:playlist'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/tv/[^/]+/s-\w+/list/[^/]+/t-(?P<id>\w+)' _VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:[a-z]{2}/)?tv/[^/]+/s-\w+/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26', 'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': { 'info_dict': {
@@ -193,6 +232,9 @@ class HotStarPlaylistIE(HotStarBaseIE):
}, { }, {
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480', 'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.hotstar.com/us/tv/masterchef-india/s-830/list/episodes/t-1_2_830',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

@@ -0,0 +1,97 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
clean_podcast_url,
int_or_none,
str_or_none,
)
class IHeartRadioBaseIE(InfoExtractor):
def _call_api(self, path, video_id, fatal=True, query=None):
return self._download_json(
'https://api.iheart.com/api/v3/podcast/' + path,
video_id, fatal=fatal, query=query)
def _extract_episode(self, episode):
return {
'thumbnail': episode.get('imageUrl'),
'description': clean_html(episode.get('description')),
'timestamp': int_or_none(episode.get('startDate'), 1000),
'duration': int_or_none(episode.get('duration')),
}
class IHeartRadioIE(IHeartRadioBaseIE):
IE_NAME = 'iheartradio'
_VALID_URL = r'(?:https?://(?:www\.)?iheart\.com/podcast/[^/]+/episode/(?P<display_id>[^/?&#]+)-|iheartradio:)(?P<id>\d+)'
_TEST = {
'url': 'https://www.iheart.com/podcast/105-behind-the-bastards-29236323/episode/part-one-alexander-lukashenko-the-dictator-70346499/?embed=true',
'md5': 'c8609c92c8688dcb69d8541042b8abca',
'info_dict': {
'id': '70346499',
'ext': 'mp3',
'title': 'Part One: Alexander Lukashenko: The Dictator of Belarus',
'description': 'md5:96cc7297b3a5a9ebae28643801c96fae',
'timestamp': 1597741200,
'upload_date': '20200818',
}
}
def _real_extract(self, url):
episode_id = self._match_id(url)
episode = self._call_api(
'episodes/' + episode_id, episode_id)['episode']
info = self._extract_episode(episode)
info.update({
'id': episode_id,
'title': episode['title'],
'url': clean_podcast_url(episode['mediaUrl']),
})
return info
class IHeartRadioPodcastIE(IHeartRadioBaseIE):
IE_NAME = 'iheartradio:podcast'
_VALID_URL = r'https?://(?:www\.)?iheart(?:podcastnetwork)?\.com/podcast/[^/?&#]+-(?P<id>\d+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://www.iheart.com/podcast/1119-it-could-happen-here-30717896/',
'info_dict': {
'id': '30717896',
'title': 'It Could Happen Here',
'description': 'md5:5842117412a967eb0b01f8088eb663e2',
},
'playlist_mincount': 11,
}, {
'url': 'https://www.iheartpodcastnetwork.com/podcast/105-stuff-you-should-know-26940277',
'only_matching': True,
}]
def _real_extract(self, url):
podcast_id = self._match_id(url)
path = 'podcasts/' + podcast_id
episodes = self._call_api(
path + '/episodes', podcast_id, query={'limit': 1000000000})['data']
entries = []
for episode in episodes:
episode_id = str_or_none(episode.get('id'))
if not episode_id:
continue
info = self._extract_episode(episode)
info.update({
'_type': 'url',
'id': episode_id,
'title': episode.get('title'),
'url': 'iheartradio:' + episode_id,
'ie_key': IHeartRadioIE.ie_key(),
})
entries.append(info)
podcast = self._call_api(path, podcast_id, False) or {}
return self.playlist_result(
entries, podcast_id, podcast.get('title'), podcast.get('description'))
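
Review note on the two `_VALID_URL` patterns in this new file: the episode pattern accepts either a public episode URL or the internal `iheartradio:<id>` form that the podcast extractor emits for each entry. A minimal sketch using only the standard `re` module shows how the patterns split the test URLs; the helper names `episode_id` and `podcast_id` are hypothetical and not part of the extractor:

```python
import re

# Patterns copied from IHeartRadioIE and IHeartRadioPodcastIE in the diff.
EPISODE_RE = (
    r'(?:https?://(?:www\.)?iheart\.com/podcast/[^/]+/episode/'
    r'(?P<display_id>[^/?&#]+)-|iheartradio:)(?P<id>\d+)')
PODCAST_RE = (
    r'https?://(?:www\.)?iheart(?:podcastnetwork)?\.com/podcast/'
    r'[^/?&#]+-(?P<id>\d+)/?(?:[?#&]|$)')


def episode_id(url):
    """Hypothetical helper: return (display_id, id), or None if no match."""
    m = re.match(EPISODE_RE, url)
    return (m.group('display_id'), m.group('id')) if m else None


def podcast_id(url):
    """Hypothetical helper: return the numeric podcast id, or None."""
    m = re.match(PODCAST_RE, url)
    return m.group('id') if m else None
```

The trailing `/?(?:[?#&]|$)` in the podcast pattern is what keeps it from also matching episode URLs, since those continue with `/episode/...` after the podcast slug.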

@@ -12,7 +12,7 @@ from ..utils import (
class InaIE(InfoExtractor): class InaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ina\.fr/(?:video|audio)/(?P<id>[A-Z0-9_]+)' _VALID_URL = r'https?://(?:(?:www|m)\.)?ina\.fr/(?:video|audio)/(?P<id>[A-Z0-9_]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html', 'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'a667021bf2b41f8dc6049479d9bb38a3', 'md5': 'a667021bf2b41f8dc6049479d9bb38a3',
@@ -31,6 +31,9 @@ class InaIE(InfoExtractor):
}, { }, {
'url': 'https://www.ina.fr/video/P16173408-video.html', 'url': 'https://www.ina.fr/video/P16173408-video.html',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://m.ina.fr/video/I12055569',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

@@ -54,7 +54,7 @@ class InfoQIE(BokeCCBaseIE):
def _extract_rtmp_video(self, webpage): def _extract_rtmp_video(self, webpage):
# The server URL is hardcoded # The server URL is hardcoded
video_url = 'rtmpe://video.infoq.com/cfx/st/' video_url = 'rtmpe://videof.infoq.com/cfx/st/'
# Extract video URL # Extract video URL
encoded_id = self._search_regex( encoded_id = self._search_regex(
@@ -86,17 +86,18 @@ class InfoQIE(BokeCCBaseIE):
return [{ return [{
'format_id': 'http_video', 'format_id': 'http_video',
'url': http_video_url, 'url': http_video_url,
'http_headers': {'Referer': 'https://www.infoq.com/'},
}] }]
def _extract_http_audio(self, webpage, video_id): def _extract_http_audio(self, webpage, video_id):
fields = self._hidden_inputs(webpage) fields = self._form_hidden_inputs('mp3Form', webpage)
http_audio_url = fields.get('filename') http_audio_url = fields.get('filename')
if not http_audio_url: if not http_audio_url:
return [] return []
# base URL is found in the Location header in the response returned by # base URL is found in the Location header in the response returned by
# GET https://www.infoq.com/mp3download.action?filename=... when logged in. # GET https://www.infoq.com/mp3download.action?filename=... when logged in.
http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url) http_audio_url = compat_urlparse.urljoin('http://ress.infoq.com/downloads/mp3downloads/', http_audio_url)
http_audio_url = update_url_query(http_audio_url, self._extract_cf_auth(webpage)) http_audio_url = update_url_query(http_audio_url, self._extract_cf_auth(webpage))
# audio files sometimes seem to be missing even if there is a download link # audio files sometimes seem to be missing even if there is a download link

@@ -22,7 +22,7 @@ from ..utils import (
class InstagramIE(InfoExtractor): class InstagramIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))' _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv|reel)/(?P<id>[^/?#&]+))'
_TESTS = [{ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc', 'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516', 'md5': '0d2da106a9d2631273e192b372806516',
@@ -35,7 +35,7 @@ class InstagramIE(InfoExtractor):
'timestamp': 1371748545, 'timestamp': 1371748545,
'upload_date': '20130620', 'upload_date': '20130620',
'uploader_id': 'naomipq', 'uploader_id': 'naomipq',
'uploader': 'Naomi Leonor Phan-Quang', 'uploader': 'B E A U T Y F O R A S H E S',
'like_count': int, 'like_count': int,
'comment_count': int, 'comment_count': int,
'comments': list, 'comments': list,
@@ -95,6 +95,9 @@ class InstagramIE(InfoExtractor):
}, { }, {
'url': 'https://www.instagram.com/tv/aye83DjauH/', 'url': 'https://www.instagram.com/tv/aye83DjauH/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.instagram.com/reel/CDUMkliABpa/',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@@ -122,9 +125,9 @@ class InstagramIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
(video_url, description, thumbnail, timestamp, uploader, (media, video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count, comments, height, uploader_id, like_count, comment_count, comments, height,
width) = [None] * 11 width) = [None] * 12
shared_data = self._parse_json( shared_data = self._parse_json(
self._search_regex( self._search_regex(
@@ -137,59 +140,77 @@ class InstagramIE(InfoExtractor):
(lambda x: x['entry_data']['PostPage'][0]['graphql']['shortcode_media'], (lambda x: x['entry_data']['PostPage'][0]['graphql']['shortcode_media'],
lambda x: x['entry_data']['PostPage'][0]['media']), lambda x: x['entry_data']['PostPage'][0]['media']),
dict) dict)
if media: # _sharedData.entry_data.PostPage is empty when authenticated (see
video_url = media.get('video_url') # https://github.com/ytdl-org/youtube-dl/pull/22880)
height = int_or_none(media.get('dimensions', {}).get('height')) if not media:
width = int_or_none(media.get('dimensions', {}).get('width')) additional_data = self._parse_json(
description = try_get( self._search_regex(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'], r'window\.__additionalDataLoaded\s*\(\s*[^,]+,\s*({.+?})\s*\)\s*;',
compat_str) or media.get('caption') webpage, 'additional data', default='{}'),
thumbnail = media.get('display_src') video_id, fatal=False)
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date')) if additional_data:
uploader = media.get('owner', {}).get('full_name') media = try_get(
uploader_id = media.get('owner', {}).get('username') additional_data, lambda x: x['graphql']['shortcode_media'],
dict)
if media:
video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height'))
width = int_or_none(media.get('dimensions', {}).get('width'))
description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
thumbnail = media.get('display_src') or media.get('display_url')
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
def get_count(key, kind): def get_count(keys, kind):
return int_or_none(try_get( if not isinstance(keys, (list, tuple)):
keys = [keys]
for key in keys:
count = int_or_none(try_get(
media, (lambda x: x['edge_media_%s' % key]['count'], media, (lambda x: x['edge_media_%s' % key]['count'],
lambda x: x['%ss' % kind]['count']))) lambda x: x['%ss' % kind]['count'])))
like_count = get_count('preview_like', 'like') if count is not None:
comment_count = get_count('to_comment', 'comment') return count
like_count = get_count('preview_like', 'like')
comment_count = get_count(
('preview_comment', 'to_comment', 'to_parent_comment'), 'comment')
comments = [{ comments = [{
'author': comment.get('user', {}).get('username'), 'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'), 'author_id': comment.get('user', {}).get('id'),
'id': comment.get('id'), 'id': comment.get('id'),
'text': comment.get('text'), 'text': comment.get('text'),
'timestamp': int_or_none(comment.get('created_at')), 'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get( } for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')] 'comments', {}).get('nodes', []) if comment.get('text')]
if not video_url: if not video_url:
edges = try_get( edges = try_get(
media, lambda x: x['edge_sidecar_to_children']['edges'], media, lambda x: x['edge_sidecar_to_children']['edges'],
list) or [] list) or []
if edges: if edges:
entries = [] entries = []
for edge_num, edge in enumerate(edges, start=1): for edge_num, edge in enumerate(edges, start=1):
node = try_get(edge, lambda x: x['node'], dict) node = try_get(edge, lambda x: x['node'], dict)
if not node: if not node:
continue continue
node_video_url = url_or_none(node.get('video_url')) node_video_url = url_or_none(node.get('video_url'))
if not node_video_url: if not node_video_url:
continue continue
entries.append({ entries.append({
'id': node.get('shortcode') or node['id'], 'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num, 'title': 'Video %d' % edge_num,
'url': node_video_url, 'url': node_video_url,
'thumbnail': node.get('display_url'), 'thumbnail': node.get('display_url'),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])), 'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])), 'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')), 'view_count': int_or_none(node.get('video_view_count')),
}) })
return self.playlist_result( return self.playlist_result(
entries, video_id, entries, video_id,
'Post by %s' % uploader_id if uploader_id else None, 'Post by %s' % uploader_id if uploader_id else None,
description) description)
if not video_url: if not video_url:
video_url = self._og_search_video_url(webpage, secure=False) video_url = self._og_search_video_url(webpage, secure=False)
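
Review note on the reworked `get_count` in the hunk above: it now accepts one key or several and returns the first count it can find. A simplified standalone sketch of that fallback follows. It takes `media` as an explicit argument and uses plain `try`/`except` instead of the `try_get` and `int_or_none` utilities, so it is an approximation for illustration rather than the extractor's code:

```python
def get_count(media, keys, kind):
    """Return the first integer count found for any of the given keys."""
    # Accept a single key as well as a list or tuple of keys.
    if not isinstance(keys, (list, tuple)):
        keys = [keys]
    for key in keys:
        getters = (
            lambda m: m['edge_media_%s' % key]['count'],  # 'edge_media_*' layout
            lambda m: m['%ss' % kind]['count'],           # fallback 'likes'/'comments' layout
        )
        for getter in getters:
            try:
                count = getter(media)
            except (KeyError, TypeError, IndexError):
                continue
            if isinstance(count, int):
                return count
    return None
```

With the key tuple `('preview_comment', 'to_comment', 'to_parent_comment')` from the diff, a response that only carries `edge_media_to_parent_comment` still yields a comment count.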

@@ -86,7 +86,8 @@ class IPrimaIE(InfoExtractor):
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)', (r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">', r'data-product="([^"]+)">',
r'id=["\']player-(p\d+)"', r'id=["\']player-(p\d+)"',
r'playerId\s*:\s*["\']player-(p\d+)'), r'playerId\s*:\s*["\']player-(p\d+)',
r'\bvideos\s*=\s*["\'](p\d+)'),
webpage, 'real id') webpage, 'real id')
playerpage = self._download_webpage( playerpage = self._download_webpage(

Some files were not shown because too many files have changed in this diff