Commit Graph

12815 Commits

Author SHA1 Message Date
Bart Broere
d36d50fe5c Re-add Zapp 2024-03-05 14:04:03 +01:00
Bart Broere
eb6e396bfb First version of a VPRO regex 2024-03-05 13:55:59 +01:00
Bart Broere
28ba01f1cc Add Ongehoord Nederland and test URL for BNNVARA 2024-03-05 13:43:56 +01:00
Bart Broere
4fc423845e Fix lint 2024-03-05 12:49:22 +01:00
Bart Broere
34b5b20107 Refactor into reusable method 2024-03-03 17:47:15 +01:00
Bart Broere
8b1a7d9a7c Use provided util 2024-03-01 16:23:19 +01:00
Bart Broere
f9e59b0c49 Add the possibility to add 'hls' later 2024-03-01 15:28:14 +01:00
Bart Broere
fb7b7179ff Speculate about other ways of getting productId 2024-03-01 15:08:10 +01:00
Bart Broere
0dc7d954cb Comply with coding conventions a bit more 2024-03-01 15:05:30 +01:00
Bart Broere
21eb4513e0 Convert the description into code 2024-03-01 14:12:51 +01:00
Bart Broere
29724e7b07 Delete all broken extractors
Re-implementing these is quicker for the cases where that's even still possible
2024-03-01 13:24:48 +01:00
Bart Broere
577368116b Fix token URL 2024-03-01 13:15:52 +01:00
Bart Broere
da3d1f4321 Add notes on new npo.nl site 2024-03-01 10:36:03 +01:00
Bart Broere
f76d58c71f Skip a test 2024-02-26 13:18:36 +01:00
Bart Broere
c409a8c54b
Merge branch 'ytdl-org:master' into fix-npo-support 2024-02-25 09:42:26 +01:00
Aaron Tan
40bd5c1815
[caffeine.tv] Add new extractor (#32514)
* Add CaffeineTVIE info extractor to support site caffeine.tv

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-02-22 12:54:07 +00:00
dirkf
70f230f9cf
[GBNews]Add new extractor for GB News TV channel (#29432)
* Add extractor for GB News TV channel

* Support more GBNews URL formats
Allow alphanumeric and _ in place of `shows`, which redirect to site's preferred URL

* Update for 2024
2024-02-22 12:44:00 +00:00
dirkf
7687389f08 [Vbox7] Improve extraction, adding features from yt-dlp PR #9100
* changes from https://github.com/yt-dlp/yt-dlp/pull/9100 (thx
seproDev):
  - attempt HLS extraction
  - re-enable XFF
  - test `view_count`, `duration` extraction
* improve commenting, error checks
2024-02-19 00:53:22 +00:00
dirkf
4416f82c80 [Vbox7IE] Sanitise ld+json containing unexpected characters
* based on PR #29680
* added hack to force invoking `transform_source`
* fixes #26218
2024-02-02 12:36:05 +00:00
dirkf
bdda6b81df [Vbox7IE] Improve extraction
* DASH extraction no longer fails with new range support
* but always find combined formats if available
* suppress ineffective XFF geo-bypass (causes time-outs)
* adapted from https://github.com/ytdl-org/youtube-dl/pull/29680
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf
1fd8f802b8 [InfoExtractor] Correctly resolve BaseURL in DASH manifest
Specs:
* ISO/IEC 23009-1:2012 section 5.6
* RFC 3986 section 5.
2024-02-02 12:36:05 +00:00
dirkf
4eaeb9b2c6 [InfoExtractor] Support byte range for DASH
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf
c58b655a9e [InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port) 2024-02-02 12:36:05 +00:00
dirkf
dc512e3a8a [YouTube] Fix like_count extraction using likeButtonViewModel
* also fix various tests
* TODO: check against yt-dlp tests
2024-01-22 11:10:34 +00:00
dirkf
f8b0135850 [YouTube] Rework n-sig processing, realigning with yt-dlp
* apply n-sig before chunked fragments, fixes #32692
2024-01-22 11:10:34 +00:00
dirkf
640d39f03a [InfoExtractor] Support some warning and ._downloader shortcut methods from yt-dlp 2024-01-22 11:10:34 +00:00
Robotix
b1bbc1e502
[Epidemic Sound] Add new extractor (#32628)
* Add simple extractor
* Support separate tracks
* Use index as id instead of slug

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-06 01:17:57 +00:00
dirkf
55a442adae
[Imgur] Overhaul extractor module (#32612)
Revise extractors for new API and page formats
2023-12-05 20:02:30 +00:00
mimvahedi
c62936a5f2
[telewebion] Fix extraction (#32634)
* [telewebion] fix extraction

Resolves https://github.com/ytdl-org/youtube-dl/issues/5135#issuecomment-932952119

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-02 15:25:09 +00:00
ReenigneArcher
b7fca0fab3 [Youtube] Update consent cookie handling to match site
Apologies for force push!
[skip ci]
2023-11-29 21:43:02 +00:00
dirkf
31f50c8194 [S4C] Add thumbnail extraction, extract series as playlist
Based on https://github.com/yt-dlp/yt-dlp/pull/7776: thx ifan-t, bashonly
2023-08-31 23:16:50 +01:00
dirkf
86e3cf5e58 [S4C] Add extractor for Sianel Pedwar Cymru
* from https://github.com/yt-dlp/yt-dlp/pull/7730, thx ifan-t, bashonly
2023-08-04 22:54:12 +01:00
dirkf
a25e9f3c84 [compat] Use compat_open() 2023-07-25 13:19:43 +01:00
dirkf
b2ba24bb02 [InfoExtractor] Add _match_valid_url() class method and refactor
* API compatible with yt-dlp
* also support Sequence of patterns in _VALID_URL
* one place to compile _VALID_URL
* TODO: remove existing extractor shims
2023-07-19 22:14:50 +01:00
dirkf
b2741f2654 [InfoExtractor] Add search methods for Next/Nuxt.js from yt-dlp
* add _search_nextjs_data(), from https://github.com/yt-dlp/yt-dlp/pull/1386
  thanks selfisekai
* add _search_nuxt_data(), from https://github.com/yt-dlp/yt-dlp/pull/1921,
  thanks Lesmiscore, pukkandan
* add tests for the above
* also fix HTML5 type recognition and tests, from
  222a230871,
  thanks Lesmiscore
* update extractors in PR using above, fix tests.
2023-07-19 22:14:50 +01:00
dirkf
8465222041 [Clipchamp] Add new extractor back-ported from yt-dlp 2023-07-19 22:14:50 +01:00
dirkf
4339910df3 [DLF] Add site extractors back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6697, thanks nick-cd
2023-07-19 22:14:50 +01:00
dirkf
eaaf4c6736 [Whyp] Add extractor back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6803, thanks CoryTibbettsDev
2023-07-19 22:14:50 +01:00
dirkf
4566e6e53e [GlobalPlayer] Add site extractors back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6903, thanks garret1317
2023-07-19 22:14:50 +01:00
dirkf
1e8ccdd2eb [InfoExtractor] Support groups in _search_regex(), etc 2023-07-19 22:14:50 +01:00
dirkf
fa7f0effbe [YouTube] Avoid crash in author extraction 2023-06-22 23:14:21 +01:00
pukkandan
9112e668a5 [YouTube] Improve nsig function name extraction
Fixes player b7910ca8, using `,` vs `;`
See https://github.com/ytdl-org/youtube-dl/issues/32292#issuecomment-1602231170

Co-authored-by: dirkf
2023-06-22 16:46:53 +01:00
dirkf
07af47960f [YouTube] Improve fix for ae8ba2c
Thx: https://github.com/yt-dlp/yt-dlp/commit/01aba25
2023-06-18 00:52:18 +01:00
dirkf
ae8ba2c319 [YouTube] Fix KeyError QV in signature extraction failed
* temporarily force missing global definition into sig JS
* improve test: thanks https://github.com/yt-dlp/yt-dlp/issues/7327#issuecomment-1595274615
* resolves #32314
2023-06-17 15:55:19 +01:00
dirkf
ee731f3d00 [ITV] Fix UA capitalisation in 384f632 2023-05-23 16:50:25 +01:00
dirkf
64d6dd64c8 [YouTube] Support Releases tab 2023-04-23 22:58:35 +01:00
dirkf
2da3fa04a6 [YouTube] Simplify signature patterns 2023-04-12 23:53:14 +01:00
dirkf
0c7261db90
Update npo.py
* simplify comment
* force CI
2023-04-06 01:51:02 +01:00
Bart Broere
632897860b
Accept suggestions on PR; comply with conventions
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-04-03 09:50:21 +02:00
Bart Broere
9e1acb2527 Fix flake8 2023-03-31 12:56:18 +02:00
Bart Broere
fb2b4e2894 Add line comment 2023-03-31 12:46:05 +02:00
Bart Broere
b4776f2e36 Import from compat 2023-03-31 12:39:11 +02:00
Bart Broere
3b31478dfd Fix support for NPO downloads 2023-03-31 12:30:22 +02:00
pukkandan
3f6d2bd76f [extractor/youtube] Bypass throttling for -f17
and related cleanup

Thanks @AudricV for the finding

Ref: yt-dlp/yt-dlp/commit/c9abebb
2023-03-19 02:29:00 +00:00
pukkandan
88f28f620b [extractor/youtube] Construct fragment list lazily
Ref: yt-dlp/yt-dlp/commit/e389d17
See: yt-dlp/yt-dlp#6517
2023-03-19 02:29:00 +00:00
dirkf
6fece0a96b [AENetworksBaseIE] Report missing show data instead of crash 2023-03-14 16:23:20 +00:00
pukkandan
3da17834a4 [Youtube] Construct dash formats with range query
See yt-dlp/yt_dlp#6369
2023-03-03 15:02:15 +00:00
dirkf
f7ce98a21e [YouTube] Support @owner format in uploader_id etc
* implement https://github.com/ytdl-org/youtube-dl/issues/31530#issuecomment-1435734719
* update affected tests
* misc clean-ups
2023-02-24 12:22:16 +00:00
pukkandan
1d3751c3fe Escape URLs in sanitized_Request, not sanitize_url d2558234cf5dd12d6896eed5427b7dcdb3ab7b5a added escaping of URLs while sanitizing. However, sanitize_url may not always receive an actual URL. Eg: When using youtube-dl "search query" --default-search ytsearch, search query gets escaped to search%20query before being prefixed with ytsearch: which is not the intended behavior. So the escaping is moved to sanitized_Request instead. 2023-02-20 20:27:25 +00:00
df
6067451e43 [Vimeo] Fix e19ec52 for tween-age Pythons
* a check in older Pythons in the 2.7 and earlier, 3.3, 3.4 series caused "sre_constants.error: nothing to repeat"
* satisfy the check by avoiding nested qualifiers that can match empty string

Resolves #31597
2023-02-20 01:41:46 +00:00
dirkf
2dd6c6edd8
[YouTube] Avoid crash if uploader_id extraction fails
See #31530.
2023-02-17 11:16:54 +00:00
dirkf
42b098dd79 [InfoExtractor] Handle unquoted values in OpenGraph searches 2023-02-14 02:53:16 +00:00
fonkap
6f8c2635a5 [StreamsbIE] Add extractor for streamsb.com (viewsb.com) (#31517)
* Add extractor for streamsb.com (viewsb.com)

* make data url using app.js version

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
fonkap
de48105dd8 [KommunetvIE] Add extractor for kommunetv.no (#31516)
* Add extractor for kommunetv.no
* Using utils.update_url instead of regex

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
fonkap
822f19f05d [FileMoonIE] Add extractor for filemoon.sx (#31515)
---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
Valentin Metz
f33923cba7 [rbgtum] Add new extractor (#31305)
* [rbgtum] Add new extractor

* Small update, force CI

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
dirkf
e8198c517b [YouTube] Fix tests 2023-02-13 03:54:51 +00:00
dirkf
bafb6dec72 [YouTube] Refresh compat/utils usage
* import parse_qs()
* import parse_qs in lazy_extractors (clears old TODO)
* clean up old compiled lazy_extractors for Py2
* use update_url()
2023-02-13 03:54:51 +00:00
dirkf
30e986b834 [YouTube] Add signatureTimestamp for age-gate bypass 2023-02-13 03:54:51 +00:00
dirkf
58988c1421 [YouTube] Bypass age-gating for certain restricted videos
* Use TVHTML5_SIMPLY_EMBEDDED_PLAYER client

* Also add and fix tests

* Introduce and use new utility function `update_url()`
2023-02-13 03:54:51 +00:00
dirkf
e19ec52322 [Vimeo] Support /user{video_id}/{slug} URL format 2023-02-12 22:16:00 +00:00
dirkf
f2f90887ca [Vimeo] Fix Unable to extract info section redux
* as reported in yt-dlp/yt-dlp#6149
* also allow newline in target JSON object
2023-02-12 22:16:00 +00:00
dirkf
d947ffe8e3 [IGN] Overhaul extractor to avoid URL redirection loop
Consequently/also:
* centralise video data extraction
* detect 404 and 503 expected errors
* handle the test video in IGNVideo
* handle two additional page formats for the tests in IGNArticle
2023-02-12 22:16:00 +00:00
dirkf
384f632e8a
[ITV] Overhaul ITV extractor (#30266)
* support ITVX URLs (thanks Vangelis66)
* support legacy ITV Hub URLs
* include extraction fix 4c57dd2 from sleaux-meaux 3 May 2021
* include extraction fix 6fbcc16, fix by staubichsauger & pukkandan
* work-around duration parsing pending fix to utils.parse_duration
* apply default vanilla UA for pages and media to avoid site blocking
* also detect and report `Episode not found` instead of generic 404
* rework ITVBTCCIE with geo-block detection, best effort geo-restriction handling, news article support
* fix tests
2023-02-03 21:10:07 +00:00
dirkf
9d17948b5a
[myvideoge] Add new extractor (#31360)
NB download tests on CI servers blocked 

Co-authored-by: Alfonso Solbes <fonk666@gmail.com>
2023-02-02 23:25:44 +00:00
afterdelight
f316f5d4e3
[xhamster] add support for new domain xhvid.com (#31370) 2023-02-02 23:20:14 +00:00
dirkf
bc6f94e459
[FIFA] Back-port extractor from yt-dlp (#31385) 2023-02-02 23:19:03 +00:00
Epsilonator
be3392a0d4
[Blerp] Add new extractor (#31398)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:33:09 +00:00
zhangeric-15
6d829d8119
[YouTube] Fix not finding videos listed under a channel's "shorts" subpage. (#31409)
Resolves #31336

Co-authored-by: Jouni Järvinen <rautamiekka@users.noreply.github.com>
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:26:31 +00:00
Ruowang Sun
98b0cf1cd0
[Callin] Add new extractor (#31414)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:21:05 +00:00
Leon Etienne
e9611a2a36
[pr0gramm] implement InfoExtractor, Resolves #31433 (#31434)
* [pr0gramm] implement infoextractor

* [pr0gramm] remove misplaced comment, uncapture regex-group

* [pr0gramm]: specify utf-8 coding

* [pr0gramm]: add trailing comma to lists for maintainability

* [pr0gramm]: ie only sets upload_date attribute

* [pr0gramm]: add video_id to title

* [pr0gramm]: more forgiving _valid_url regex

* [pr0gramm]: add uploader to title, if set

* Discriminate URL pattern

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:13:39 +00:00
JChris246
807e593a32
[cammodels] fix and improve extractor (#31453)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:12:36 +00:00
Brian Marks
37cbdfa0e7
[americastestkitchen] Add support for downloading entire series (#31493)
Also
* support new sites and URL patterns
* back-port from yt-dlp

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 16:58:21 +00:00
dirkf
195f22f679
[generic] Improve KVS (etc) extraction 2022-11-13 15:09:29 +00:00
dirkf
fc2beab0e7
[generic] Improve KVS (etc) extraction
* detect kt_player('kt_player', 'https://.../kt_player.swf?v=5...
* detect age limit if 18 USC 2257 is mentioned
* test with shooshtime.com

Partially resolves #31332.
2022-11-13 14:59:30 +00:00
FraFraFra-LongD
1a4fbe8462
Added ThisVid.com support (#29187)
* add ThisVidIE, ThisVidMemberIE, ThisVidPlaylistIE
* redirect embed to main page for more metadata
* use KVS extraction newly added to GenericIE and remove duplicate tests
* also add MrDeepFake etc compat to GenericIE
(closes #22390)

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-11-13 13:22:04 +00:00
dirkf
c2f9be3e63
[generic] Add KVS player extraction 2022-11-12 11:55:05 +00:00
dirkf
604762a9f8
[common:jwplayer] Improve jwplayer extraction and parsing (#31000)
* don't crash parser if jwplayer_data is invalid (empty, or no formats)
* use `label` in `sources[n]` as `format_id`
* relax `jwplayer().setup(...)` RE (also rework PR #27274 enhancement)
* detect more manifest formats in _parse_jwplayer_formats() (from PR #29596)
* improve metadata extraction (from PR #25433)
* remember URLs in a set
* use parse_resolution() in format
* extract filesize in format (from yt-dlp)

Co-authored-by: kikuyan <kikuyan@users.noreply.github.com>
Co-authored-by: martin54 <martin54@users.noreply.github.com>
2022-11-11 00:49:13 +00:00
Moises Lima
47e70fff8b
[PeekVids, PlayVids] Add new extractor (#29765)
* Merge back-port from yt-dlp
* Merge features from PR #29798
* Improve metadata extraction

Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored by: AXDOOMER
2022-11-09 20:26:30 +00:00
dirkf
de39d1281c
[extractor/ceskatelevize] Back-port extractor from yt-dlp, etc (#30713)
* back-port extractor, removing CeskaTelevizePoradyIE
* follow redirect URL
* support liveBroadcast and videobonusDetail in __NEXT__ data
* return single video for singleton playlist
* fix/add tests
2022-11-04 10:13:07 +00:00
Xie Yanbo
ce5d36486e
[netease] Support urls shared from mobile app (#31304)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-30 11:48:44 +00:00
Xie Yanbo
d25cf62086
[netease] Impove error handling (#31303)
* add warnings for users outside of China
* skip empty song urls

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-30 11:46:46 +00:00
dirkf
502cefa41f
[Vimeo] Update variable name in hydration JSON pattern
Fixes #31311
2022-10-27 14:33:00 +00:00
dirkf
0faa45d6c0
[BongaCams] Support new .net domain
Resolves #31262.
2022-10-20 11:06:44 +00:00
ache
447edc48e6
Fix ADN extractor (#31275)
* Rename Anime Digital Network to Animation Digital Network, animationdigitalnetwork.fr
* Update the test to an available video
* Update the decoding key of subtitles
* Keep the support of old URLs
* Add a test to match the old URL
* Reduce redundancy of the URL name
* Fix md5 ^^"
* Fix undefined _BASE
* Process HTTP error text (eg geo-block) correctly and uniformly in Py3, Py2
* Skip test for CI since geo-blocked

Signed-off-by: ache <ache@ache.one>
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-18 16:06:27 +01:00
dirkf
ee8560d01e
[ManyVids] Support new single-page app structure 2022-10-13 02:42:49 +00:00
dirkf
7135277fec
[ManyVids] Support new single-page app structure
See https://github.com/yt-dlp/yt-dlp/issues/5210#issuecomment-1276919962.
2022-10-13 01:59:01 +00:00
dirkf
7bbd5b13d4
[Motherless] Pull from yt-dlp, etc
* use username field
* loosen regexes
* warn on page count 0 in group
* avoid reloading group page 1
Closes #29626
2022-10-12 01:09:55 +01:00
Xie Yanbo
c91cbf6072
[netease] Get netease music download url through player api (#31235)
* remove unplayable song from test
* compatible with python 2
* using standard User_Agent, fix imports
* use hash instead of long description
* fix lint
* fix hash
2022-10-11 13:55:09 +01:00
dirkf
11b284c81f
[Common:JWPlayer] Fix x1000 scaling error
See https://github.com/yt-dlp/yt-dlp/issues/5106#issuecomment-1264625161
2022-10-11 12:36:44 +00:00