Commit Graph

16329 Commits

Author SHA1 Message Date
dirkf
2b7dd3b2a2 [utils] Fix update_Request() with empty data (not None) 2023-07-25 13:19:43 +01:00
dirkf
7bce2ad441 [build] Fix various Jython CI and test issues 2023-07-25 13:19:43 +01:00
dirkf
1fa8b86f0b [utils] Remove stray undocumented Host header in redirect (fix 46fde7c) 2023-07-20 05:29:59 +01:00
dirkf
b2ba24bb02 [InfoExtractor] Add _match_valid_url() class method and refactor
* API compatible with yt-dlp
* also support Sequence of patterns in _VALID_URL
* one place to compile _VALID_URL
* TODO: remove existing extractor shims
2023-07-19 22:14:50 +01:00
dirkf
a190b55964 [utils] Fix broken Py 3.11+ compat in traverse_obj()
* inspect.getargspec is missing despite doc claiming backward compat
* replace with emulation of `Signature.bind()`
2023-07-19 22:14:50 +01:00
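As a rough illustration of the technique named in a190b55964: on Pythons that provide `inspect.signature`, `Signature.bind()` can be used to ask whether a callable would accept a given set of arguments, which is the kind of question `inspect.getargspec` was previously used for. The helper name `accepts_args` is hypothetical and is not taken from youtube-dl; the commit itself describes an emulation of `Signature.bind()`, not this code.

```python
import inspect


def accepts_args(func, *args, **kwargs):
    """Return True if func could be called with the given arguments.

    Hypothetical helper: it only binds the arguments to the signature,
    it never calls func.
    """
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        # bind() raises TypeError when the arguments do not fit
        return False
    return True


def takes_one(value):
    return value


def takes_two(key, value):
    return key, value
```

With these definitions, `accepts_args(takes_two, 'k', 'v')` is true and `accepts_args(takes_one, 'k', 'v')` is false. Note that `inspect.signature` can itself raise `ValueError` for some built-in callables, which this sketch does not handle.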
dirkf
b2741f2654 [InfoExtractor] Add search methods for Next/Nuxt.js from yt-dlp
* add _search_nextjs_data(), from https://github.com/yt-dlp/yt-dlp/pull/1386
  thanks selfisekai
* add _search_nuxt_data(), from https://github.com/yt-dlp/yt-dlp/pull/1921,
  thanks Lesmiscore, pukkandan
* add tests for the above
* also fix HTML5 type recognition and tests, from
  222a230871,
  thanks Lesmiscore
* update extractors in PR using above, fix tests.
2023-07-19 22:14:50 +01:00
dirkf
8465222041 [Clipchamp] Add new extractor back-ported from yt-dlp 2023-07-19 22:14:50 +01:00
dirkf
4339910df3 [DLF] Add site extractors back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6697, thanks nick-cd
2023-07-19 22:14:50 +01:00
dirkf
eaaf4c6736 [Whyp] Add extractor back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6803, thanks CoryTibbettsDev
2023-07-19 22:14:50 +01:00
dirkf
4566e6e53e [GlobalPlayer] Add site extractors back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6903, thanks garret1317
2023-07-19 22:14:50 +01:00
dirkf
1e8ccdd2eb [InfoExtractor] Support groups in _search_regex(), etc 2023-07-19 22:14:50 +01:00
dirkf
cb9366eda5 [utils] Minor updates (merge_dicts, T)
A couple of mods to ease yt-dlp back-ports:
* add kwargs to merge_dicts:
  `unblank=True` (disallow empty string), `rev=False` (reverse the merge list)
* add `T(x)` shortcut for `{x}`, unsupported in Py2.6
2023-07-19 22:14:50 +01:00
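The two additions described in cb9366eda5 can be pictured with the simplified sketch below. It assumes that `unblank` and `rev` default to the values shown in the commit message and that the first valid value for each key wins. The function `merge_dicts_sketch` is a hypothetical stand-in and does not reproduce the real `merge_dicts` in full. Keyword arguments are read from `**kwargs` so the code stays valid on old Python 2 syntax.

```python
def T(*x):
    # Shortcut for a set literal such as {x}, for Pythons without that syntax
    return set(x)


def merge_dicts_sketch(*dicts, **kwargs):
    """Simplified, hypothetical merge: the first valid value per key wins."""
    unblank = kwargs.get('unblank', True)  # treat '' as not valid
    rev = kwargs.get('rev', False)         # walk the dicts in reverse order
    merged = {}
    for d in (reversed(dicts) if rev else dicts):
        for key, value in d.items():
            if value is None or (unblank and value == ''):
                continue
            merged.setdefault(key, value)
    return merged
```

For example, `merge_dicts_sketch({'a': ''}, {'a': 'x'})` keeps `'x'` under the default `unblank=True`, while passing `rev=True` lets later dicts take priority.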
dirkf
d9d07a9581 [utils] Improve js_to_json, align with yt-dlp
* support variable substitution, from https://github.com/yt-dlp/yt-dlp/pull/#521 etc,
  thanks ChillingPepper, Grub4k, pukkandan
* improve escape handling, from https://github.com/yt-dlp/yt-dlp/pull/#521
  thanks Grub4k
* support template strings from https://github.com/yt-dlp/yt-dlp/pull/6623
  thanks Grub4k
* add limited `!` evaluation (eg, !!0 -> false, see tests)
2023-07-19 22:14:50 +01:00
dirkf
825a40744b [utils] Align traverse_obj() with yt-dlp
Thanks Grub4k for these:
* traverse `Iterable`s, from https://github.com/yt-dlp/yt-dlp/pull/6902, etc
* traverse `set` key for transformations/filters, `re.Match` group names, from
  776995bc10, etc
* traverse `re.Match`es, from https://github.com/yt-dlp/yt-dlp/pull/5174
* always return list when branching, from https://github.com/yt-dlp/yt-dlp/pull/5170
2023-07-19 22:14:50 +01:00
dirkf
47214e46d8 [compat] Fix old Pythons broken loading of valueless cookie attributes
Cookie string parsing in Py 2.6.9, probably earlier, requires `=`.
Also 3.2, though the CPython code appears to be OK: 3.1 was also wrong.
2023-07-18 10:50:46 +01:00
dirkf
1d8d5a93f7 [test] Fixes for old Pythons 2023-07-18 10:50:46 +01:00
dirkf
1634b1d61e [doc] Warn against setting cookies with --add-header 2023-07-18 10:50:46 +01:00
bashonly
21438a4194 [downloader/external] Fix cookie support 2023-07-18 10:50:46 +01:00
Simon Sawicki
8334ec961b [core] Process header cookies on loading 2023-07-18 10:50:46 +01:00
bashonly
3801d36416 [utils] YoutubeDLCookieJar: Add get_cookie_header and get_cookies_for_url methods 2023-07-18 10:50:46 +01:00
dirkf
b383be9887 [core] Remove Cookie header on redirect to prevent leaks
Adapted from yt-dlp/yt-dlp-ghsa-v8mc-9377-rwjj/pull/1/commits/101caac
Thx coletdjnz
2023-07-18 10:50:46 +01:00
dirkf
46fde7caee [core] Update redirect handling from yt-dlp
* Thx coletdjnz: https://github.com/yt-dlp/yt-dlp/pull/7094
* add test that redirected `POST` loses its `Content-Type`
2023-07-18 10:50:46 +01:00
dirkf
648dc5304c [compat] Add Request and HTTPClient compat for redirect
* support `method` parameter of `Request.__init__`  (Py 2 and old Py 3)
* support `getcode` method of compat_http_client.HTTPResponse (Py 2)
2023-07-18 10:50:46 +01:00
dirkf
d5ef405c5d [core] Align error reporting methods with yt-dlp 2023-07-18 10:50:46 +01:00
dirkf
f47fdb9564 [utils] Add {expected_type} and Iterable support to traverse_obj() 2023-07-18 10:50:46 +01:00
dirkf
b6dff4073d [core] Revert version display from b8a86dc 2023-07-18 10:50:46 +01:00
dirkf
f24bc9272e [Misc] Fixes for 2.6 compatibility 2023-07-05 22:58:54 +01:00
dirkf
2500300c2a [workflows/ci.yml] Restore test support for Py 3.2 2023-07-05 22:51:15 +01:00
dirkf
fa7f0effbe [YouTube] Avoid crash in author extraction 2023-06-22 23:14:21 +01:00
pukkandan
9112e668a5 [YouTube] Improve nsig function name extraction
Fixes player b7910ca8, using `,` vs `;`
See https://github.com/ytdl-org/youtube-dl/issues/32292#issuecomment-1602231170

Co-authored-by: dirkf
2023-06-22 16:46:53 +01:00
dirkf
07af47960f [YouTube] Improve fix for ae8ba2c
Thx: https://github.com/yt-dlp/yt-dlp/commit/01aba25
2023-06-18 00:52:18 +01:00
dirkf
ae8ba2c319 [YouTube] Fix KeyError QV in signature extraction failed
* temporarily force missing global definition into sig JS
* improve test: thanks https://github.com/yt-dlp/yt-dlp/issues/7327#issuecomment-1595274615
* resolves #32314
2023-06-17 15:55:19 +01:00
dirkf
d6433cbb2c [jsinterp] Don't find unrelated objects 2023-06-17 15:46:12 +01:00
dirkf
ff75c300f5 [jsinterp] Fix test for failed match in extract_object() 2023-06-17 15:34:11 +01:00
dirkf
a2534f7b88 [jsinterp] Fix div bug breaking player 8c7583ff
Thx bashonly: https://github.com/ytdl-org/youtube-dl/issues/32292#issuecomment-1585639223
Fixes #32292
2023-06-11 17:23:00 +01:00
dirkf
b8a86dcf1a [core] Revise 1f7c6f8 to help downstream merger (possibly) 2023-05-26 20:25:25 +01:00
dirkf
2389c7cbd3 [compat] Fix casefold import __all__ syntax in a19855f 2023-05-23 17:11:22 +01:00
dirkf
ee731f3d00 [ITV] Fix UA capitalisation in 384f632 2023-05-23 16:50:25 +01:00
dirkf
1f7c6f8b2b [core] Further improve platform debug log
* see d1c6c5c
2023-05-23 16:50:25 +01:00
dirkf
d89c2137ba [jsinterp] Small updates for a85a875
* update signature tests
* clarify NaN handling
2023-05-23 16:50:25 +01:00
dirkf
d1c6c5c4d6 [core] Improve platform debug log, based on yt-dlp 2023-05-11 21:17:31 +01:00
dirkf
6ed3433828 [jsinterp] Add short-cut evaluation for common expression
* special handling for (d%e.length+e.length)%e.length speeds up ~6%
2023-05-11 21:02:01 +01:00
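For context on the expression singled out in 6ed3433828: JavaScript's `%` keeps the sign of the dividend, so `(d%e.length+e.length)%e.length` is the usual idiom for wrapping a possibly negative index into the range of an array. Python's `%` already returns a non-negative result for a positive divisor, so for integers the whole expression reduces to a single modulo. The sketch below checks that equivalence; the function names are illustrative only, and it is not the short-cut code from jsinterp.

```python
import math


def js_mod(a, b):
    # JavaScript-style remainder: the result takes the sign of the dividend
    return int(math.fmod(a, b))


def js_wrap_index(d, length):
    # Literal transcription of (d % e.length + e.length) % e.length
    return js_mod(js_mod(d, length) + length, length)


def py_wrap_index(d, length):
    # Python's % is already non-negative for a positive divisor
    return d % length
```

Both functions agree for every integer `d` and positive `length`, for example mapping `-7` with length `5` to `3`.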
dirkf
a85a875fef [jsinterp] Handle NaN in bitwise operators
* also add _NaN
* also pull function naming from yt-dlp
2023-05-11 20:59:30 +01:00
dirkf
11cc3f3ad0 [utils] Fix compiled_regex_type in 249f2b6 2023-05-11 20:53:07 +01:00
dirkf
64d6dd64c8 [YouTube] Support Releases tab 2023-04-23 22:58:35 +01:00
dirkf
211cbfd5d4 [jsinterp] Minimally handle arithmetic operator precedence
Resolves #32066
2023-04-21 14:04:30 +01:00
dirkf
26035bde46 [DashSegmentsFD] Correctly detect errors when fragment_retries == 0
* use the success flag instead of the retry count
* establish the fragment_url outside the retry loop
* only report skipping a fragment once.
* resolves #32033
2023-04-13 00:23:17 +01:00
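The first two points of 26035bde46 describe a general pattern: decide whether a fragment failed from an explicit success flag, not from how many retries were used, and work out the URL once before the loop starts. A minimal sketch of that pattern follows, assuming a hypothetical `download` callable that raises `IOError` on failure. None of these names come from `DashSegmentsFD`.

```python
def fetch_fragment(download, fragment_url, fragment_retries):
    """Try one fragment up to fragment_retries + 1 times.

    Hypothetical sketch: returns True only if some attempt succeeded,
    so the result is still meaningful when fragment_retries == 0.
    """
    success = False
    # fragment_url is fixed before the loop and reused for each attempt
    for _attempt in range(fragment_retries + 1):
        try:
            download(fragment_url)
            success = True
            break
        except IOError:
            continue
    return success
```

A caller can then report a skipped fragment exactly once, after the loop, whenever the function returns `False`.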
dirkf
2da3fa04a6 [YouTube] Simplify signature patterns 2023-04-12 23:53:14 +01:00
Gabriel Nagy
735e87adfc
[core] Sanitize info dict before dumping JSON (fixes fe7e130) (#32032)
* follow up to fe7e130 which didn't fix everything.

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-04-12 23:40:38 +01:00
dirkf
fe7e13066c [core] Add and use sanitize_info() method from yt-dlp 2023-04-10 17:12:31 +01:00
dirkf
213d1d91bf [core] No longer importing copy 2023-04-06 19:49:46 +01:00
dirkf
f8253a5289 [core] Avoid deepcopy of ctx dict (fix f35b757) (Pt 2) 2023-04-06 19:42:36 +01:00
dirkf
d6ae3b77cd [core] Avoid deepcopy of ctx dict (fix f35b757)
* may now contain `LazyList`s
* resolves #31999
2023-04-06 14:56:55 +01:00
dirkf
9f4d83ff42 [options] Add --mtime option, unsets default --no-mtime
* resolves #1709 (!)
2023-04-05 19:05:16 +01:00
dirkf
25124bd640 [devscripts] Improve hack to convert command-line options to API options
* define equality for DateRange
* don't show default DateRange
2023-04-05 19:05:16 +01:00
dirkf
78da22489b [compat] Add and use compat_open() like Py3 open()
* resolves FIXME: ytdl-org/youtube-dl/commit/dfe5fa4
2023-04-05 18:57:37 +01:00
dirkf
557dbac173 [FragmentFD] Fix iteration with infinite limit
* fixes ytdl-org/youtube-dl/baa6c5e
* resolves #31885
2023-04-05 18:55:41 +01:00
pukkandan
3f6d2bd76f [extractor/youtube] Bypass throttling for -f17
and related cleanup

Thanks @AudricV for the finding

Ref: yt-dlp/yt-dlp/commit/c9abebb
2023-03-19 02:29:00 +00:00
pukkandan
88f28f620b [extractor/youtube] Construct fragment list lazily
Ref: yt-dlp/yt-dlp/commit/e389d17
See: yt-dlp/yt-dlp#6517
2023-03-19 02:29:00 +00:00
dirkf
f35b757c82 [utils] Ensure allow_types for variadic() is a tuple 2023-03-19 02:29:00 +00:00
dirkf
45495228b7 [downloader/http] Only check for resumability when actually resuming 2023-03-19 02:15:41 +00:00
dirkf
6fece0a96b [AENetworksBaseIE] Report missing show data instead of crash 2023-03-14 16:23:20 +00:00
dirkf
e8de54bce5 [core] Handle /../ sequences in HTTP URLs
* use Python's RFC implementation for embedded sequences
* hack: strip unbalanced leading `../` from path, like eg Firefox

See https://github.com/yt-dlp/yt-dlp/issues/3355
2023-03-14 16:23:20 +00:00
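The behaviour described in e8de54bce5 can be approximated on Python 3.5 or later with `urllib.parse.urljoin`, which resolves embedded `/../` sequences and ignores `..` segments that would climb above the root. The sketch below is an approximation under those assumptions and not the code from the commit. The function name `normalise_dot_segments` is hypothetical, and the sketch also collapses repeated slashes inside the path, which the real implementation may not do.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalise_dot_segments(url):
    """Resolve /./ and /../ in the path of an HTTP URL (hypothetical sketch)."""
    parts = urlsplit(url)
    root = urlunsplit((parts.scheme, parts.netloc, '/', '', ''))
    # Prefix './' so a ':' in the first path segment is not read as a scheme
    relative = './' + parts.path.lstrip('/')
    resolved_path = urlsplit(urljoin(root, relative)).path
    # Keep the original query and fragment, replace only the path
    return urlunsplit(parts._replace(path=resolved_path))
```

For instance, `http://example.com/a/b/../c?x=1` becomes `http://example.com/a/c?x=1`, and the unbalanced `http://example.com/../a` becomes `http://example.com/a`.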
dirkf
baa6c5e95c [FragmentFD] Respect --no-continue
* discard partial fragment on `--no-continue`
* continue with correct progress display otherwise

Resolves #21467
2023-03-14 16:23:20 +00:00
dirkf
5c985d4f81 [downloader] Let _ffmpeg_ handle DASH segments
Fixes https://github.com/ytdl-org/youtube-dl/issues/31792 after 3da1783.
2023-03-14 16:23:20 +00:00
dirkf
0402710227 [jsinterp] Fix regexp parsing and .replace[All] method
* For performance, make regexp object instantiation lazy
* Other small performance improvements
2023-03-07 01:24:21 +00:00
pukkandan
3e92c60fcd [jsinterp] Handle Date at epoch 0
See yt-dlp/yt_dlp#6400
2023-03-03 15:02:15 +00:00
pukkandan
3da17834a4 [Youtube] Construct dash formats with range query
See yt-dlp/yt_dlp#6369
2023-03-03 15:02:15 +00:00
dirkf
f7ce98a21e [YouTube] Support @owner format in uploader_id etc
* implement https://github.com/ytdl-org/youtube-dl/issues/31530#issuecomment-1435734719
* update affected tests
* misc clean-ups
2023-02-24 12:22:16 +00:00
pukkandan
1d3751c3fe Escape URLs in sanitized_Request, not sanitize_url
d2558234cf5dd12d6896eed5427b7dcdb3ab7b5a added escaping of URLs while sanitizing. However, sanitize_url may not always receive an actual URL. E.g., when using `youtube-dl "search query" --default-search ytsearch`, `search query` gets escaped to `search%20query` before being prefixed with `ytsearch:`, which is not the intended behavior. So the escaping is moved to sanitized_Request instead.
2023-02-20 20:27:25 +00:00
df
6067451e43 [Vimeo] Fix e19ec52 for tween-age Pythons
* a check in older Pythons (2.7 and earlier, and the 3.3 and 3.4 series) caused "sre_constants.error: nothing to repeat"
* satisfy the check by avoiding nested qualifiers that can match empty string

Resolves #31597
2023-02-20 01:41:46 +00:00
dirkf
57802e632f [jsinterp] Fix dict comprehension for Py2.6
Resolves #31600
2023-02-19 13:48:58 +00:00
dirkf
2dd6c6edd8
[YouTube] Avoid crash if uploader_id extraction fails
See #31530.
2023-02-17 11:16:54 +00:00
dirkf
42b098dd79 [InfoExtractor] Handle unquoted values in OpenGraph searches 2023-02-14 02:53:16 +00:00
fonkap
6f8c2635a5 [StreamsbIE] Add extractor for streamsb.com (viewsb.com) (#31517)
* Add extractor for streamsb.com (viewsb.com)

* make data url using app.js version

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
fonkap
de48105dd8 [KommunetvIE] Add extractor for kommunetv.no (#31516)
* Add extractor for kommunetv.no
* Using utils.update_url instead of regex

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
fonkap
822f19f05d [FileMoonIE] Add extractor for filemoon.sx (#31515)
---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
teddy171
33db85c571 [feat]: Add support to external downloader aria2p (#31500)
* feat: add class Aria2pFD

* feat: create call_downloader function

* feat: a colorful download interface to aria2pFD

* feat: change value name

* Apply suggestions from code review

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Typo in suggestion

* fix: remove unused value

* fix: add not function to return value(0 is normal); add total_seconds to download.eta(timedelta object); add waiting status when hook progress

* fix: remove unused method ..utils.format_bytes

* fix: be up to flake8

* fix: be up to flake8

* Apply suggestions from code review

* [feat] test external downloader aria2p

* [feat] test external downloader aria2p

* [fix] test_external_downloader.py

* Apply suggestions from code review

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Apply suggestions from code review

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Update test/test_external_downloader.py

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Update test/test_external_downloader.py

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Update youtube_dl/downloader/external.py

Co-authored-by: dirkf <fieldhouse@gmx.net>

* refactoring code and fix bugs

* Apply suggestions from code review

* Rename test_external_downloader.py to test_downloader_external.py

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
Valentin Metz
f33923cba7 [rbgtum] Add new extractor (#31305)
* [rbgtum] Add new extractor

* Small update, force CI

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
dirkf
e8198c517b [YouTube] Fix tests 2023-02-13 03:54:51 +00:00
dirkf
bafb6dec72 [YouTube] Refresh compat/utils usage
* import parse_qs()
* import parse_qs in lazy_extractors (clears old TODO)
* clean up old compiled lazy_extractors for Py2
* use update_url()
2023-02-13 03:54:51 +00:00
dirkf
90c9f789d9 [utils] Add parse_qs, update_url
[skip ci]
2023-02-13 03:54:51 +00:00
dirkf
249f2b6316 [compat] Systematise compat_ naming
[skip ci]
2023-02-13 03:54:51 +00:00
dirkf
30e986b834 [YouTube] Add signatureTimestamp for age-gate bypass 2023-02-13 03:54:51 +00:00
dirkf
58988c1421 [YouTube] Bypass age-gating for certain restricted videos
* Use TVHTML5_SIMPLY_EMBEDDED_PLAYER client

* Also add and fix tests

* Introduce and use new utility function `update_url()`
2023-02-13 03:54:51 +00:00
dirkf
e19ec52322 [Vimeo] Support /user{video_id}/{slug} URL format 2023-02-12 22:16:00 +00:00
dirkf
f2f90887ca [Vimeo] Fix Unable to extract info section redux
* as reported in yt-dlp/yt-dlp#6149
* also allow newline in target JSON object
2023-02-12 22:16:00 +00:00
dirkf
cd987e6fca [jsinterp] Nits 2023-02-12 22:16:00 +00:00
dirkf
d947ffe8e3 [IGN] Overhaul extractor to avoid URL redirection loop
Consequently/also:
* centralise video data extraction
* detect 404 and 503 expected errors
* handle the test video in IGNVideo
* handle two additional page formats for the tests in IGNArticle
2023-02-12 22:16:00 +00:00
dirkf
384f632e8a
[ITV] Overhaul ITV extractor (#30266)
* support ITVX URLs (thanks Vangelis66)
* support legacy ITV Hub URLs
* include extraction fix 4c57dd2 from sleaux-meaux 3 May 2021
* include extraction fix 6fbcc16, fix by staubichsauger & pukkandan
* work-around duration parsing pending fix to utils.parse_duration
* apply default vanilla UA for pages and media to avoid site blocking
* also detect and report `Episode not found` instead of generic 404
* rework ITVBTCCIE with geo-block detection, best effort geo-restriction handling, news article support
* fix tests
2023-02-03 21:10:07 +00:00
dirkf
9d17948b5a
[myvideoge] Add new extractor (#31360)
NB download tests on CI servers blocked 

Co-authored-by: Alfonso Solbes <fonk666@gmail.com>
2023-02-02 23:25:44 +00:00
afterdelight
f316f5d4e3
[xhamster] add support for new domain xhvid.com (#31370) 2023-02-02 23:20:14 +00:00
dirkf
bc6f94e459
[FIFA] Back-port extractor from yt-dlp (#31385) 2023-02-02 23:19:03 +00:00
Epsilonator
be3392a0d4
[Blerp] Add new extractor (#31398)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:33:09 +00:00
zhangeric-15
6d829d8119
[YouTube] Fix not finding videos listed under a channel's "shorts" subpage. (#31409)
Resolves #31336

Co-authored-by: Jouni Järvinen <rautamiekka@users.noreply.github.com>
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:26:31 +00:00
Ruowang Sun
98b0cf1cd0
[Callin] Add new extractor (#31414)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:21:05 +00:00
Leon Etienne
e9611a2a36
[pr0gramm] implement InfoExtractor, Resolves #31433 (#31434)
* [pr0gramm] implement infoextractor

* [pr0gramm] remove misplaced comment, uncapture regex-group

* [pr0gramm]: specify utf-8 coding

* [pr0gramm]: add trailing comma to lists for maintainability

* [pr0gramm]: ie only sets upload_date attribute

* [pr0gramm]: add video_id to title

* [pr0gramm]: more forgiving _valid_url regex

* [pr0gramm]: add uploader to title, if set

* Discriminate URL pattern

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:13:39 +00:00
JChris246
807e593a32
[cammodels] fix and improve extractor (#31453)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:12:36 +00:00
Brian Marks
37cbdfa0e7
[americastestkitchen] Add support for downloading entire series (#31493)
Also
* support new sites and URL patterns
* back-port from yt-dlp

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 16:58:21 +00:00
dirkf
295736c9cb [jsinterp] Improve parsing
* support subset `... else if ...`
* support `while`
* add `RegExp` class
* generalise `new` support
* limited more debug strings
* matching test changes
2023-02-02 16:31:49 +00:00