Zenon Mousmoulas
a29d3ba444
Merge a2fd63ce22 into d55d1f423d
2024-12-16 14:56:48 +02:00
dirkf
f28d7178e4
[InfoExtractor] Use kwarg maxsplit for re.split
...
* May become kw-only in future Pythons
2024-12-16 12:38:51 +00:00
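As a brief illustration of the entry above: `re.split()` accepts `maxsplit` by keyword, which keeps working if the positional form is ever retired. This is a minimal sketch with a made-up input string, not code taken from the extractor.

```python
import re

# Hypothetical input, for illustration only.
header = 'key=value=with=equals'

# Keyword form: split at most once, leaving later '=' characters intact.
name, rest = re.split(r'=', header, maxsplit=1)
print(name, rest)
```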
dirkf
ad01fa6cca
[jsinterp] Add Debugger from yt-dlp
...
* https://github.com/yt-dlp/yt-dlp/commit/8f53dc4
* thx pukkandan
2024-06-20 20:03:49 +01:00
dirkf
21924742f7
[InfoExtractor] Misc yt-dlp back-ports, etc
...
* add _yes_playlist() method
* avoid crash using _NETRC_MACHINE
* use _search_json() in _search_nextjs_data()
* _search_nextjs_data() default is JSON, not text
* test for above
2024-05-30 15:46:36 +01:00
dirkf
f66372403f
[InfoExtractor] Rework and improve JWPlayer extraction
...
* use traverse_obj() and _search_json()
* support playlist `.load({**video1},{**video2}, ...)`
* support transform_source=... for _extract_jwplayer_data()
2024-03-08 13:03:42 +00:00
dirkf
7216fa2ac4
[InfoExtractor] Add _search_json()
...
* uses the error diagnostic to truncate the JSON string
* may be confused by non-C-Pythons
2024-03-08 13:03:42 +00:00
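The truncation idea mentioned in the entry above can be sketched as follows. This is not the project's implementation: `load_leading_json` is a hypothetical helper, and the sketch assumes Python 3 on CPython, where `json.JSONDecodeError` reports trailing content with the message `Extra data` and the offset in `pos`. Other interpreters may word the diagnostic differently, which matches the caveat in the commit body.

```python
import json


def load_leading_json(text):
    """Parse the JSON value at the start of text, ignoring trailing content."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # CPython flags trailing content as 'Extra data' and sets err.pos
        # to the offset where it begins; any other failure is re-raised.
        if err.msg != 'Extra data':
            raise
        return json.loads(text[:err.pos])


# Hypothetical page fragment: a JSON object followed by more JavaScript.
data = load_leading_json('{"id": 42, "tags": ["a", "b"]}; window.other = 1')
print(data)
```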
dirkf
1fd8f802b8
[InfoExtractor] Correctly resolve BaseURL in DASH manifest
...
Specs:
* ISO/IEC 23009-1:2012 section 5.6
* RFC 3986 section 5.
2024-02-02 12:36:05 +00:00
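A minimal sketch of the RFC 3986 section 5 resolution referred to in the entry above, using the standard library's `urljoin()`. The helper name `resolve_base_urls` and all URLs are hypothetical, and the sketch assumes the nested BaseURL values are applied outermost first, starting from the manifest URL.

```python
from urllib.parse import urljoin


def resolve_base_urls(manifest_url, base_urls):
    """Resolve a chain of BaseURL values, outermost first."""
    base = manifest_url
    for base_url in base_urls:
        # A relative reference is merged with the current base;
        # an absolute URL replaces it entirely.
        base = urljoin(base, base_url)
    return base


manifest_url = 'https://example.com/media/show/manifest.mpd'  # hypothetical

relative = resolve_base_urls(manifest_url, ['video/', '720p/'])
absolute = resolve_base_urls(manifest_url, ['video/', 'https://cdn.example.net/v/'])
print(relative)
print(absolute)
```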
dirkf
4eaeb9b2c6
[InfoExtractor] Support byte range for DASH
...
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf
c58b655a9e
[InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port)
2024-02-02 12:36:05 +00:00
dirkf
640d39f03a
[InfoExtractor] Support some warning and ._downloader shortcut methods from yt-dlp
2024-01-22 11:10:34 +00:00
dirkf
a25e9f3c84
[compat] Use compat_open()
2023-07-25 13:19:43 +01:00
dirkf
b2ba24bb02
[InfoExtractor] Add _match_valid_url() class method and refactor
...
* API compatible with yt-dlp
* also support Sequence of patterns in _VALID_URL
* one place to compile _VALID_URL
* TODO: remove existing extractor shims
2023-07-19 22:14:50 +01:00
dirkf
b2741f2654
[InfoExtractor] Add search methods for Next/Nuxt.js from yt-dlp
...
* add _search_nextjs_data(), from https://github.com/yt-dlp/yt-dlp/pull/1386 , thanks selfisekai
* add _search_nuxt_data(), from https://github.com/yt-dlp/yt-dlp/pull/1921 , thanks Lesmiscore, pukkandan
* add tests for the above
* also fix HTML5 type recognition and tests, from 222a230871 , thanks Lesmiscore
* update extractors in PR using above, fix tests.
2023-07-19 22:14:50 +01:00
dirkf
1e8ccdd2eb
[InfoExtractor] Support groups in _search_regex(), etc
2023-07-19 22:14:50 +01:00
dirkf
42b098dd79
[InfoExtractor] Handle unquoted values in OpenGraph searches
2023-02-14 02:53:16 +00:00
dirkf
604762a9f8
[common:jwplayer] Improve jwplayer extraction and parsing ( #31000 )
...
* don't crash parser if jwplayer_data is invalid (empty, or no formats)
* use `label` in `sources[n]` as `format_id`
* relax `jwplayer().setup(...)` RE (also rework PR #27274 enhancement)
* detect more manifest formats in _parse_jwplayer_formats() (from PR #29596 )
* improve metadata extraction (from PR #25433 )
* remember URLs in a set
* use parse_resolution() in format
* extract filesize in format (from yt-dlp)
Co-authored-by: kikuyan <kikuyan@users.noreply.github.com>
Co-authored-by: martin54 <martin54@users.noreply.github.com>
2022-11-11 00:49:13 +00:00
dirkf
11b284c81f
[Common:JWPlayer] Fix x1000 scaling error
...
See https://github.com/yt-dlp/yt-dlp/issues/5106#issuecomment-1264625161
2022-10-11 12:36:44 +00:00
Zenon Mousmoulas
a2fd63ce22
JSON-LD: Tweak (News)Article description extraction
...
Let JSON-LD extract description from articleBody and fall back to
description field when processing (News)Article typed nodes
2021-11-12 09:35:50 +02:00
Zenon Mousmoulas
77e8f5353c
JSON-LD: Support top-level @graph expressing implicit default graph
...
Per W3C JSON-LD v1.1 §4.9 (non-normative ref):
When a JSON-LD document's top-level structure is a map that contains
no other keys than @graph and optionally @context (properties that
are not mapped to an IRI or a keyword are ignored), @graph is
considered to express the otherwise implicit default graph.
Support such a structure in InfoExtractor._json_ld parsing:
Wrap the control flow block in a function, which is called recursively
upon such a structure
2021-11-12 09:30:17 +02:00
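The recursive handling described in the entry above might look roughly like this. It is a simplified sketch, not the code in `InfoExtractor._json_ld`: `iter_json_ld_nodes` is a hypothetical name, and the check only looks for keys other than `@graph` and `@context` rather than applying the full JSON-LD rule about unmapped properties.

```python
def iter_json_ld_nodes(json_ld):
    """Yield JSON-LD nodes, unwrapping a top-level implicit default graph."""
    if isinstance(json_ld, dict):
        json_ld = [json_ld]
    for entry in json_ld:
        if not isinstance(entry, dict):
            continue
        # Only @graph (plus optional @context): treat it as the default
        # graph and recurse into its members.
        if '@graph' in entry and not set(entry) - {'@graph', '@context'}:
            yield from iter_json_ld_nodes(entry['@graph'])
        else:
            yield entry


# Hypothetical document, for illustration only.
doc = {
    '@context': 'https://schema.org',
    '@graph': [
        {'@type': 'VideoObject', 'name': 'clip'},
        {'@type': 'NewsArticle', 'articleBody': 'text'},
    ],
}
types = [node['@type'] for node in iter_json_ld_nodes(doc)]
print(types)
```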
Sergey M․
70d0d4f9be
[compat] Use more conventional name for compat SimpleCookie
2021-04-06 14:22:28 +07:00
Remita Amine
162bf9e10a
[compat] add compat_SimpleCookie
2021-04-04 19:49:24 +01:00
Remita Amine
6beb1ac65b
[extractor/common] keep support for non standard JSON-LD VideoObject author values
2021-04-04 19:16:17 +01:00
Remita Amine
e165f5641f
[extractor/common] fix JSON-LD VideoObject author extraction
2021-04-04 16:28:26 +01:00
Remita Amine
1df2596f81
[extractor/common] fix _get_cookies method for python 2( #20673 , #23256 , #20326 , closes #28640 )
2021-04-03 07:54:16 +01:00
Sergey M․
477bff6906
Introduce release_timestamp meta field (refs #28386 )
2021-03-10 03:36:31 +07:00
Remita Amine
67299f23d8
[youtube] Rewrite Extractor
...
- improve format sorting
- remove unused code(swf parsing, ...)
- fix series metadata extraction
- fix trailer video extraction
- improve error reporting
- extract video location
2021-02-01 14:53:01 +01:00
Remita Amine
22feed08a1
[common] remove unwanted query params from unsigned akamai manifest URLs
2020-12-19 20:14:44 +01:00
Sergey M․
1727541315
[extractor/common] Improve JSON-LD interaction statistic extraction (refs #23306 )
2020-12-13 20:24:13 +07:00
Sergey M․
eae19a4473
[extractor/common] Document duration meta field for playlists
2020-12-13 16:53:23 +07:00
Sergey M․
5a1fbbf8b7
[extractor/common] Fix inline HTML5 media tags processing and add test ( closes #27345 )
2020-12-09 00:05:21 +07:00
Sergey M․
91dd25fe1e
[extractor/common] Add support for dl8-* media tags ( closes #27283 )
2020-12-07 01:08:22 +07:00
Sergey M․
06bf2ac20f
[extractor/common] Eliminate media tag name regex duplication
2020-12-07 00:56:29 +07:00
Sergey M․
6ad0d8781e
[extractor/common] Fix media type extraction for HTML5 media tags in start/end form
2020-12-07 00:45:16 +07:00
Remita Amine
da4304609d
[extractor/commons] improve Akamai HTTP formats extraction
2020-12-03 00:33:55 +01:00
Remita Amine
664dd8ba85
[extractor/common] improve Akamai HTTP format extraction
...
- Allow m3u8 manifest without an additional audio format
- Fix extraction for qualities starting with a number
Solution provided by @nixxo based on: https://stackoverflow.com/a/5984688
2020-12-02 21:49:09 +01:00
Remita Amine
193422e12a
[extractor/common] add generic support for akamai http format extraction
2020-11-22 12:54:55 +01:00
Josh Soref
71ddc222ad
Fix typos ( #27084 )
...
* spelling: authorization
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: brightcove
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: creation
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: exceeded
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: exception
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: extension
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: extracting
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: extraction
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: frontline
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: improve
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: length
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: listsubtitles
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: multimedia
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: obfuscated
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: partitioning
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: playlist
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: playlists
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: restriction
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: services
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: split
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: srmediathek
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: support
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: thumbnail
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: verification
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
* spelling: whitespaces
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-11-21 22:00:05 +07:00
Sergey M․
c7178f0f7a
[extractor/common] Output error for invalid URLs in _is_valid_url (refs #21400 , refs #24151 , refs #25617 , refs #25618 , refs #25586 , refs #26068 , refs #27072 )
2020-11-18 23:31:35 +07:00
Sergey M․
ce5b904050
[extractor/common] Relax interaction count extraction in _json_ld
2020-09-19 06:33:17 +07:00
Sergey M․
ad06b99dd4
[extractor/common] Extract author as uploader for VideoObject in _json_ld
2020-09-19 06:13:42 +07:00
Sergey M․
f8c7bed133
[extractor/common] Handle ssl.CertificateError in _request_webpage ( closes #26601 )
...
ssl.CertificateError is raised on some python versions <= 3.7.x
2020-09-18 03:41:16 +07:00
Sergey M․
6c22cee673
[extractor/common] Use compat_cookiejar_Cookie for _set_cookie ( closes #23256 , closes #24776 )
...
To always ensure cookie name and value are bytestrings on python 2.
2020-05-05 06:00:37 +07:00
Sergey M․
4433bb0245
[extractor/common] Extract multiple JSON-LD entries
2020-05-02 23:40:30 +07:00
Sergey M․
13b08034b5
[extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats ( #24667 )
2020-04-07 22:55:59 +07:00
Sergey M․
7947a1f7db
Remove no longer needed compat_str around geturl
2020-02-29 19:19:24 +07:00
Sergey M․
e2f8bf5888
[extractor/common] Convert ISM manifest to unicode before processing on python 2 ( #24152 )
2020-02-29 17:29:30 +07:00
Remita Amine
5ef62fc4ce
[dailymotion] improve extraction
...
- extract http formats included in m3u8 manifest
- fix user extraction(closes #3553 )(closes #21415 )
- add support for User Authentication(closes #11491 )
- fix password protected videos extraction(closes #23176 )
- respect age limit option and family filter cookie value(closes #18437 )
- handle video url playlist query param
- report allowed countries for geo-restricted videos
2019-11-26 22:18:21 +01:00
Sergey M․
7360c06fac
[extractor/common] Add data, headers and query to all major extract methods preserving standard order for potential future use
2019-11-16 05:55:54 +07:00
Remita Amine
f81dd65ba2
[extractor/common] clean jwplayer description HTML tags
2019-11-09 13:11:59 +01:00
Remita Amine
3ec86619e3
[common] initialize headers param with empty dict
2019-11-06 07:18:29 +01:00