Compare commits

...

271 Commits

Author SHA1 Message Date
Sergey M․
7815d6b743 release 2019.10.16 2019-10-16 03:26:47 +07:00
Sergey M․
173190f5e3 [ChangeLog] Actualize
[ci skip]
2019-10-16 03:25:13 +07:00
Remita Amine
974311b5aa [vimeo] improve album videos id extraction(closes #22599) 2019-10-15 21:01:59 +01:00
Remita Amine
30eb05cb41 [globo] extract subtitles(closes #22713) 2019-10-15 19:54:53 +01:00
Remita Amine
2af01c0293 [bokecc] improve player params extraction(closes #22638) 2019-10-15 15:18:51 +01:00
Remita Amine
7e05df71b7 [nexx] handle result list(closes #22666) 2019-10-15 00:10:22 +01:00
Remita Amine
a1ee23e98f [vimeo] fix VHX embed extraction 2019-10-14 18:37:35 +01:00
Remita Amine
311ee45731 [nbc] switch to graphql api(closes #18581)(closes #22693)(closes #22701) 2019-10-14 18:36:25 +01:00
Remita Amine
c317b6163b [vessel] remove extractor 2019-10-10 00:01:51 +01:00
Sergey M․
2765c47a8c [promptfile] Remove extractor (closes #6239) 2019-10-10 03:40:01 +07:00
Sergey M․
07b50f616e [kaltura] Fix service URL extraction (closes #22658) 2019-10-10 00:24:03 +07:00
Sergey M․
1907f06e7b [kaltura] Fix embed info strip (refs #22658) 2019-10-10 00:11:41 +07:00
Remita Amine
d4bb825b83 [globo] fix format extraction(closes #20319) 2019-10-09 11:08:28 +01:00
Sergey M․
560d3b7d7c [redtube] Improve metadata extraction (closes #22492, closes #22615) 2019-10-05 22:04:49 +07:00
Sergey M․
4bf568d36c [pornhub:uservideos:upload] Fix extraction (closes #22619) 2019-10-05 21:43:31 +07:00
Sergey M․
05446d483d [telequebec:squat] Add support for squat.telequebec.tv (closes #18503) 2019-10-04 20:17:18 +07:00
bitraid
3a37f2c3be [wimp] Remove extractor (closes #22088) (#22091) 2019-10-04 19:48:20 +07:00
Anh Nhan Nguyen
0b87beefe6 [gfycat] Extend _VALID_URL (#22225) 2019-10-04 19:27:58 +07:00
axelerometer
fd4db1ebc2 [chaturbate] Extend _VALID_URL (#22309) 2019-10-04 19:22:01 +07:00
Andrew Morgan
b64045cd2a [peertube] Update instances (#22414) 2019-10-04 19:17:16 +07:00
Patrice Levesque
c2915de82e [telequebec] Add support for coucou.telequebec.tv (#22482) 2019-10-04 19:14:31 +07:00
Stephan
4e72d02f39 [xvideos] Extend _VALID_URL (#22471) 2019-10-04 19:05:35 +07:00
sofutru
76e510b92c [youtube] Remove support for invidious.enkirton.net (#22543) 2019-10-04 19:01:03 +07:00
kr4ssi
9679a62a28 [openload] Add support for oload.monster (#22592) 2019-10-04 18:57:51 +07:00
Martin Polden
ca20b13048 [nrktv:seriebase] Fix extraction (#22596) 2019-10-04 18:57:18 +07:00
sofutru
894b3826f5 [youtube] Add support for yt.lelux.fi (#22597) 2019-10-04 18:52:15 +07:00
Sergey M․
aaf9d904aa [orf:tvthek] Make manifest requests non fatal (refs #22578) 2019-10-03 00:55:46 +07:00
Sergey M․
25e911a968 [extractor/common] Make _is_valid_url more relaxed 2019-10-03 00:53:07 +07:00
Sergey M․
74bc299453 [teachable] Skip login when already logged in (closes #22572) 2019-10-02 02:03:22 +07:00
Sergey M․
2906631e12 [viewlift] Fix URL matching 2019-10-01 23:18:11 +07:00
Sergey M․
326ae4ff96 [viewlift] Improve extraction (closes #22545) 2019-09-29 23:03:39 +07:00
Sergey M․
72fd4d0c6a [nonktube] Fix extraction (closes #22544) 2019-09-29 21:57:08 +07:00
Sergey M․
f4b865c613 release 2019.09.28 2019-09-28 00:30:30 +07:00
Sergey M․
412f44f4b3 [ChangeLog] Actualize
[ci skip]
2019-09-28 00:23:25 +07:00
Sergey M․
6483fbd336 [vk] Fix extraction (closes #22522) 2019-09-28 00:04:52 +07:00
Sergey M․
8130ac42e5 [openload] PEP 8 2019-09-26 23:15:06 +07:00
Sergey M․
cb3e4a2947 [heise] Fix kaltura embeds extraction (closes #22514) 2019-09-26 23:11:02 +07:00
Remita Amine
2a88a0c44d [ted] check for resources validity and extract subtitled downloads(closes #22513) 2019-09-26 11:44:57 +01:00
sofutru
33c1c7d80f [youtube] Add support for owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya.b32.i2p (#22292) 2019-09-25 02:43:34 +07:00
Sergey M․
21d3c21e62 [nhk] Add support for clips 2019-09-25 02:39:25 +07:00
Remita Amine
a373befa25 [nhk] fix video extraction(closes #22249)(closes #22353) 2019-09-24 20:24:17 +01:00
Sergey M․
df63cafe49 [byutv] Fix extraction (refs #22070)
Downloading of new videos does not work due to DRM
2019-09-25 02:16:25 +07:00
Sergey M․
d06daf23da [YoutubeDL] Honour all --get-* options with --flat-playlist (closes #22493) 2019-09-25 02:10:37 +07:00
smed79
8e9fdcbe27 [openload] Add support for oload.online (#22304) 2019-09-24 23:56:12 +07:00
sofutru
666d808e70 [youtube] Add support for invidious.drycat.fr (#22451) 2019-09-24 23:16:46 +07:00
ipaha
7d327fea5b [jwplatfom] do not match video URLs(#20596) (#22148) 2019-09-23 19:44:00 +00:00
Sergey M․
4e3f1f0469 [youtube:playlist] Unescape playlist uploader (closes #22483) 2019-09-23 00:20:52 +07:00
Remita Amine
4bc15a68d1 [bilibili] add support audio albums and songs(closes #21094) 2019-09-22 17:14:18 +01:00
Remita Amine
edb2820ca5 [instagram] add support for tv URLs 2019-09-21 21:57:45 +01:00
Remita Amine
6cf6b357f5 [mixcloud] allow uppercase letters in format urls(closes #19280) 2019-09-20 11:14:24 +01:00
Remita Amine
f455a934e9 [brightcove] delegate all supported BrightcoveLegacyIE URLs to BrightcoveNewIE
closes #11523
closes #12842
closes #13912
closes #15669
closes #16303
2019-09-19 18:02:26 +01:00
Sergey M․
d9d3098675 [hotstar] Use native HLS downloader by default 2019-09-19 03:03:07 +07:00
Sergey M․
1cb812d3c2 [hotstar] Extract more formats (closes #22323) 2019-09-19 03:00:19 +07:00
Sergey M․
6fd26a7d4a [9now] Fix extraction (closes #22361) 2019-09-19 02:31:39 +07:00
Sergey M․
9cf26b6e1d [zdf] Bypass geo restriction 2019-09-19 01:11:52 +07:00
Sergey M․
20e11b70ac [tv4] Fix extraction and extract series metadata (closes #22443) 2019-09-18 23:45:26 +07:00
Sergey M․
e1f692f0b3 release 2019.09.12.1 2019-09-12 02:53:52 +07:00
Sergey M․
2f851a7d7d [ChangeLog] Actualize
[ci skip]
2019-09-12 02:48:07 +07:00
Sergey M․
4878759f3b [youtube] Remove quality and tbr for itag 43 (closes #22372) 2019-09-12 02:46:12 +07:00
Sergey M․
303d3e142c [ChangeLog] Actualize
[ci skip]
2019-09-12 02:05:54 +07:00
Sergey M․
bd10b229c0 release 2019.09.12 2019-09-12 01:21:21 +07:00
Sergey M․
035c7a59e8 [ChangeLog] Actualize
[ci skip]
2019-09-12 01:18:25 +07:00
Sergey M․
bf1317d257 [youtube] Quick extraction tempfix (closes #22367, closes #22163) 2019-09-11 22:44:47 +07:00
sofutru
bff90fc518 [youtube] Add support for invidious tor instances (#22268) 2019-09-03 01:35:32 +07:00
Sergey M․
31dbd054c8 [platzi] Improve client data extraction (closes #22290) 2019-09-03 01:24:20 +07:00
Sergey M․
66d04c74e0 [platzi:course] Add support for authentication 2019-09-03 01:23:22 +07:00
Patrick Dessalle
d7da1e37c7 [nickjr] Add support for nickelodeonjunior.fr (#22246) 2019-09-02 00:59:57 +07:00
Sergey M․
f620d0d860 release 2019.09.01 2019-09-01 03:33:02 +07:00
Sergey M․
79dd8884bb [ChangeLog] Actualize
[ci skip]
2019-09-01 03:18:35 +07:00
Sergey M․
df228355fd [xhamster:user] Add extractor (closes #16330, closes #18454) 2019-09-01 03:12:56 +07:00
Sergey M․
8945b10f6e [xhamster] Add support for more domains 2019-09-01 03:09:04 +07:00
Sergey M․
7cb51b5daf [extractor/generic] Improve squarespace detection and fix test (closes #21859, refs #21294, refs #21802) 2019-09-01 01:25:48 +07:00
Barbara Miller
d78657fd18 [extractor/generic] Add support for squarespace embeds (closes #21294) 2019-09-01 01:25:48 +07:00
Sergey M․
cc73d5ad15 [openload] Fix domains regex 2019-09-01 01:25:48 +07:00
telephono
71f47617c8 [downloader/external] Respect mtime option for aria2c (#22242) 2019-09-01 00:24:43 +07:00
Remita Amine
3f46a25a97 [verystream] add support for woof.tube (closes #22217) 2019-08-31 10:02:09 +01:00
Sergey M․
9d058b3206 [dailymotion] Add support for lequipe.fr (closes #21328, closes #22152) 2019-08-29 23:08:19 +07:00
Sergey M․
b500955a58 [openload] Add support for oload.vip (closes #22205) 2019-08-28 01:58:07 +07:00
Jay
acc86c9a97 [bbc] Fix some tests 2019-08-28 01:53:40 +07:00
Jay
b72305f078 [bbccouk] Extend _VALID_URL (closes #19200) 2019-08-28 01:53:40 +07:00
sofutru
494d664e67 [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223) 2019-08-28 01:39:59 +07:00
phan-ctrl
d1fcf255c5 [safari] Fix authentication (closes #22161) (#22184) 2019-08-27 10:16:04 +07:00
Sergey M․
183a18c4e7 [usanetwork] Fix extraction (closes #22105) 2019-08-26 03:38:54 +07:00
supritkumar
393cc31d5e [einthusan] Add support for einthusan.ca (#22171) 2019-08-21 09:52:59 +07:00
Sergey M․
0add33abcb [youtube] Improve unavailable message extraction (refs #22117) 2019-08-16 23:44:11 +07:00
Chuck Cho
0326bcb6c1 [piksel] add subtitle capability (#20506) 2019-08-15 22:14:47 +00:00
Sergey M․
def849e0e6 release 2019.08.13 2019-08-13 23:18:38 +07:00
Sergey M․
69611a1616 [ChangeLog] Actualize
[ci skip]
2019-08-13 23:10:05 +07:00
Sergey M․
351f37c022 [youtube:playlist] Improve flat extraction (closes #21927) 2019-08-13 05:02:52 +07:00
lightmare
3bce4ff7d9 [downloader/fragment] Fix ETA calculation of resumed download (#21992) 2019-08-11 06:57:43 +07:00
Remita Amine
ffddb11264 [YoutubeDL] check annotations availabilty(closes #18582) 2019-08-09 08:19:41 +01:00
Remita Amine
64b6a4e91e [youtube] fix annotations extraction(closes #22045) 2019-08-09 08:16:53 +01:00
Remita Amine
b3d39be239 [discovery] extract series meta field(#21808) 2019-08-08 23:23:58 +01:00
Sergey M․
1357734978 [youtube] Improve error detection (#16445) 2019-08-06 02:32:44 +07:00
Remita Amine
eb9c9c74a6 [vimeo] fix album extraction
closes #1933
closes #15704
closes #15855
closes #18967
closes #21986
2019-08-03 10:29:20 +01:00
Remita Amine
5efbc1366f [roosterteeth] add support for watch URLs 2019-08-02 19:38:35 +01:00
Remita Amine
995f319b06 [discovery] limit video data by show slug(closes #21980) 2019-08-02 18:08:26 +01:00
Sergey M
d9d3a5a816 [README.md] Move code from #21939 to the right place 2019-08-02 05:54:56 +07:00
Sergey M․
4f2d735803 release 2019.08.02 2019-08-02 05:37:54 +07:00
Sergey M․
2e9522b061 [ChangeLog] Actualize
[ci skip]
2019-08-02 05:36:32 +07:00
Sergey M․
be306d6a31 [tvigle] Fix extraction and add support for HLS and DASH formats (closes #21967) 2019-08-02 05:25:01 +07:00
Sergey M․
33b529fabd [yandexvideo] Add support for DASH formats (#21971) 2019-08-02 05:03:25 +07:00
Kyle
07f3a05c87 [CONTRIBUTING.md] Add some more coding conventions (#21939) 2019-08-02 04:49:01 +07:00
Remita Amine
535111657b [discovery] use API call for video data extraction(#21808) 2019-08-01 22:45:10 +01:00
cantandwont
826dcff99c Output batch filename when it could not be read (#21915) 2019-08-01 03:54:39 +07:00
Sen Jiang
9a37ff82f1 [mgtv] Extract format_note (#21881)
format_note should now show 标清, 高清, 超清, 蓝光, etc.
2019-08-01 03:45:02 +07:00
Sergey M․
766c4f6090 [tvn24] Fix test 2019-07-31 02:32:45 +07:00
Sergey M․
7279163412 [tvn24] Fix metadata extraction (closes #21833, closes #21834) 2019-07-31 02:32:45 +07:00
CeruleanSky
07ab44c420 [dlive] Relax _VALID_URL (#21909) 2019-07-31 01:43:49 +07:00
smed79
2c8b1a21e8 [openload] Add support for oload.best (#21913) 2019-07-31 01:40:50 +07:00
Sergey M․
c2d125d99f [youtube] Improve metadata extraction for age gate content (closes #21943) 2019-07-31 00:14:33 +07:00
Sergey M․
85c2c4b4ab release 2019.07.30 2019-07-30 09:43:47 +07:00
Sergey M․
8614a03f9c [ChangeLog] Actualize
[ci skip]
2019-07-30 09:41:23 +07:00
Remita Amine
8dbf751aa2 [youtube] improve title and description extraction(closes #21934) 2019-07-30 00:13:33 +01:00
Sergey M․
90634acfcf release 2019.07.27 2019-07-27 03:44:55 +07:00
Sergey M․
eaba9dd6c2 [ChangeLog] Actualize
[ci skip]
2019-07-27 03:43:33 +07:00
Kitten King
843ad1796b Fix typos (#21901) 2019-07-26 22:30:18 +07:00
Kyle
608b8a4300 [yahoo:japannews] Add extractor (closes #21698) (#21265) 2019-07-22 00:59:36 +07:00
Sergey M․
ab794a553c [ctsnews] PEP 8 2019-07-21 14:59:53 +07:00
Remita Amine
3b446ab351 [discovery] add support go.discovery.com URLs 2019-07-20 20:20:53 +01:00
Sergey M․
13a75688a5 [youtube] Fix some tests 2019-07-21 00:01:46 +07:00
Sergey M․
2e18adec98 [youtube:playlist] Relax _VIDEO_RE (closes #21844) 2019-07-20 23:46:34 +07:00
Sergey M․
9c1da4a9f9 [extractor/generic] Restrict --default-search schemeless URLs detection pattern (closes #21842) 2019-07-20 23:08:26 +07:00
Petr Vaněk
5e1c39ac85 [extractor/common] Fix typo in thumbnails resolution description (#21817) 2019-07-17 22:47:53 +07:00
Remita Amine
1824bfdcdf [vrv] fix CMS signing query extraction(closes #21809) 2019-07-16 22:51:10 +01:00
Sergey M․
2f1991ff14 release 2019.07.16 2019-07-16 00:01:46 +07:00
Sergey M․
8b4a0ebf10 [ChangeLog] Actualize
[ci skip]
2019-07-15 23:59:23 +07:00
Sergey M․
f61496863d [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv (closes #21281, closes #21290) 2019-07-15 23:58:08 +07:00
Sergey M․
799756a3b3 [kaltura] Check source format URL (#21290) 2019-07-15 23:58:08 +07:00
chien-yu
7d4dd3e5b4 [ctsnews] Fix YouTube embeds extraction (#21678) 2019-07-15 23:03:03 +07:00
tlonic
f2a213d025 [einthusan] Add support for einthusan.com (closes #21748) (#21775) 2019-07-15 22:58:55 +07:00
geditorit
791d2e8117 [youtube] Add support for invidious.mastodon.host (#21777) 2019-07-15 22:54:22 +07:00
Gary
2adedc477e [gfycat] Extend _VALID_URL (closes #21779) (#21780) 2019-07-15 22:53:20 +07:00
Sergey M․
898238e9f8 [youtube] Restrict is_live extraction (closes #21782) 2019-07-14 20:30:05 +07:00
Sergey M․
ce80cacefd release 2019.07.14 2019-07-14 03:10:49 +07:00
Sergey M․
0250161c52 [yandexmusic] Add missing import 2019-07-14 03:09:16 +07:00
Sergey M․
364a2cb658 [ChangeLog] Actualize
[ci skip]
2019-07-14 03:07:02 +07:00
hrimfaxi
2fe074a960 [porn91] Fix extraction (#21312) 2019-07-14 02:57:43 +07:00
aerworker
c452790a79 [yandexmusic] Add support for multi disk albums and extract track number and disk number (closes #21420) (#21421)
* [yandexmusic] extract tracks from all volumes of an album (closes #21420)

* [yandexmusic] extract genre, disk_number and track_number

* [yandexmusic] extract decomposed artist names

* Update yandexmusic.py

* Update yandexmusic.py

* Update yandexmusic.py
2019-07-14 02:38:47 +07:00
Sergey M․
d89a0a8026 [lynda] Handle missing subtitles (closes #20490, closes #20513) 2019-07-14 01:45:28 +07:00
geditorit
ba036333bf [youtube] Add more invidious instances to _VALID_URL (#21694) 2019-07-14 01:23:22 +07:00
Sergey M․
b7ef93f0ab [twitter] Improve uploader id extraction (closes #21705) 2019-07-14 01:19:17 +07:00
Sergey M․
f9eeeda31c [spankbang] Fix and improve metadata extraction 2019-07-14 00:21:39 +07:00
Sergey M․
5f562bd4bb [spankbang] Fix extraction (closes #21763, closes #21764) 2019-07-14 00:13:26 +07:00
Remita Amine
b99f11a56b [dlive] restrict DLive Stream _VALID_URL regex 2019-07-13 14:11:57 +01:00
Remita Amine
4a71ef6da6 [dlive] Add new extractor(closes #18080) 2019-07-13 13:08:19 +01:00
Remita Amine
fd95105ed4 [livejournal] Add new extractor(closes #21526) 2019-07-13 12:47:02 +01:00
Remita Amine
c72dc20d09 [roosterteeth] fix free episode extraction(#16094) 2019-07-13 10:13:07 +01:00
Remita Amine
272355c172 [dbtv] fix extraction 2019-07-12 23:26:46 +01:00
Remita Amine
57227618fe [spike] fix Bellator extraction 2019-07-12 22:50:37 +01:00
Remita Amine
0441d6266c [rudo] remove extractor(closes #18430)(closes #18474)
Covered by generic extractor
2019-07-12 22:31:11 +01:00
Remita Amine
82f68e4a01 [facebook] fallback to twitter:image meta for thumbnail extraction(closes #21224) 2019-07-12 22:02:06 +01:00
Remita Amine
d4ece5d359 [bleacherreport] fix Bleacher Report CMS extraction 2019-07-12 21:56:49 +01:00
Remita Amine
16d3672ad7 [espn] fix fivethirtyeight.com extraction 2019-07-11 23:37:34 +01:00
Remita Amine
0dd58a523f [fivetv] relax video URL regex and support https URLs 2019-07-11 23:10:35 +01:00
Sergey M․
27019dbb4b [youtube] Fix is_live extraction (closes #21734) 2019-07-12 03:45:58 +07:00
Sergey M․
baf67a604d [youtube] Fix authentication (closes #11270) 2019-07-12 02:26:05 +07:00
Sergey M․
0d1f4af39d release 2019.07.12 2019-07-12 00:43:54 +07:00
Sergey M․
7612406bf9 [ChangeLog] Actualize
[ci skip]
2019-07-12 00:34:03 +07:00
Sergey M․
4dcd4b7b16 [mgtv] Pass Referer HTTP header for format URLs (closes #21726) 2019-07-12 00:04:25 +07:00
Sergey M․
5fc0896168 [beeg] Add support for api/v6 v2 URLs without t argument (closes #21701) 2019-07-11 23:37:09 +07:00
Remita Amine
e4d53148f5 [funnyordie] move extraction to VoxMedia extractor and improve vox volume embed extraction(closes #16846) 2019-07-10 16:47:37 +01:00
Remita Amine
cfe781d4fa [gameinformer] fix extraction(closes #8895)(closes #15363)(closes #17206) 2019-07-10 15:45:00 +01:00
Remita Amine
253289656f [extractors] update funk.net import 2019-07-10 13:57:43 +01:00
Remita Amine
4b30282616 [funk] fix extraction(closes #17915) 2019-07-10 13:54:49 +01:00
Remita Amine
c9b0564ac1 [packtpub] Relax lesson _VALID_URL regex(closes #21695) 2019-07-09 11:56:16 +01:00
Remita Amine
25d71fb058 [packtpub] fix extraction(closes #21268) 2019-07-09 08:28:56 +01:00
Sergey M․
a6389abfd7 [philharmoniedeparis] Relax _VALID_URL (closes #21672) 2019-07-06 23:17:45 +07:00
Sergey M․
d18003a141 [peertube] Detect embed URLs in generic extraction (closes #21666) 2019-07-06 00:50:56 +07:00
Hendrik Schröter
d1850c1a97 [mixer:vod] Relax _VALID_URL (closes #21657) (#21658) 2019-07-05 22:47:32 +07:00
Remita Amine
c9fa84d88e [lecturio] add support id based URLs(closes #21630) 2019-07-04 15:59:45 +01:00
Sergey M․
a30c2f4055 [go] Add site info for disneynow (closes #21613) 2019-07-04 04:01:30 +07:00
Sergey M․
5ae9b8b3a3 [adobepass] Add support for AT&T U-verse (mso ATT) (closes #13938, closes #21016) 2019-07-04 03:57:11 +07:00
Sergey M․
cdb7c7d147 [ted] Restrict info regex (closes #21631) 2019-07-04 02:04:23 +07:00
David Caldwell
2da4316e48 [twitch:vod] Actualize m3u8 URL (#21538, #21607) 2019-07-03 23:22:23 +07:00
Sergey M․
313877c6a2 [vzaar] Fix videos with empty title (closes #21606) 2019-07-03 23:16:40 +07:00
Remita Amine
e61ac1a09c [tvland] fix extraction(closes #21384) 2019-07-03 13:31:47 +01:00
Remita Amine
ff0f4cfeba [arte] clean extractor(closes #15583)(closes #21614) 2019-07-02 22:09:40 +01:00
Sergey M․
1335bf10f6 release 2019.07.02 2019-07-02 01:09:59 +07:00
Sergey M․
c8343f0a43 [ChangeLog] Actualize
[ci skip]
2019-07-02 01:07:54 +07:00
nyuszika7h
d1e4116427 [vevo] Add support for embed.vevo.com URLs (#21565) 2019-07-02 00:13:23 +07:00
smed79
9baf69af45 [openload] Add support for oload.biz (#21574) 2019-07-02 00:11:38 +07:00
Fai
918398092c [xiami] Update API base URL (#21575) 2019-07-02 00:10:55 +07:00
xyssy
4e2491f066 [yourporn] Fix extraction (#21585) 2019-07-02 00:05:51 +07:00
Remita Amine
976e1ff7f9 [acast] add support for URLs with episode id(closes #21444) 2019-07-01 12:05:18 +01:00
Remita Amine
5e3da0d42b [dailymotion] add support embed with DM.player js call 2019-07-01 08:37:21 +01:00
Sergey M․
c560680247 [soundcloud] Update client id 2019-06-29 00:33:35 +07:00
Sergey M․
f7a147e3b6 [utils] Introduce random_user_agent and use as default User-Agent (closes #21546) 2019-06-29 00:32:43 +07:00
Sergey M․
8c8cae91ec release 2019.06.27 2019-06-27 23:57:33 +07:00
Sergey M․
232331c0d2 [ChangeLog] Actualize
[ci skip]
2019-06-27 23:55:15 +07:00
Sergey M․
4f71473ef1 [go] Add support for disneynow.com (closes #21528) 2019-06-27 22:59:30 +07:00
Mike Fährmann
6625bf200d [mixer:vod] Relax _VALID_URL (closes #21531) (#21536) 2019-06-27 22:24:46 +07:00
Sergey M․
f562994660 [drtv] Relax _VALID_URL 2019-06-27 22:18:10 +07:00
Remita Amine
509bcec37b [fusion] fix extraction(closes #17775)(closes #21269) 2019-06-27 12:06:31 +01:00
Sergey M․
1d83e9bd4b [nfb] Remove extractor (closes #21518)
Covered by generic extractor
2019-06-25 00:12:31 +07:00
Sergey M․
27cef8885d [beeg] Add support for api/v6 v2 URLs (closes #21511) 2019-06-24 23:01:52 +07:00
Kyle
3031b7c4ed [brightcove:new] Add support for playlists (#21331) 2019-06-23 17:04:05 +07:00
smed79
695720ebe8 [openload] Add support for oload.life (#21495) 2019-06-23 04:31:43 +07:00
Sergey M․
2605043d6d [vimeo:channel,group] Make title extraction no fatal 2019-06-23 02:16:09 +07:00
Sergey M․
091c9b4316 [vimeo:likes] Implement extrator in terms of channel extractor
This allows to obtain videos' ids before extraction (#21493)
2019-06-23 02:13:46 +07:00
Sergey M․
9634de178d [pornhub] Add support for more paged video sources 2019-06-22 08:37:07 +07:00
Sergey M․
1f7a563ab0 [pornhub] Add support for downloading single pages and search pages (closes #15570) 2019-06-22 06:01:43 +07:00
Sergey M․
21b08463a7 [pornhub] Rework extractors (closes #11922, closes #16078, closes #17454, closes #17936) 2019-06-22 05:34:46 +07:00
Sergey M․
31ce6e9966 [youtube] Add another signature function pattern 2019-06-22 02:22:41 +07:00
Sergey M․
1c11204056 [tf1] Improve extraction and fix issues (closes #21372) 2019-06-22 00:40:06 +07:00
Emmanuel Froissart
9c2aaac268 [tf1] Fix wat id extraction (closes #21365) 2019-06-22 00:40:00 +07:00
Sergey M․
d415957dbc [crunchyroll] Move Accept-Language workaround to video extractor since it causes playlists not to list any videos 2019-06-22 00:15:52 +07:00
Sergey M․
4681441d2f [crunchyroll:playlist] Fix and relax title extraction (closes #21291, closes #21443) 2019-06-22 00:07:26 +07:00
Sergey M․
9842d29d66 release 2019.06.21 2019-06-21 23:04:09 +07:00
Sergey M․
bc6438c092 [ChangeLog] Actualize
[ci skip]
2019-06-21 23:01:09 +07:00
Sergey M․
abefc03f51 [youtube] Update signature function patterns (closes #21469, closes #21476) 2019-06-21 22:58:42 +07:00
Sergey M․
c40714cdee [youtube] Make --write-annotations non fatal (closes #21452) 2019-06-20 00:57:58 +07:00
Remita Amine
7c24a58bdb [sixplay] add support for rtlmost.hu(#21405) 2019-06-16 06:32:47 +01:00
Sergey M․
b85eae0f05 [youtube] Hardcode codec metadata for av01 video only formats (closes #21381) 2019-06-14 01:59:05 +07:00
Sergey M․
28cc2241e4 [utils] Restrict parse_codecs and add theora as known vcodec (#21381) 2019-06-14 01:56:17 +07:00
Remita Amine
8361e7f934 [toutv] update client key(closes #21370) 2019-06-12 21:41:46 +01:00
Remita Amine
427cc21531 [biqle] remove unnecessary regex group 2019-06-10 15:17:26 +01:00
Remita Amine
f991dd2bfb [biqle] add support for new embed domain 2019-06-10 14:58:37 +01:00
Remita Amine
35c2dd48d9 [cbs] improve DRM protected videos detection(closes #21339) 2019-06-09 00:13:07 +01:00
Sergey M․
3b2fd09596 release 2019.06.08 2019-06-08 03:47:30 +07:00
Sergey M․
e35fc5ebc7 [ChangeLog] Actualize
[ci skip]
2019-06-08 03:45:31 +07:00
Sergey M․
dbb1886114 [ted] Improve playlist extraction (closes #21032) 2019-06-08 03:07:08 +07:00
biwubo
c2ee6fa66a [ted] Fix playlist extraction (closes #20844) 2019-06-08 03:07:02 +07:00
Sergey M․
4831ef7fe4 [vlive] Add comments to tests and quotes consistency 2019-06-08 02:42:01 +07:00
biwubo
178663df52 [vlive:playlist] Redirect to video extractor when no playlist is found (#20590) 2019-06-08 02:40:47 +07:00
Sergey M․
ef19739e64 [vlive] Improve and style (closes #21209) 2019-06-08 02:22:23 +07:00
biwubo
01b517a20a [vlive] Add CH+ support (closes #16887) 2019-06-08 02:22:15 +07:00
Miroslav Šedivý
f4c99cd635 [openload] Add support for oload.website (#21329) 2019-06-08 02:03:58 +07:00
Sergey M․
e75220b11a [tvnow] Extract HD formats (closes #21201) 2019-06-08 00:45:28 +07:00
Sergey M․
2efefddafd [redbulltv:rrn:content] Add extractor (closes #21297) 2019-06-07 23:48:57 +07:00
Sergey M․
bf3c932663 [youtube] Fix average rating extraction (closes #21304) 2019-06-07 23:13:33 +07:00
Sergey M․
4c78c3d700 [bitchute] Extract HTML5 formats (closes #21306) 2019-06-07 22:58:19 +07:00
Remita Amine
2e11e51c04 [cbsnews] fix extraction(closes #9659)(closes #15397) 2019-06-05 01:23:23 +01:00
Sergey M․
1a01639bf9 [downloader/common] Improve rate limit (#21301) 2019-06-05 03:06:35 +07:00
Remita Amine
59ca17b1c8 [vvvvid] relax _VALID_URL(closes #21299) 2019-06-04 21:03:52 +01:00
Remita Amine
c94c121a99 Revert "[southpark] add support for videos with english audio track"
This reverts commit 0c84002650.
2019-06-04 00:37:58 +01:00
Remita Amine
0c84002650 [southpark] add support for videos with english audio track(closes #21271) 2019-06-03 12:43:39 +01:00
Remita Amine
c5eb75b35a [prosiebensat1] add support for new API(closes #21272) 2019-06-01 15:12:30 +01:00
Remita Amine
0e2dd3fcbc [vrv] extract adaptive_hls formats(closes #21243) 2019-06-01 11:17:44 +01:00
Sergey M․
26a87972a9 [viki] Switch to HTTPS (closes #21001) 2019-05-29 04:42:19 +07:00
bitraid
33b2218b2f [LiveLeak] Check if the original videos exist (closes #21206) (#21208) 2019-05-28 12:31:11 +00:00
Remita Amine
ead467a9c1 [rtp] fix extraction(closes #15099) 2019-05-28 04:58:12 +01:00
Remita Amine
0d29751890 [youtube] improve DRM protected videos detection(#1774) 2019-05-25 23:15:59 +01:00
Sergey M․
25b83c2a0e [srgssrplay] Improve _VALID_URL (closes #21155) 2019-05-24 00:45:06 +07:00
Stefan Pöschel
3fe774722b [srgssrplay] Add support for popupvideoplayer URLs 2019-05-24 00:44:37 +07:00
Sergey M․
f4cc2ca503 [24video] Add support for porno.24video.net (closes #21194) 2019-05-24 00:38:06 +07:00
Sergey M․
11ec06de7f [24video] Add support for 24video.site (closes #21193) 2019-05-24 00:35:46 +07:00
Sergey M․
53cd37bac5 [utils] Improve strip_or_none 2019-05-24 00:03:01 +07:00
Sergey M․
f856816b94 [extractor/common] Strip src attribute for HTML5 entries code (closes #18485, closes #21169) 2019-05-23 23:52:11 +07:00
Sergey M․
8af49fc276 [pornflip] Remove extractor 2019-05-23 23:48:06 +07:00
Sergey M․
9c5f2988b9 [criterion] Remove extractor (closes #21195) 2019-05-23 23:38:01 +07:00
Sergey M․
afd4985f72 [travis] Force dist to Ubuntu Trusty by default
According to https://blog.travis-ci.com/2019-04-15-xenial-default-build-environment Ubuntu Xenial is now default, but it lacks python 2.6, 3.2 and 3.3 support needed by tests
2019-05-23 06:06:49 +07:00
NRTICN
2c53c0ebc6 [pornhub] Use https (#21061) 2019-05-23 03:56:54 +07:00
Georgi Saev
bbf1defe58 [bitchute] Fix uploader extraction (#21076) 2019-05-23 03:51:50 +07:00
Malte Kiefer
186d185b6e [streamcloud] Reduce waiting time to 6 seconds (#21092) 2019-05-23 03:46:20 +07:00
ealgase
612300a686 [novamov] Remove extractors (#21077)
Sites no longer exist
2019-05-23 03:38:48 +07:00
smed79
ea75382094 [openload] Add support for oload.press (#21135) 2019-05-23 03:30:17 +07:00
Sergey M․
e438e81469 [vivo] Improve extraction (closes #19217) 2019-05-23 03:06:07 +07:00
Daniel Höpfl
0e6f914b3b [vivo] Fix extraction (closes #18906) 2019-05-23 03:06:07 +07:00
Sergey M․
6ab30ff50b release 2019.05.20 2019-05-20 23:29:49 +07:00
Sergey M․
42c971341b [ChangeLog] Actualize
[ci skip]
2019-05-20 23:24:27 +07:00
Sergey M․
ce2fe4c01c [extractor/common] Add doc string for _apply_first_set_cookie_header 2019-05-20 23:23:18 +07:00
Sergey M․
a9e03736df [safari] Fix authentication (closes #21090) 2019-05-18 03:33:40 +07:00
Sergey M․
e3c1266f49 [extractor/common] Move workaround for applying first Set-Cookie header into a separate method 2019-05-18 03:17:15 +07:00
Remita Amine
82e91d20a0 [vrt] fix extraction(closes #20527) 2019-05-15 15:02:51 +01:00
Remita Amine
170d644440 [canvas] add support for vrtnieuws and sporza site ids and extract AES HLS formats 2019-05-15 14:58:57 +01:00
Remita Amine
5831742840 [vrv] extract captions(closes #19238) 2019-05-15 10:39:59 +01:00
Sergey M․
a277dd33eb [tele5] Improve video id extraction 2019-05-13 00:13:00 +07:00
Sergey M․
0e0bfd334c [tele5] Relax _VALID_URL (closes #21020, closes #21063) 2019-05-13 00:11:48 +07:00
Sergey M․
e6a25fea23 [svtplay] Update API URL (closes #21075) 2019-05-12 23:24:01 +07:00
Remita Amine
b7df8f90a7 [yahoo:gyao] add X-User-Agent header to dam proxy requests(closes #21071) 2019-05-12 10:23:36 +01:00
147 changed files with 5738 additions and 4802 deletions

View File

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.10.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.10.16**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.05.11
[debug] youtube-dl version 2019.10.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.10.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.10.16**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.10.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.10.16**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.10.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.10.16**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.05.11
[debug] youtube-dl version 2019.10.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.10.16. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.10.16**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -9,6 +9,7 @@ python:
- "3.6"
- "pypy"
- "pypy3"
dist: trusty
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download

View File

@@ -339,6 +339,72 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

322
ChangeLog
View File

@@ -1,3 +1,325 @@
version 2019.10.16
Core
* [extractor/common] Make _is_valid_url more relaxed
Extractors
* [vimeo] Improve album videos id extraction (#22599)
+ [globo] Extract subtitles (#22713)
* [bokecc] Improve player params extraction (#22638)
* [nexx] Handle result list (#22666)
* [vimeo] Fix VHX embed extraction
* [nbc] Switch to graphql API (#18581, #22693, #22701)
- [vessel] Remove extractor
- [promptfile] Remove extractor (#6239)
* [kaltura] Fix service URL extraction (#22658)
* [kaltura] Fix embed info strip (#22658)
* [globo] Fix format extraction (#20319)
* [redtube] Improve metadata extraction (#22492, #22615)
* [pornhub:uservideos:upload] Fix extraction (#22619)
+ [telequebec:squat] Add support for squat.telequebec.tv (#18503)
- [wimp] Remove extractor (#22088, #22091)
+ [gfycat] Extend URL regular expression (#22225)
+ [chaturbate] Extend URL regular expression (#22309)
* [peertube] Update instances (#22414)
+ [telequebec] Add support for coucou.telequebec.tv (#22482)
+ [xvideos] Extend URL regular expression (#22471)
- [youtube] Remove support for invidious.enkirton.net (#22543)
+ [openload] Add support for oload.monster (#22592)
* [nrktv:seriebase] Fix extraction (#22596)
+ [youtube] Add support for yt.lelux.fi (#22597)
* [orf:tvthek] Make manifest requests non fatal (#22578)
* [teachable] Skip login when already logged in (#22572)
* [viewlift] Improve extraction (#22545)
* [nonktube] Fix extraction (#22544)
version 2019.09.28
Core
* [YoutubeDL] Honour all --get-* options with --flat-playlist (#22493)
Extractors
* [vk] Fix extraction (#22522)
* [heise] Fix kaltura embeds extraction (#22514)
* [ted] Check for resources validity and extract subtitled downloads (#22513)
+ [youtube] Add support for
owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya.b32.i2p (#22292)
+ [nhk] Add support for clips
* [nhk] Fix video extraction (#22249, #22353)
* [byutv] Fix extraction (#22070)
+ [openload] Add support for oload.online (#22304)
+ [youtube] Add support for invidious.drycat.fr (#22451)
* [jwplatfom] Do not match video URLs (#20596, #22148)
* [youtube:playlist] Unescape playlist uploader (#22483)
+ [bilibili] Add support audio albums and songs (#21094)
+ [instagram] Add support for tv URLs
+ [mixcloud] Allow uppercase letters in format URLs (#19280)
* [brightcove] Delegate all supported legacy URLs to new extractor (#11523,
#12842, #13912, #15669, #16303)
* [hotstar] Use native HLS downloader by default
+ [hotstar] Extract more formats (#22323)
* [9now] Fix extraction (#22361)
* [zdf] Bypass geo restriction
+ [tv4] Extract series metadata
* [tv4] Fix extraction (#22443)
version 2019.09.12.1
Extractors
* [youtube] Remove quality and tbr for itag 43 (#22372)
version 2019.09.12
Extractors
* [youtube] Quick extraction tempfix (#22367, #22163)
version 2019.09.01
Core
+ [extractor/generic] Add support for squarespace embeds (#21294, #21802,
#21859)
+ [downloader/external] Respect mtime option for aria2c (#22242)
Extractors
+ [xhamster:user] Add support for user pages (#16330, #18454)
+ [xhamster] Add support for more domains
+ [verystream] Add support for woof.tube (#22217)
+ [dailymotion] Add support for lequipe.fr (#21328, #22152)
+ [openload] Add support for oload.vip (#22205)
+ [bbccouk] Extend URL regular expression (#19200)
+ [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223)
* [safari] Fix authentication (#22161, #22184)
* [usanetwork] Fix extraction (#22105)
+ [einthusan] Add support for einthusan.ca (#22171)
* [youtube] Improve unavailable message extraction (#22117)
+ [piksel] Extract subtitles (#20506)
version 2019.08.13
Core
* [downloader/fragment] Fix ETA calculation of resumed download (#21992)
* [YoutubeDL] Check annotations availability (#18582)
Extractors
* [youtube:playlist] Improve flat extraction (#21927)
* [youtube] Fix annotations extraction (#22045)
+ [discovery] Extract series meta field (#21808)
* [youtube] Improve error detection (#16445)
* [vimeo] Fix album extraction (#1933, #15704, #15855, #18967, #21986)
+ [roosterteeth] Add support for watch URLs
* [discovery] Limit video data by show slug (#21980)
version 2019.08.02
Extractors
+ [tvigle] Add support for HLS and DASH formats (#21967)
* [tvigle] Fix extraction (#21967)
+ [yandexvideo] Add support for DASH formats (#21971)
* [discovery] Use API call for video data extraction (#21808)
+ [mgtv] Extract format_note (#21881)
* [tvn24] Fix metadata extraction (#21833, #21834)
* [dlive] Relax URL regular expression (#21909)
+ [openload] Add support for oload.best (#21913)
* [youtube] Improve metadata extraction for age gate content (#21943)
version 2019.07.30
Extractors
* [youtube] Fix and improve title and description extraction (#21934)
version 2019.07.27
Extractors
+ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
+ [discovery] Add support go.discovery.com URLs
* [youtube:playlist] Relax video regular expression (#21844)
* [generic] Restrict --default-search schemeless URLs detection pattern
(#21842)
* [vrv] Fix CMS signing query extraction (#21809)
version 2019.07.16
Extractors
+ [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv
(#21281, #21290)
* [kaltura] Check source format URL (#21290)
* [ctsnews] Fix YouTube embeds extraction (#21678)
+ [einthusan] Add support for einthusan.com (#21748, #21775)
+ [youtube] Add support for invidious.mastodon.host (#21777)
+ [gfycat] Extend URL regular expression (#21779, #21780)
* [youtube] Restrict is_live extraction (#21782)
version 2019.07.14
Extractors
* [porn91] Fix extraction (#21312)
+ [yandexmusic] Extract track number and disk number (#21421)
+ [yandexmusic] Add support for multi disk albums (#21420, #21421)
* [lynda] Handle missing subtitles (#20490, #20513)
+ [youtube] Add more invidious instances to URL regular expression (#21694)
* [twitter] Improve uploader id extraction (#21705)
* [spankbang] Fix and improve metadata extraction
* [spankbang] Fix extraction (#21763, #21764)
+ [dlive] Add support for dlive.tv (#18080)
+ [livejournal] Add support for livejournal.com (#21526)
* [roosterteeth] Fix free episode extraction (#16094)
* [dbtv] Fix extraction
* [bellator] Fix extraction
- [rudo] Remove extractor (#18430, #18474)
* [facebook] Fallback to twitter:image meta for thumbnail extraction (#21224)
* [bleacherreport] Fix Bleacher Report CMS extraction
* [espn] Fix fivethirtyeight.com extraction
* [5tv] Relax video URL regular expression and support https URLs
* [youtube] Fix is_live extraction (#21734)
* [youtube] Fix authentication (#11270)
version 2019.07.12
Core
+ [adobepass] Add support for AT&T U-verse (mso ATT) (#13938, #21016)
Extractors
+ [mgtv] Pass Referer HTTP header for format URLs (#21726)
+ [beeg] Add support for api/v6 v2 URLs without t argument (#21701)
* [voxmedia:volume] Improvevox embed extraction (#16846)
* [funnyordie] Move extraction to VoxMedia extractor (#16846)
* [gameinformer] Fix extraction (#8895, #15363, #17206)
* [funk] Fix extraction (#17915)
* [packtpub] Relax lesson URL regular expression (#21695)
* [packtpub] Fix extraction (#21268)
* [philharmoniedeparis] Relax URL regular expression (#21672)
* [peertube] Detect embed URLs in generic extraction (#21666)
* [mixer:vod] Relax URL regular expression (#21657, #21658)
+ [lecturio] Add support id based URLs (#21630)
+ [go] Add site info for disneynow (#21613)
* [ted] Restrict info regular expression (#21631)
* [twitch:vod] Actualize m3u8 URL (#21538, #21607)
* [vzaar] Fix videos with empty title (#21606)
* [tvland] Fix extraction (#21384)
* [arte] Clean extractor (#15583, #21614)
version 2019.07.02
Core
+ [utils] Introduce random_user_agent and use as default User-Agent (#21546)
Extractors
+ [vevo] Add support for embed.vevo.com URLs (#21565)
+ [openload] Add support for oload.biz (#21574)
* [xiami] Update API base URL (#21575)
* [yourporn] Fix extraction (#21585)
+ [acast] Add support for URLs with episode id (#21444)
+ [dailymotion] Add support for DM.player embeds
* [soundcloud] Update client id
version 2019.06.27
Extractors
+ [go] Add support for disneynow.com (#21528)
* [mixer:vod] Relax URL regular expression (#21531, #21536)
* [drtv] Relax URL regular expression
* [fusion] Fix extraction (#17775, #21269)
- [nfb] Remove extractor (#21518)
+ [beeg] Add support for api/v6 v2 URLs (#21511)
+ [brightcove:new] Add support for playlists (#21331)
+ [openload] Add support for oload.life (#21495)
* [vimeo:channel,group] Make title extraction non fatal
* [vimeo:likes] Implement extrator in terms of channel extractor (#21493)
+ [pornhub] Add support for more paged video sources
+ [pornhub] Add support for downloading single pages and search pages (#15570)
* [pornhub] Rework extractors (#11922, #16078, #17454, #17936)
+ [youtube] Add another signature function pattern
* [tf1] Fix extraction (#21365, #21372)
* [crunchyroll] Move Accept-Language workaround to video extractor since
it causes playlists not to list any videos
* [crunchyroll:playlist] Fix and relax title extraction (#21291, #21443)
version 2019.06.21
Core
* [utils] Restrict parse_codecs and add theora as known vcodec (#21381)
Extractors
* [youtube] Update signature function patterns (#21469, #21476)
* [youtube] Make --write-annotations non fatal (#21452)
+ [sixplay] Add support for rtlmost.hu (#21405)
* [youtube] Hardcode codec metadata for av01 video only formats (#21381)
* [toutv] Update client key (#21370)
+ [biqle] Add support for new embed domain
* [cbs] Improve DRM protected videos detection (#21339)
version 2019.06.08
Core
* [downloader/common] Improve rate limit (#21301)
* [utils] Improve strip_or_none
* [extractor/common] Strip src attribute for HTML5 entries code (#18485,
#21169)
Extractors
* [ted] Fix playlist extraction (#20844, #21032)
* [vlive:playlist] Fix video extraction when no playlist is found (#20590)
+ [vlive] Add CH+ support (#16887, #21209)
+ [openload] Add support for oload.website (#21329)
+ [tvnow] Extract HD formats (#21201)
+ [redbulltv] Add support for rrn:content URLs (#21297)
* [youtube] Fix average rating extraction (#21304)
+ [bitchute] Extract HTML5 formats (#21306)
* [cbsnews] Fix extraction (#9659, #15397)
* [vvvvid] Relax URL regular expression (#21299)
+ [prosiebensat1] Add support for new API (#21272)
+ [vrv] Extract adaptive_hls formats (#21243)
* [viki] Switch to HTTPS (#21001)
* [LiveLeak] Check if the original videos exist (#21206, #21208)
* [rtp] Fix extraction (#15099)
* [youtube] Improve DRM protected videos detection (#1774)
+ [srgssrplay] Add support for popupvideoplayer URLs (#21155)
+ [24video] Add support for porno.24video.net (#21194)
+ [24video] Add support for 24video.site (#21193)
- [pornflip] Remove extractor
- [criterion] Remove extractor (#21195)
* [pornhub] Use HTTPS (#21061)
* [bitchute] Fix uploader extraction (#21076)
* [streamcloud] Reduce waiting time to 6 seconds (#21092)
- [novamov] Remove extractors (#21077)
+ [openload] Add support for oload.press (#21135)
* [vivo] Fix extraction (#18906, #19217)
version 2019.05.20
Core
+ [extractor/common] Move workaround for applying first Set-Cookie header
into a separate _apply_first_set_cookie_header method
Extractors
* [safari] Fix authentication (#21090)
* [vk] Use _apply_first_set_cookie_header
* [vrt] Fix extraction (#20527)
+ [canvas] Add support for vrtnieuws and sporza site ids and extract
AES HLS formats
+ [vrv] Extract captions (#19238)
* [tele5] Improve video id extraction
* [tele5] Relax URL regular expression (#21020, #21063)
* [svtplay] Update API URL (#21075)
+ [yahoo:gyao] Add X-User-Agent header to dam proxy requests (#21071)
version 2019.05.11
Core

View File

@@ -1216,6 +1216,72 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
Correct:
```python
title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
```
Incorrect:
```python
TITLE_RE = r'<title>([^<]+)</title>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
### Collapse fallbacks
Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
#### Example
Good:
```python
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
```
Unwieldy:
```python
description = (
self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None)
or self._html_search_meta('twitter:description', webpage, default=None))
```
Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
### Trailing parentheses
Always move trailing parentheses after the last argument.
#### Example
Correct:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
```
Incorrect:
```python
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
)
```
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

View File

@@ -58,16 +58,8 @@
- **ARD:mediathek**
- **ARDBetaMediathek**
- **Arkena**
- **arte.tv**
- **arte.tv:+7**
- **arte.tv:cinema**
- **arte.tv:concert**
- **arte.tv:creative**
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
- **arte.tv:playlist**
- **AsianCrush**
- **AsianCrushPlaylist**
@@ -78,7 +70,6 @@
- **AudioBoom**
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
@@ -107,6 +98,8 @@
- **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BilibiliAudio**
- **BilibiliAudioAlbum**
- **BioBioChileTV**
- **BIQLE**
- **BitChute**
@@ -150,6 +143,7 @@
- **CBSInteractive**
- **CBSLocal**
- **cbsnews**: CBS News
- **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports**
- **CCMA**
@@ -174,7 +168,6 @@
- **Clipsyndicate**
- **CloserToTruth**
- **CloudflareStream**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
- **Clyp**
@@ -194,7 +187,6 @@
- **Coub**
- **Cracked**
- **Crackle**
- **Criterion**
- **CrooksAndLiars**
- **crunchyroll**
- **crunchyroll:playlist**
@@ -233,6 +225,8 @@
- **DiscoveryNetworksDe**
- **DiscoveryVR**
- **Disney**
- **dlive:stream**
- **dlive:vod**
- **Dotsub**
- **DouyuShow**
- **DouyuTV**: 斗鱼
@@ -315,9 +309,7 @@
- **FrontendMastersCourse**
- **FrontendMastersLesson**
- **Funimation**
- **FunkChannel**
- **FunkMix**
- **FunnyOrDie**
- **Funk**
- **Fusion**
- **Fux**
- **FXNetworks**
@@ -460,6 +452,7 @@
- **linkedin:learning:course**
- **LinuxAcademy**
- **LiTV**
- **LiveJournal**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
@@ -583,7 +576,6 @@
- **NextTV**: 壹電視
- **Nexx**
- **NexxEmbed**
- **nfb**: National Film Board of Canada
- **nfl.com**
- **NhkVod**
- **nhl.com**
@@ -609,7 +601,6 @@
- **nowness**
- **nowness:playlist**
- **nowness:series**
- **nowvideo**: NowVideo
- **Noz**
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **npo.nl:live**
@@ -693,17 +684,16 @@
- **PopcornTV**
- **PornCom**
- **PornerBros**
- **PornFlip**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist**
- **PornHubUserVideos**
- **PornHubPagedVideoList**
- **PornHubUser**
- **PornHubUserVideosUpload**
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
- **PornTube**
- **PressTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
- **puhutv**
- **puhutv:serie**
@@ -734,6 +724,7 @@
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
- **RedBullTVRrnContent**
- **Reddit**
- **RedditR**
- **RedTube**
@@ -767,7 +758,6 @@
- **rtve.es:television**
- **RTVNH**
- **RTVS**
- **Rudo**
- **RUHD**
- **rutube**: Rutube videos
- **rutube:channel**: Rutube channels
@@ -893,13 +883,13 @@
- **TeleQuebec**
- **TeleQuebecEmission**
- **TeleQuebecLive**
- **TeleQuebecSquat**
- **TeleTask**
- **Telewebion**
- **TennisTV**
- **TF1**
- **TFO**
- **TheIntercept**
- **theoperaplatform**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -1001,7 +991,6 @@
- **VeeHD**
- **Veoh**
- **verystream**
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VevoPlaylist**
@@ -1024,7 +1013,6 @@
- **videomore:video**
- **VideoPremium**
- **VideoPress**
- **videoweed**: VideoWeed
- **Vidio**
- **VidLii**
- **vidme**
@@ -1071,7 +1059,7 @@
- **VoxMediaVolume**
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **Vrak**
- **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be
- **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza
- **VrtNU**: VrtNU.be
- **vrv**
- **vrv:series**
@@ -1101,8 +1089,6 @@
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop**
@@ -1114,6 +1100,7 @@
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
- **XHamster**
- **XHamsterEmbed**
- **XHamsterUser**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
@@ -1131,6 +1118,7 @@
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист

View File

@@ -73,6 +73,7 @@ from youtube_dl.utils import (
smuggle_url,
str_to_int,
strip_jsonp,
strip_or_none,
timeconvert,
unescapeHTML,
unified_strdate,
@@ -752,6 +753,18 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
def test_strip_or_none(self):
self.assertEqual(strip_or_none(' abc'), 'abc')
self.assertEqual(strip_or_none('abc '), 'abc')
self.assertEqual(strip_or_none(' abc '), 'abc')
self.assertEqual(strip_or_none('\tabc\t'), 'abc')
self.assertEqual(strip_or_none('\n\tabc\n\t'), 'abc')
self.assertEqual(strip_or_none('abc'), 'abc')
self.assertEqual(strip_or_none(''), '')
self.assertEqual(strip_or_none(None), None)
self.assertEqual(strip_or_none(42), None)
self.assertEqual(strip_or_none([]), None)
def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')
@@ -809,6 +822,15 @@ class TestUtil(unittest.TestCase):
'vcodec': 'av01.0.05M.08',
'acodec': 'none',
})
self.assertEqual(parse_codecs('theora, vorbis'), {
'vcodec': 'theora',
'acodec': 'vorbis',
})
self.assertEqual(parse_codecs('unknownvcodec, unknownacodec'), {
'vcodec': 'unknownvcodec',
'acodec': 'unknownacodec',
})
self.assertEqual(parse_codecs('unknown'), {})
def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]"

View File

@@ -852,8 +852,9 @@ class YoutubeDL(object):
extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info)
or extract_flat is True):
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(ie_result))
self.__forced_printings(
ie_result, self.prepare_filename(ie_result),
incomplete=True)
return ie_result
if result_type == 'video':
@@ -1693,6 +1694,36 @@ class YoutubeDL(object):
subs[lang] = f
return subs
def __forced_printings(self, info_dict, filename, incomplete):
def print_mandatory(field):
if (self.params.get('force%s' % field, False)
and (not incomplete or info_dict.get(field) is not None)):
self.to_stdout(info_dict[field])
def print_optional(field):
if (self.params.get('force%s' % field, False)
and info_dict.get(field) is not None):
self.to_stdout(info_dict[field])
print_mandatory('title')
print_mandatory('id')
if self.params.get('forceurl', False) and not incomplete:
if info_dict.get('requested_formats') is not None:
for f in info_dict['requested_formats']:
self.to_stdout(f['url'] + f.get('play_path', ''))
else:
# For RTMP URLs, also include the playpath
self.to_stdout(info_dict['url'] + info_dict.get('play_path', ''))
print_optional('thumbnail')
print_optional('description')
if self.params.get('forcefilename', False) and filename is not None:
self.to_stdout(filename)
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
print_mandatory('format')
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@@ -1703,9 +1734,8 @@ class YoutubeDL(object):
if self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
# TODO: backward compatibility, to be removed
info_dict['fulltitle'] = info_dict['title']
if len(info_dict['title']) > 200:
info_dict['title'] = info_dict['title'][:197] + '...'
if 'format' not in info_dict:
info_dict['format'] = info_dict['ext']
@@ -1720,29 +1750,7 @@ class YoutubeDL(object):
info_dict['_filename'] = filename = self.prepare_filename(info_dict)
# Forced printings
if self.params.get('forcetitle', False):
self.to_stdout(info_dict['fulltitle'])
if self.params.get('forceid', False):
self.to_stdout(info_dict['id'])
if self.params.get('forceurl', False):
if info_dict.get('requested_formats') is not None:
for f in info_dict['requested_formats']:
self.to_stdout(f['url'] + f.get('play_path', ''))
else:
# For RTMP URLs, also include the playpath
self.to_stdout(info_dict['url'] + info_dict.get('play_path', ''))
if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
self.to_stdout(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
self.to_stdout(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None:
self.to_stdout(filename)
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
if self.params.get('forceformat', False):
self.to_stdout(info_dict['format'])
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
self.__forced_printings(info_dict, filename, incomplete=False)
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
@@ -1783,6 +1791,8 @@ class YoutubeDL(object):
annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present')
elif not info_dict.get('annotations'):
self.report_warning('There are no annotations to write.')
else:
try:
self.to_screen('[info] Writing video annotations to: ' + annofn)

View File

@@ -94,7 +94,7 @@ def _real_main(argv=None):
if opts.verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
except IOError:
sys.exit('ERROR: batch file could not be read')
sys.exit('ERROR: batch file %s could not be read' % opts.batchfile)
all_urls = batch_urls + [url.strip() for url in args] # batch_urls are already striped in read_batch_urls
_enc = preferredencoding()
all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]

View File

@@ -176,7 +176,9 @@ class FileDownloader(object):
return
speed = float(byte_counter) / elapsed
if speed > rate_limit:
time.sleep(max((byte_counter // rate_limit) - elapsed, 0))
sleep_time = float(byte_counter) / rate_limit - elapsed
if sleep_time > 0:
time.sleep(sleep_time)
def temp_name(self, filename):
"""Returns a temporary filename for the given filename."""

View File

@@ -53,7 +53,7 @@ class DashSegmentsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.

View File

@@ -194,6 +194,7 @@ class Aria2cFD(ExternalFD):
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += ['--', info_dict['url']]
return cmd

View File

@@ -190,12 +190,13 @@ class FragmentFD(FileDownloader):
})
def _start_frag_download(self, ctx):
resume_len = ctx['complete_frags_downloaded_bytes']
total_frags = ctx['total_frags']
# This dict stores the download progress, it's updated by the progress
# hook
state = {
'status': 'downloading',
'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
'downloaded_bytes': resume_len,
'fragment_index': ctx['fragment_index'],
'fragment_count': total_frags,
'filename': ctx['filename'],
@@ -234,8 +235,8 @@ class FragmentFD(FileDownloader):
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
if not ctx['live']:
state['eta'] = self.calc_eta(
start, time_now, estimated_size,
state['downloaded_bytes'])
start, time_now, estimated_size - resume_len,
state['downloaded_bytes'] - resume_len)
state['speed'] = s.get('speed') or ctx.get('speed')
ctx['speed'] = state['speed']
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes

View File

@@ -146,7 +146,7 @@ def write_piff_header(stream, params):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps

View File

@@ -15,10 +15,13 @@ class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = r'''(?x)
https?://
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
)|
fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
)
(?P<id>\d+)
'''

View File

@@ -7,6 +7,7 @@ import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
float_or_none,
int_or_none,
try_get,
@@ -27,7 +28,7 @@ class ACastIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': 'a02393c74f3bdb1801c3ec2695577ce0',
'md5': '16d936099ec5ca2d5869e3a813ee8dc4',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
@@ -46,28 +47,37 @@ class ACastIE(InfoExtractor):
}, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
'only_matching': True,
}]
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json(
'https://play-api.acast.com/stitch/%s/%s' % (channel, display_id),
display_id)['result']
'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id),
display_id)
media_url = s['url']
if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id):
episode_url = s.get('episodeUrl')
if episode_url:
display_id = episode_url
else:
channel, display_id = re.match(self._VALID_URL, s['link']).groups()
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e['name']
title = e.get('name') or s['title']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': media_url,
'title': title,
'description': e.get('description') or e.get('summary'),
'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate')),
'duration': float_or_none(s.get('duration') or e.get('duration')),
'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),

View File

@@ -25,6 +25,11 @@ MSO_INFO = {
'username_field': 'username',
'password_field': 'password',
},
'ATT': {
'name': 'AT&T U-verse',
'username_field': 'userid',
'password_field': 'password',
},
'ATTOTT': {
'name': 'DIRECTV NOW',
'username_field': 'email',

View File

@@ -4,17 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
find_xpath_attr,
get_element_by_attribute,
int_or_none,
NO_DEFAULT,
qualities,
try_get,
unified_strdate,
@@ -25,59 +18,7 @@ from ..utils import (
# add tests.
class ArteTvIE(InfoExtractor):
_VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
lang = mobj.group('lang')
video_id = mobj.group('id')
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
ref_xml_doc = self._download_xml(
ref_xml_url, video_id, note='Downloading metadata')
config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
config_xml_url = config_node.attrib['ref']
config = self._download_xml(
config_xml_url, video_id, note='Downloading configuration')
formats = [{
'format_id': q.attrib['quality'],
# The playpath starts at 'mp4:', if we don't manually
# split the url, rtmpdump will incorrectly parse them
'url': q.text.split('mp4:', 1)[0],
'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
'ext': 'flv',
'quality': 2 if q.attrib['quality'] == 'hd' else 1,
} for q in config.findall('./urls/url')]
self._sort_formats(formats)
title = config.find('.//name').text
thumbnail = config.find('.//firstThumbnailUrl').text
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
class ArteTVBaseIE(InfoExtractor):
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
if 'vid' in query:
video_id = query['vid'][0]
else:
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
@@ -108,13 +49,15 @@ class ArteTVBaseIE(InfoExtractor):
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
'it': 'E[ITA]',
'pl': 'E[POL]',
}
langcode = LANGS.get(lang, lang)
@@ -126,8 +69,8 @@ class ArteTVBaseIE(InfoExtractor):
l = re.escape(langcode)
# Language preference from most to least priority
# Reference: section 5.6.3 of
# http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
# Reference: section 6.8 of
# https://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-07-1.pdf
PREFERENCES = (
# original version in requested language, without subtitles
r'VO{0}$'.format(l),
@@ -193,274 +136,59 @@ class ArteTVBaseIE(InfoExtractor):
class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/(?:[^/]+/)?(?P<lang>fr|de|en|es)/(?:videos/)?(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
_TESTS = [{
'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
'only_matching': True,
}, {
'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
'only_matching': True,
}, {
'url': 'http://www.arte.tv/de/videos/048696-000-A/der-kluge-bauch-unser-zweites-gehirn',
'only_matching': True,
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'info_dict': {
'id': '088501-000-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
},
}]
@classmethod
def suitable(cls, url):
return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
ids = (video_id, '')
# some pages contain multiple videos (like
# http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
# so we first try to look for json URLs that contain the video id from
# the 'vid' parameter.
patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
json_url = self._html_search_regex(
patterns, webpage, 'json vp url', default=None)
if not json_url:
def find_iframe_url(webpage, default=NO_DEFAULT):
return self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url', default=default)
iframe_url = find_iframe_url(webpage, None)
if not iframe_url:
embed_url = self._html_search_regex(
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
if embed_url:
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
# en and es URLs produce react-based pages with different layout (e.g.
# http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
if not iframe_url:
program = self._search_regex(
r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
webpage, 'program', default=None)
if program:
embed_html = self._parse_json(program, video_id)
if embed_html:
iframe_url = find_iframe_url(embed_html['embed_html'])
if iframe_url:
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
if json_url:
title = self._search_regex(
r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
webpage, 'title', default=None, group='title')
return self._extract_from_json_url(json_url, video_id, lang, title=title)
# Different kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
entries = [
self.url_result(url)
for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
return self.playlist_result(entries)
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
'info_dict': {
'id': '057405-001-A',
'ext': 'mp4',
'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
'upload_date': '20150716',
},
}, {
'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
'playlist_count': 11,
'add_ie': ['Youtube'],
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}]
class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}]
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
'info_dict': {
'id': '050940-028-A',
'ext': 'mp4',
'title': 'Les écrevisses aussi peuvent être anxieuses',
'upload_date': '20140902',
},
}, {
'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
'only_matching': True,
}]
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
_TESTS = []
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
if lang == 'folge':
lang = 'de'
elif lang == 'emission':
lang = 'fr'
webpage = self._download_webpage(url, video_id)
scriptElement = get_element_by_attribute('class', 'visu_video_block', webpage)
script_url = self._html_search_regex(r'src="(.*?)"', scriptElement, 'script url')
javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
return self._extract_from_json_url(json_url, video_id, lang)
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
'md5': '9ea035b7bd69696b67aa2ccaaa218161',
'info_dict': {
'id': '186',
'ext': 'mp4',
'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
'upload_date': '20140128',
'description': 'md5:486eb08f991552ade77439fe6d82c305',
},
}]
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
_TESTS = [{
'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
'info_dict': {
'id': '062494-000-A',
'ext': 'mp4',
'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
'upload_date': '20150807',
},
}]
class ArteTVMagazineIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:magazine'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
# Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
'md5': '2a9369bcccf847d1c741e51416299f25',
'info_dict': {
'id': '065965-000-A',
'ext': 'mp4',
'title': 'Trepalium - Extrait Ep.01',
'upload_date': '20160121',
},
}, {
# Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
'md5': 'fedc64fc7a946110fe311634e79782ca',
'info_dict': {
'id': '054813-004_PLUS7-F',
'ext': 'mp4',
'title': 'Trepalium (4/6)',
'description': 'md5:10057003c34d54e95350be4f9b05cb40',
'upload_date': '20160218',
},
}, {
'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
'only_matching': True,
}]
lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(
'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
video_id, lang)
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/(?:playerv2/embed|arte_vp/index)\.php\?json_url=
https://www\.arte\.tv
/player/v3/index\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
https?://api\.arte\.tv/api/player/v1/config/
(?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
)
'''
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang')
json_url = mobj.group('json_url')
json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(json_url, video_id, lang)
class TheOperaPlatformIE(ArteTVPlus7IE):
IE_NAME = 'theoperaplatform'
_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
'md5': '970655901fa2e82e04c00b955e9afe7b',
'info_dict': {
'id': '060338-009-A',
'ext': 'mp4',
'title': 'Verdi - OTELLO',
'upload_date': '20160927',
},
}]
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_TESTS = [{
'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': {
'id': 'PL-013263',
'title': 'Areva & Uramin',
'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
'id': 'RC-016954',
'title': 'Earn a Living',
'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
},
'playlist_mincount': 6,
}, {
'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id, lang = self._extract_url_info(url)
lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id)

View File

@@ -5,14 +5,12 @@ import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
)
from ..utils import extract_attributes
class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
@@ -20,7 +18,7 @@ class AsianCrushIE(InfoExtractor):
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805',
'description': 'md5:7e986615808bcfb11756eb503a751487',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
@@ -28,10 +26,27 @@ class AsianCrushIE(InfoExtractor):
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/013886v/the-act-of-killing/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/peep-show/013922v-warring-factions/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/010400v/drifters/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/mononoke/016378v-zashikiwarashi-part-1/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@@ -51,7 +66,7 @@ class AsianCrushIE(InfoExtractor):
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
@@ -63,15 +78,23 @@ class AsianCrushIE(InfoExtractor):
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
return self.url_result(
'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=title)
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
_TEST = {
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
@@ -79,7 +102,16 @@ class AsianCrushPlaylistIE(InfoExtractor):
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
}
}, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/series/016375s/mononoke/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
@@ -96,15 +128,15 @@ class AsianCrushPlaylistIE(InfoExtractor):
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end(
self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
' | AsianCrush')
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(

View File

@@ -40,6 +40,7 @@ class BBCCoUkIE(InfoExtractor):
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/(?:clips|audiovideo/popular)[/#]|
radio/player/|
sounds/play/|
events/[^/]+/play/[^/]+/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
@@ -70,7 +71,7 @@ class BBCCoUkIE(InfoExtractor):
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
},
'params': {
@@ -220,6 +221,20 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://www.bbc.co.uk/sounds/play/m0007jzb',
'note': 'Audio',
'info_dict': {
'id': 'm0007jz9',
'ext': 'mp4',
'title': 'BBC Proms, 2019, Prom 34: WestEastern Divan Orchestra',
'description': "Live BBC Proms. WestEastern Divan Orchestra with Daniel Barenboim and Martha Argerich.",
'duration': 9840,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
@@ -609,7 +624,7 @@ class BBCIE(BBCCoUkIE):
'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': {
'id': 'world-europe-32668511',
'title': 'Russia stages massive WW2 parade despite Western boycott',
'title': 'Russia stages massive WW2 parade',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
},
'playlist_count': 2,

View File

@@ -99,8 +99,8 @@ class BeamProLiveIE(BeamProBaseIE):
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
@@ -119,7 +119,13 @@ class BeamProVodIE(BeamProBaseIE):
'params': {
'skip_download': True,
},
}
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
}, {
'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
'only_matching': True,
}]
@staticmethod
def _extract_format(vod, vod_type):

View File

@@ -1,7 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
int_or_none,
unified_timestamp,
@@ -11,6 +14,7 @@ from ..utils import (
class BeegIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?beeg\.(?:com|porn(?:/video)?)/(?P<id>\d+)'
_TESTS = [{
# api/v6 v1
'url': 'http://beeg.com/5416503',
'md5': 'a1a1b1a8bc70a89e49ccfd113aed0820',
'info_dict': {
@@ -24,6 +28,14 @@ class BeegIE(InfoExtractor):
'tags': list,
'age_limit': 18,
}
}, {
# api/v6 v2
'url': 'https://beeg.com/1941093077?t=911-1391',
'only_matching': True,
}, {
# api/v6 v2 w/o t
'url': 'https://beeg.com/1277207756',
'only_matching': True,
}, {
'url': 'https://beeg.porn/video/5416503',
'only_matching': True,
@@ -41,11 +53,25 @@ class BeegIE(InfoExtractor):
r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
default='1546225636701')
if len(video_id) >= 10:
query = {
'v': 2,
}
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
t = qs.get('t', [''])[0].split('-')
if len(t) > 1:
query.update({
's': t[0],
'e': t[1],
})
else:
query = {'v': 1}
for api_path in ('', 'api.'):
video = self._download_json(
'https://%sbeeg.com/api/v6/%s/video/%s'
% (api_path, beeg_version, video_id), video_id,
fatal=api_path == 'api.')
fatal=api_path == 'api.', query=query)
if video:
break

View File

@@ -15,6 +15,7 @@ from ..utils import (
float_or_none,
parse_iso8601,
smuggle_url,
str_or_none,
strip_jsonp,
unified_timestamp,
unsmuggle_url,
@@ -306,3 +307,115 @@ class BiliBiliBangumiIE(InfoExtractor):
return self.playlist_result(
entries, bangumi_id,
season_info.get('bangumi_title'), season_info.get('evaluate'))
class BilibiliAudioBaseIE(InfoExtractor):
def _call_api(self, path, sid, query=None):
if not query:
query = {'sid': sid}
return self._download_json(
'https://www.bilibili.com/audio/music-service-c/web/' + path,
sid, query=query)['data']
class BilibiliAudioIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/au(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/au1003142',
'md5': 'fec4987014ec94ef9e666d4d158ad03b',
'info_dict': {
'id': '1003142',
'ext': 'm4a',
'title': '【tsukimi】YELLOW / 神山羊',
'artist': 'tsukimi',
'comment_count': int,
'description': 'YELLOW的mp3版',
'duration': 183,
'subtitles': {
'origin': [{
'ext': 'lrc',
}],
},
'thumbnail': r're:^https?://.+\.jpg',
'timestamp': 1564836614,
'upload_date': '20190803',
'uploader': 'tsukimi-つきみぐー',
'view_count': int,
},
}
def _real_extract(self, url):
au_id = self._match_id(url)
play_data = self._call_api('url', au_id)
formats = [{
'url': play_data['cdns'][0],
'filesize': int_or_none(play_data.get('size')),
}]
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}
subtitles = None
lyric = song.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}]
}
return {
'id': au_id,
'title': title,
'formats': formats,
'artist': song.get('author'),
'comment_count': int_or_none(statistic.get('comment')),
'description': song.get('intro'),
'duration': int_or_none(song.get('duration')),
'subtitles': subtitles,
'thumbnail': song.get('cover'),
'timestamp': int_or_none(song.get('passtime')),
'uploader': song.get('uname'),
'view_count': int_or_none(statistic.get('play')),
}
class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/audio/am(?P<id>\d+)'
_TEST = {
'url': 'https://www.bilibili.com/audio/am10624',
'info_dict': {
'id': '10624',
'title': '每日新曲推荐每日11:00更新',
'description': '每天11:00更新为你推送最新音乐',
},
'playlist_count': 19,
}
def _real_extract(self, url):
am_id = self._match_id(url)
songs = self._call_api(
'song/of-menu', am_id, {'sid': am_id, 'pn': 1, 'ps': 100})['data']
entries = []
for song in songs:
sid = str_or_none(song.get('id'))
if not sid:
continue
entries.append(self.url_result(
'https://www.bilibili.com/audio/au' + sid,
BilibiliAudioIE.ie_key(), sid))
if entries:
album_data = self._call_api('menu/info', am_id) or {}
album_title = album_data.get('title')
if album_title:
for entry in entries:
entry['album'] = album_title
return self.playlist_result(
entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id)

View File

@@ -6,7 +6,6 @@ from ..utils import (
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor):
@@ -41,11 +40,15 @@ class BioBioChileTVIE(InfoExtractor):
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'id': 'b4xd0LK3SK',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
# TODO: fix url_transparent information overriding
# 'uploader': 'Juan Pablo Echenique',
'title': 'Comentario Oscar Cáceres',
},
'params': {
# empty m3u8 manifest
'skip_download': True,
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
@@ -60,7 +63,9 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
rudo_url = self._search_regex(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
webpage, 'embed URL', None, group='url')
if not rudo_url:
raise ExtractorError('No videos found')
@@ -68,7 +73,7 @@ class BioBioChileTVIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
r'<a[^>]+href=["\'](?:https?://(?:busca|www)\.biobiochile\.cl)?/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {

View File

@@ -42,7 +42,7 @@ class BIQLEIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
embed_url = self._proto_relative_url(self._search_regex(
r'<iframe.+?src="((?:https?:)?//daxab\.com/[^"]+)".*?></iframe>',
r'<iframe.+?src="((?:https?:)?//(?:daxab\.com|dxb\.to|[^/]+/player)/[^"]+)".*?></iframe>',
webpage, 'embed url'))
if VKIE.suitable(embed_url):
return self.url_result(embed_url, VKIE.ie_key(), video_id)

View File

@@ -55,6 +55,11 @@ class BitChuteIE(InfoExtractor):
formats = [
{'url': format_url}
for format_url in orderedSet(format_urls)]
if not formats:
formats = self._parse_html5_media_entries(
url, webpage, video_id)[0]['formats']
self._check_formats(formats, video_id)
self._sort_formats(formats)
@@ -65,8 +70,9 @@ class BitChuteIE(InfoExtractor):
webpage, default=None) or self._html_search_meta(
'twitter:image:src', webpage, 'thumbnail')
uploader = self._html_search_regex(
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>', webpage,
'uploader', fatal=False)
(r'(?s)<div class=["\']channel-banner.*?<p\b[^>]+\bclass=["\']name[^>]+>(.+?)</p>',
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'),
webpage, 'uploader', fatal=False)
return {
'id': video_id,

View File

@@ -71,7 +71,7 @@ class BleacherReportIE(InfoExtractor):
video = article_data.get('video')
if video:
video_type = video['type']
if video_type == 'cms.bleacherreport.com':
if video_type in ('cms.bleacherreport.com', 'vid.bleacherreport.com'):
info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
elif video_type == 'ooyala.com':
info['url'] = 'ooyala:%s' % video['id']
@@ -87,9 +87,9 @@ class BleacherReportIE(InfoExtractor):
class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})'
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
@@ -101,6 +101,6 @@ class BleacherReportCMSIE(AMPIE):
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id)
info = self._extract_feed_info('http://vid.bleacherreport.com/videos/%s.akamai' % video_id)
info['id'] = video_id
return info

View File

@@ -11,8 +11,8 @@ from ..utils import ExtractorError
class BokeCCBaseIE(InfoExtractor):
def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
player_params_str = self._html_search_regex(
r'<(?:script|embed)[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
webpage, 'player params')
r'<(?:script|embed)[^>]+src=(?P<q>["\'])(?:https?:)?//p\.bokecc\.com/(?:player|flash/player\.swf)\?(?P<query>.+?)(?P=q)',
webpage, 'player params', group='query')
player_params = compat_parse_qs(player_params_str)
@@ -36,9 +36,9 @@ class BokeCCIE(BokeCCBaseIE):
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',
'url': 'http://union.bokecc.com/playvideo.bo?vid=E0ABAE9D4F509B189C33DC5901307461&uid=FE644790DE9D154A',
'info_dict': {
'id': 'CD0C5D3C8614B28B_E44D40C15E65EA30',
'id': 'FE644790DE9D154A_E0ABAE9D4F509B189C33DC5901307461',
'ext': 'flv',
'title': 'BokeCC Video',
},

View File

@@ -2,7 +2,6 @@
from __future__ import unicode_literals
import base64
import json
import re
import struct
@@ -11,14 +10,12 @@ from .adobepass import AdobePassIE
from ..compat import (
compat_etree_fromstring,
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
compat_urlparse,
compat_xml_parse_error,
compat_HTTPError,
)
from ..utils import (
determine_ext,
ExtractorError,
extract_attributes,
find_xpath_attr,
@@ -27,18 +24,19 @@ from ..utils import (
js_to_json,
int_or_none,
parse_iso8601,
smuggle_url,
unescapeHTML,
unsmuggle_url,
update_url_query,
clean_html,
mimetype2ext,
UnsupportedError,
)
class BrightcoveLegacyIE(InfoExtractor):
IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
_TESTS = [
{
@@ -55,7 +53,8 @@ class BrightcoveLegacyIE(InfoExtractor):
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': '1589608506001',
}
},
'skip': 'The player has been deactivated by the content owner',
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
@@ -70,6 +69,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'upload_date': '20120814',
'uploader_id': '1460825906',
},
'skip': 'video not playable',
},
{
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
@@ -79,7 +79,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'ext': 'mp4',
'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable',
# 'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': '1130468786001',
@@ -124,6 +124,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'id': '3550319591001',
},
'playlist_mincount': 7,
'skip': 'Unsupported URL',
},
{
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
@@ -133,6 +134,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Lesson 08',
},
'playlist_mincount': 10,
'skip': 'Unsupported URL',
},
{
# playerID inferred from bcpid
@@ -141,12 +143,6 @@ class BrightcoveLegacyIE(InfoExtractor):
'only_matching': True, # Tested in GenericIE
}
]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -238,7 +234,8 @@ class BrightcoveLegacyIE(InfoExtractor):
@classmethod
def _make_brightcove_url(cls, params):
return update_url_query(cls._FEDERATED_URL, params)
return update_url_query(
'http://c.brightcove.com/services/viewer/htmlFederated', params)
@classmethod
def _extract_brightcove_url(cls, webpage):
@@ -297,38 +294,12 @@ class BrightcoveLegacyIE(InfoExtractor):
videoPlayer = query.get('@videoPlayer')
if videoPlayer:
# We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url)
referer = query.get('linkBaseURL', [None])[0] or smuggled_data.get('Referer', url)
video_id = videoPlayer[0]
if 'playerID' not in query:
mobj = re.search(r'/bcpid(\d+)', url)
if mobj is not None:
query['playerID'] = [mobj.group(1)]
return self._get_video_info(
videoPlayer[0], query, referer=referer)
elif 'playerKey' in query:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
else:
raise ExtractorError(
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _brightcove_new_url_result(self, publisher_id, video_id):
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if referer is not None:
headers['Referer'] = referer
webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
publisher_id = query.get('publisherId')
if publisher_id and publisher_id[0].isdigit():
publisher_id = publisher_id[0]
@@ -339,6 +310,9 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
player_id = query.get('playerID')
if player_id and player_id[0].isdigit():
headers = {}
if referer:
headers['Referer'] = referer
player_page = self._download_webpage(
'http://link.brightcove.com/services/player/bcpid' + player_id[0],
video_id, headers=headers, fatal=False)
@@ -349,141 +323,21 @@ class BrightcoveLegacyIE(InfoExtractor):
if player_key:
enc_pub_id = player_key.split(',')[1].replace('~', '=')
publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
if publisher_id:
return self._brightcove_new_url_result(publisher_id, video_id)
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
video_info['_youtubedl_adServerURL'] = info.get('adServerURL')
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
info_url = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' % player_key
playlist_info = self._download_webpage(
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist')
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId')
info = {
'id': video_id,
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
url = rend['defaultURL']
if not url:
continue
ext = None
if rend['remote']:
url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'):
formats.extend(
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif 'akamaihd.net' in url_comp.netloc:
# This type of renditions are served through
# akamaihd.net, but they don't use f4m manifests
url = url.replace('control/', '') + '?&v=3.3.0&fp=13&r=FEEFJ&g=RTSJIMBMPFPB'
ext = 'flv'
if ext is None:
ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8_native',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
adServerURL = video_info.get('_youtubedl_adServerURL')
if adServerURL:
ad_info = {
'_type': 'url',
'url': adServerURL,
}
if 'url' in info:
return {
'_type': 'playlist',
'title': info['title'],
'entries': [ad_info, info],
}
else:
return ad_info
if not info.get('url') and not info.get('formats'):
uploader_id = info.get('uploader_id')
if uploader_id:
info.update(self._brightcove_new_url_result(uploader_id, video_id))
else:
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
if publisher_id:
brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
if referer:
brightcove_new_url = smuggle_url(brightcove_new_url, {'referrer': referer})
return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
# TODO: figure out if it's possible to extract playlistId from playerKey
# elif 'playerKey' in query:
# player_key = query['playerKey']
# return self._get_playlist_info(player_key[0])
raise UnsupportedError(url)
class BrightcoveNewIE(AdobePassIE):
IE_NAME = 'brightcove:new'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+|ref:[^&]+)'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*(?P<content_type>video|playlist)Id=(?P<video_id>\d+|ref:[^&]+)'
_TESTS = [{
'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
'md5': 'c8100925723840d4b0d243f7025703be',
@@ -516,6 +370,21 @@ class BrightcoveNewIE(AdobePassIE):
# m3u8 download
'skip_download': True,
}
}, {
# playlist stream
'url': 'https://players.brightcove.net/1752604059001/S13cJdUBz_default/index.html?playlistId=5718313430001',
'info_dict': {
'id': '5718313430001',
'title': 'No Audio Playlist',
},
'playlist_count': 7,
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://players.brightcove.net/5690807595001/HyZNerRl7_default/index.html?playlistId=5743160747001',
'only_matching': True,
}, {
# ref: prefixed video id
'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442',
@@ -715,7 +584,7 @@ class BrightcoveNewIE(AdobePassIE):
'ip_blocks': smuggled_data.get('geo_ip_blocks'),
})
account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
@@ -736,7 +605,7 @@ class BrightcoveNewIE(AdobePassIE):
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
headers = {
'Accept': 'application/json;pk=%s' % policy_key,
}
@@ -771,5 +640,12 @@ class BrightcoveNewIE(AdobePassIE):
'tveToken': tve_token,
})
if content_type == 'playlist':
return self.playlist_result(
[self._parse_brightcove_metadata(vid, vid.get('id'), headers)
for vid in json_data.get('videos', []) if vid.get('id')],
json_data.get('id'), json_data.get('name'),
json_data.get('description'))
return self._parse_brightcove_metadata(
json_data, video_id, headers=headers)

View File

@@ -3,7 +3,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_duration
from ..utils import (
determine_ext,
merge_dicts,
parse_duration,
url_or_none,
)
class BYUtvIE(InfoExtractor):
@@ -51,7 +56,7 @@ class BYUtvIE(InfoExtractor):
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
info = self._download_json(
video = self._download_json(
'https://api.byutv.org/api3/catalog/getvideosforcontent',
display_id, query={
'contentid': video_id,
@@ -62,7 +67,7 @@ class BYUtvIE(InfoExtractor):
'x-byutv-platformkey': 'xsaaw9c7y5',
})
ep = info.get('ooyalaVOD')
ep = video.get('ooyalaVOD')
if ep:
return {
'_type': 'url_transparent',
@@ -75,18 +80,38 @@ class BYUtvIE(InfoExtractor):
'thumbnail': ep.get('imageThumbnail'),
}
ep = info['dvr']
title = ep['title']
formats = self._extract_m3u8_formats(
ep['videoUrl'], video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
info = {}
formats = []
for format_id, ep in video.items():
if not isinstance(ep, dict):
continue
video_url = url_or_none(ep.get('videoUrl'))
if not video_url:
continue
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': video_url,
'format_id': format_id,
})
merge_dicts(info, {
'title': ep.get('title'),
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
'duration': parse_duration(ep.get('length')),
})
self._sort_formats(formats)
return {
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
'duration': parse_duration(ep.get('length')),
'title': display_id,
'formats': formats,
}
})

View File

@@ -17,7 +17,7 @@ from ..utils import (
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrtvideo)/assets/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '90139b746a0a9bd7bb631283f6e2a64e',
@@ -35,6 +35,10 @@ class CanvasIE(InfoExtractor):
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True,
}]
_HLS_ENTRY_PROTOCOLS_MAP = {
'HLS': 'm3u8_native',
'HLS_AES': 'm3u8',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -52,9 +56,9 @@ class CanvasIE(InfoExtractor):
format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type:
continue
if format_type == 'HLS':
if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
m3u8_id=format_type, fatal=False))
elif format_type == 'HDS':
formats.extend(self._extract_f4m_formats(

View File

@@ -69,7 +69,7 @@ class CBSIE(CBSBaseIE):
last_e = None
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types or asset_type in ('HLS_FPS', 'DASH_CENC'):
if not asset_type or asset_type in asset_types or 'HLS_FPS' in asset_type or 'DASH_CENC' in asset_type:
continue
asset_types.append(asset_type)
query = {

View File

@@ -1,40 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import zlib
from .common import InfoExtractor
from .cbs import CBSIE
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
)
from ..utils import (
parse_duration,
)
class CBSNewsEmbedIE(CBSIE):
IE_NAME = 'cbsnews:embed'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/embed/video[^#]*#(?P<id>.+)'
_TESTS = [{
'url': 'https://www.cbsnews.com/embed/video/?v=1.c9b5b61492913d6660db0b2f03579ef25e86307a#1Vb7b9s2EP5XBAHbT6Gt98PAMKTJ0se6LVjWYWtdGBR1stlIpEBSTtwi%2F%2FvuJNkNhmHdGxgM2NL57vjd6zt%2B8PngdN%2Fyg79qeGvhzN%2FLGrS%2F%2BuBLB531V28%2B%2BO7Qg7%2Fy97r2z3xZ42NW8yLhDbA0S0KWlHnIijwKWJBHZZnHBa8Cgbpdf%2F89NM9Hi9fXifhpr8sr%2FlP848tn%2BTdXycX25zh4cdX%2FvHl6PmmPqnWQv9w8Ed%2B9GjYRim07bFEqdG%2BZVHuwTm65A7bVRrYtR5lAyMox7pigF6W4k%2By91mjspGsJ%2BwVae4%2BsvdnaO1p73HkXs%2FVisUDTGm7R8IcdnOROeq%2B19qT1amhA1VJtPenoTUgrtfKc9m7Rq8dP7nnjwOB7wg7ADdNt7VX64DWAWlKhPtmDEq22g4GF99x6Dk9E8OSsankHXqPNKDxC%2FdK7MLKTircTDgsI3mmj4OBdSq64dy7fd1x577RU1rt4cvMtOaulFYOd%2FLewRWvDO9lIgXFpZSnkZmjbv5SxKTPoQXClFbpsf%2Fhbbpzs0IB3vb8KkyzJQ%2BywOAgCrMpgRrz%2BKk4fvb7kFbR4XJCu0gAdtNO7woCwZTu%2BBUs9bam%2Fds71drVerpeisgrubLjAB4nnOSkWQnfr5W6o1ku5Xpr1MgrCbL0M0vUyDtfLLK15WiYp47xKWSLyjFVpwVmVJSLIoCjSOFkv3W7oKsVliwZJcB9nwXpZ5GEQQwY8jNKqKCBrgjTLeFxgdCIpazojDgnRtn43J6kG7nZ6cAbxh0EeFFk4%2B1u867cY5u4344n%2FxXjCqAjucdTHgLKojNKmSfO8KRsOFY%2FzKEYCKEJBzv90QA9nfm9gL%2BHulaFqUkz9ULUYxl62B3U%2FRVNLA8IhggaPycOoBuwOCESciDQVSSUgiOMsROB%2FhKfwCKOzEk%2B4k6rWd4uuT%2FwTDz7K7t3d3WLO8ISD95jSPQbayBacthbz86XVgxHwhex5zawzgDOmtp%2F3GPcXn0VXHdSS029%2Fj99UC%2FwJUvyKQ%2FzKyixIEVlYJOn4RxxuaH43Ty9fbJ5OObykHH435XAzJTHeOF4hhEUXD8URe%2FQ%2FBT%2BMpf8d5GN02Ox%2FfiGsl7TA7POu1xZ5%2BbTzcAVKMe48mqcC21hkacVEVScM26liVVBnrKkC4CLKyzAvHu0lhEaTKMFwI3a4SN9MsrfYzdBLq2vkwRD1gVviLT8kY9h2CHH6Y%2Bix6609weFtey4ESp60WtyeWMy%2BsmBuhsoKIyuoT%2Bq2R%2FrW5qi3g%2FvzS2j40DoixDP8%2BKP0yUdpXJ4l6Vla%2Bg9vce%2BC4yM5YlUcbA%2F0jLKdpmTwvsdN5z88nAIe08%2F0HgxeG1iv%2B6Hlhjh7uiW0SDzYNI92L401uha3JKYk268UVRzdOzNQvAaJqoXzAc80dAV440NZ1WVVAAMRYQ2KrGJFmDUsq8saWSnjvIj8t78y%2FRa3JRnbHVfyFpfwoDiGpPgjzekyUiKNlU3OMlwuLMmzgvEojllYVE2Z1HhImvsnk%2BuhusTEoB21PAtSFodeFK3iYhXEH9WOG2%2FkOE833sfeG%2Ff5cfHtEFNXgYes0%2FXj7aGivUgJ9XpusCtoNcNYVVnJVrrDo0OmJAutHCpuZul4W9lLcfy7BnuLPT02%2ByXsCTk%2B9zhzswIN04YueNSK%2BPtM0jS88QdLqSLJDTLsuGZJNolm2yO0PXh3UPnz9Ix5bfIAqxPjvETQsDCEiPG4QbqNyhBZISxybLnZYCrW5H3Axp690%2F0BJdXtDZ5ITuM4xj3f4oUHGzc5JeJmZKpp%2FjwKh4wMV%2FV1yx3emLoR0MwbG4K%2F%2BZgVep3PnzXGDHZ6a3i%2Fk%2BJrONDN13%2Bnq6tBTYk4o7cLGhBtqCC4KwacGHpEVuoH5JNro%2FE6JfE6d5RydbiR76k%2BW5wioDHBIjw1euhHjUGRB0y5A97KoaPx6MlL%2BwgboUVtUFRI%2FLemgTpdtF59ii7pab08kuPcfWzs0l%2FRI5takWnFpka0zOgWRtYcuf9aIxZMxlwr6IiGpsb6j2DQUXPl%2FimXI599Ev7fWjoPD78A',
'only_matching': True,
}]
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|video)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
# 60 minutes
'url': 'http://www.cbsnews.com/news/artificial-intelligence-positioned-to-be-a-game-changer/',
'info_dict': {
'id': '_B6Ga3VJrI4iQNKsir_cdFo9Re_YJHE_',
'ext': 'mp4',
'title': 'Artificial Intelligence',
'description': 'md5:8818145f9974431e0fb58a1b8d69613c',
'id': 'Y_nf_aEg6WwO9OLAq0MpKaPgfnBUxfW4',
'ext': 'flv',
'title': 'Artificial Intelligence, real-life applications',
'description': 'md5:a7aaf27f1b4777244de8b0b442289304',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1606,
'duration': 317,
'uploader': 'CBSI-NEW',
'timestamp': 1498431900,
'upload_date': '20170625',
'timestamp': 1476046464,
'upload_date': '20161009',
},
'params': {
# m3u8 download
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'url': 'https://www.cbsnews.com/video/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
'ext': 'mp4',
@@ -60,37 +82,29 @@ class CBSNewsIE(CBSIE):
# 48 hours
'url': 'http://www.cbsnews.com/news/maria-ridulph-murder-will-the-nations-oldest-cold-case-to-go-to-trial-ever-get-solved/',
'info_dict': {
'id': 'QpM5BJjBVEAUFi7ydR9LusS69DPLqPJ1',
'ext': 'mp4',
'title': 'Cold as Ice',
'description': 'Can a childhood memory of a friend\'s murder solve a 1957 cold case? "48 Hours" correspondent Erin Moriarty has the latest.',
'upload_date': '20170604',
'timestamp': 1496538000,
'uploader': 'CBSI-NEW',
},
'params': {
'skip_download': True,
'description': 'Can a childhood memory solve the 1957 murder of 7-year-old Maria Ridulph?',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"[^>]*><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info', default='{}'), video_id, fatal=False)
if video_info:
item = video_info['item'] if 'item' in video_info else video_info
else:
state = self._parse_json(self._search_regex(
r'data-cbsvideoui-options=(["\'])(?P<json>{.+?})\1', webpage,
'playlist JSON info', group='json'), video_id)['state']
item = state['playlist'][state['pid']]
entries = []
for embed_url in re.findall(r'<iframe[^>]+data-src="(https?://(?:www\.)?cbsnews\.com/embed/video/[^#]*#[^"]+)"', webpage):
entries.append(self.url_result(embed_url, CBSNewsEmbedIE.ie_key()))
if entries:
return self.playlist_result(
entries, playlist_title=self._html_search_meta(['og:title', 'twitter:title'], webpage),
playlist_description=self._html_search_meta(['og:description', 'twitter:description', 'description'], webpage))
item = self._parse_json(self._html_search_regex(
r'CBSNEWS\.defaultPayload\s*=\s*({.+})',
webpage, 'video JSON info'), display_id)['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')

View File

@@ -7,7 +7,7 @@ from ..utils import ExtractorError
class ChaturbateIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?:fullvideo/?\?.*?\bb=)?(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.chaturbate.com/siswet19/',
'info_dict': {
@@ -21,6 +21,9 @@ class ChaturbateIE(InfoExtractor):
'skip_download': True,
},
'skip': 'Room is offline',
}, {
'url': 'https://chaturbate.com/fullvideo/?b=caylin',
'only_matching': True,
}, {
'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True,
@@ -32,7 +35,8 @@ class ChaturbateIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, headers=self.geo_verification_headers())
'https://chaturbate.com/%s/' % video_id, video_id,
headers=self.geo_verification_headers())
m3u8_urls = []

View File

@@ -67,6 +67,7 @@ from ..utils import (
sanitized_Request,
sanitize_filename,
str_or_none,
strip_or_none,
unescapeHTML,
unified_strdate,
unified_timestamp,
@@ -219,7 +220,7 @@ class InfoExtractor(object):
* "preference" (optional, int) - quality of the image
* "width" (optional, int)
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
* "resolution" (optional, string "{width}x{height}",
deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.
@@ -1423,12 +1424,10 @@ class InfoExtractor(object):
try:
self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True
except ExtractorError as e:
if isinstance(e.cause, compat_urllib_error.URLError):
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
raise
except ExtractorError:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
return False
def http_scheme(self):
""" Either "http:" or "https:", depending on the user's preferences """
@@ -2480,7 +2479,7 @@ class InfoExtractor(object):
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
src = strip_or_none(media_attributes.get('src'))
if src:
_, formats = _media_formats(src, media_type)
media_info['formats'].extend(formats)
@@ -2490,7 +2489,7 @@ class InfoExtractor(object):
s_attr = extract_attributes(source_tag)
# data-video-src and data-src are non standard but seen
# several times in the wild
src = dict_get(s_attr, ('src', 'data-video-src', 'data-src'))
src = strip_or_none(dict_get(s_attr, ('src', 'data-video-src', 'data-src')))
if not src:
continue
f = parse_content_type(s_attr.get('type'))
@@ -2533,7 +2532,7 @@ class InfoExtractor(object):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
src = strip_or_none(track_attributes.get('src'))
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
@@ -2817,6 +2816,33 @@ class InfoExtractor(object):
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
Apply first Set-Cookie header instead of the last. Experimental.
Some sites (e.g. [1-3]) may serve two cookies under the same name
in Set-Cookie header and expect the first (old) one to be set rather
than second (new). However, as of RFC6265 the newer one cookie
should be set into cookie store what actually happens.
We will workaround this issue by resetting the cookie to
the first one manually.
1. https://new.vk.com/
2. https://github.com/ytdl-org/youtube-dl/issues/9841#issuecomment-227871201
3. https://learning.oreilly.com/
"""
for header, cookies in url_handle.headers.items():
if header.lower() != 'set-cookie':
continue
if sys.version_info[0] >= 3:
cookies = cookies.encode('iso-8859-1')
cookies = cookies.decode('utf-8')
cookie_value = re.search(
r'%s=(.+?);.*?\b[Dd]omain=(.+?)(?:[,;]|$)' % cookie, cookies)
if cookie_value:
value, domain = cookie_value.groups()
self._set_cookie(domain, cookie, value)
break
def get_testcases(self, include_onlymatching=False):
t = getattr(self, '_TEST', None)
if t:

View File

@@ -1,39 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = {
'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3',
'info_dict': {
'id': '184',
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(
r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage)
thumbnail = self._search_regex(
r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {
'id': video_id,
'url': final_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -103,19 +103,6 @@ class CrunchyrollBaseIE(InfoExtractor):
def _real_initialize(self):
self._login()
def _download_webpage(self, url_or_request, *args, **kwargs):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond the IP). By the way this causes
# Crunchyroll to not work in georestriction cases in some browsers that don't place
# the locale lang first in header. However allowing any language seems to workaround the issue.
request.add_header('Accept-Language', '*')
return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
@staticmethod
def _add_skip_wall(url):
parsed_url = compat_urlparse.urlparse(url)
@@ -269,6 +256,19 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'1080': ('80', '108'),
}
def _download_webpage(self, url_or_request, *args, **kwargs):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond the IP). By the way this causes
# Crunchyroll to not work in georestriction cases in some browsers that don't place
# the locale lang first in header. However allowing any language seems to workaround the issue.
request.add_header('Accept-Language', '*')
return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
def _decrypt_subtitles(self, data, iv, id):
data = bytes_to_intlist(compat_b64decode(data))
iv = bytes_to_intlist(compat_b64decode(iv))
@@ -661,9 +661,8 @@ class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
webpage = self._download_webpage(
self._add_skip_wall(url), show_id,
headers=self.geo_verification_headers())
title = self._html_search_regex(
r'(?s)<h1[^>]*>\s*<span itemprop="name">(.*?)</span>',
webpage, 'title')
title = self._html_search_meta('name', webpage, default=None)
episode_paths = re.findall(
r'(?s)<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"',
webpage)

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_timestamp
from .youtube import YoutubeIE
class CtsNewsIE(InfoExtractor):
@@ -14,8 +15,8 @@ class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201501291578109',
'ext': 'mp4',
'title': '以色列.真主黨交火 3人死亡',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
'title': '以色列.真主黨交火 3人死亡 - 華視新聞網',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人員也不幸罹難。大陸陝西、河南、安徽、江蘇和湖北五個省份出現大暴雪,嚴重影響陸空交通,不過九華山卻出現...',
'timestamp': 1422528540,
'upload_date': '20150129',
}
@@ -26,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201309031304098',
'ext': 'mp4',
'title': '韓國31歲童顏男 貌如十多歲小孩',
'title': '韓國31歲童顏男 貌如十多歲小孩 - 華視新聞網',
'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1378205880,
@@ -62,8 +63,7 @@ class CtsNewsIE(InfoExtractor):
video_url = mp4_feed['source_url']
else:
self.to_screen('Not CTSPlayer video, trying Youtube...')
youtube_url = self._search_regex(
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
youtube_url = YoutubeIE._extract_url(page)
return self.url_result(youtube_url, ie='Youtube')

View File

@@ -48,7 +48,14 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)'
_VALID_URL = r'''(?ix)
https?://
(?:
(?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
(?:www\.)?lequipe\.fr/video
)
/(?P<id>[^/?_]+)
'''
IE_NAME = 'dailymotion'
_FORMATS = [
@@ -133,14 +140,26 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}, {
'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/x791mem',
'only_matching': True,
}, {
'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
urls = []
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
return list(map(lambda m: unescapeHTML(m[1]), matches))
# https://developer.dailymotion.com/player#player-parameters
for mobj in re.finditer(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage):
urls.append(unescapeHTML(mobj.group('url')))
for mobj in re.finditer(
r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
urls.append('https://www.dailymotion.com/embed/video/' + mobj.group('id'))
return urls
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -7,50 +7,51 @@ from .common import InfoExtractor
class DBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_VALID_URL = r'https?://(?:www\.)?dagbladet\.no/video/(?:(?:embed|(?P<display_id>[^/]+))/)?(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': '2e24f67936517b143a234b4cadf792ec',
'url': 'https://www.dagbladet.no/video/PynxJnNWChE/',
'md5': 'b8f850ba1860adbda668d367f9b77699',
'info_dict': {
'id': '3649835190001',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'id': 'PynxJnNWChE',
'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
'description': 'md5:49cc8370e7d66e8a2ef15c3b4631fd3f',
'thumbnail': r're:https?://.*\.jpg',
'timestamp': 1404039863,
'upload_date': '20140629',
'duration': 69.544,
'uploader_id': '1027729757001',
'upload_date': '20160916',
'duration': 69,
'uploader_id': 'UCk5pvsyZJoYJBd7_oFPTlRQ',
'uploader': 'Dagbladet',
},
'add_ie': ['BrightcoveNew']
'add_ie': ['Youtube']
}, {
'url': 'http://dbtv.no/3649835190001',
'url': 'https://www.dagbladet.no/video/embed/xlGmyIeN9Jo/?autoplay=false',
'only_matching': True,
}, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/vice/5000634109001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/filmtrailer/3359293614001',
'url': 'https://www.dagbladet.no/video/truer-iran-bor-passe-dere/PalfB2Cw',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dagbladet\.no/video/embed/(?:[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8}).*?)\1',
webpage)]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
return {
display_id, video_id = re.match(self._VALID_URL, url).groups()
info = {
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id,
'display_id': display_id,
'ie_key': 'BrightcoveNew',
}
if len(video_id) == 11:
info.update({
'url': video_id,
'ie_key': 'Youtube',
})
else:
info.update({
'url': 'jwplatform:' + video_id,
'ie_key': 'JWPlatform',
})
return info

View File

@@ -5,23 +5,17 @@ import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
ExtractorError,
try_get,
)
from ..compat import compat_urllib_parse_unquote
from ..utils import ExtractorError
from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
(?:(?:www|go)\.)?discovery|
(?:www\.)?
(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
@@ -40,15 +34,15 @@ class DiscoveryIE(DiscoveryGoBaseIE):
cookingchanneltv|
motortrend
)
)\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
)\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
_TESTS = [{
'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
'info_dict': {
'id': '5a2d9b4d6b66d17a5026e1fd',
'id': '5a2f35ce6b66d17a5026e29e',
'ext': 'mp4',
'title': 'Dave Foley',
'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f',
'duration': 608,
'title': 'Riding with Matthew Perry',
'description': 'md5:a34333153e79bc4526019a5129e7f878',
'duration': 84,
},
'params': {
'skip_download': True, # requires ffmpeg
@@ -56,20 +50,20 @@ class DiscoveryIE(DiscoveryGoBaseIE):
}, {
'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'only_matching': True,
}, {
'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
'only_matching': True,
}, {
# using `show_slug` is important to get the correct video data
'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
_API_BASE_URL = 'https://api.discovery.com/v1/'
def _real_extract(self, url):
site, path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
react_data = self._parse_json(self._search_regex(
r'window\.__reactTransmitPacket\s*=\s*({.+?});',
webpage, 'react data'), display_id)
content_blocks = react_data['layout'][path]['contentBlocks']
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
site, show_slug, display_id = re.match(self._VALID_URL, url).groups()
access_token = None
cookies = self._get_cookies(url)
@@ -79,27 +73,36 @@ class DiscoveryIE(DiscoveryGoBaseIE):
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)),
video_id, fatal=False) or {}
display_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
'https://%s.com/anonymous' % site, display_id, query={
'https://%s.com/anonymous' % site, display_id,
'Downloading token JSON metadata', query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
try:
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
try:
video = self._download_json(
self._API_BASE_URL + 'content/videos',
display_id, 'Downloading content JSON metadata',
headers=headers, query={
'embed': 'show.name',
'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
'slug': display_id,
'show_slug': show_slug,
})[0]
video_id = video['id']
stream = self._download_json(
'https://api.discovery.com/v1/streaming/video/' + video_id,
display_id, headers=headers)
self._API_BASE_URL + 'streaming/video/' + video_id,
display_id, 'Downloading streaming JSON metadata', headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(

View File

@@ -0,0 +1,97 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import int_or_none
class DLiveVODIE(InfoExtractor):
IE_NAME = 'dlive:vod'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
'info_dict': {
'id': '3mTzOl4WR',
'ext': 'mp4',
'title': 'Minecraft with james charles epic',
'upload_date': '20190701',
'timestamp': 1562011015,
'uploader_id': 'pdp',
}
}, {
'url': 'https://dlive.tv/p/pdpreplay+D-RD-xSZg',
'only_matching': True,
}]
def _real_extract(self, url):
uploader_id, vod_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
'https://graphigo.prd.dlive.tv/', vod_id,
data=json.dumps({'query': '''query {
pastBroadcast(permlink:"%s+%s") {
content
createdAt
length
playbackUrl
title
thumbnailUrl
viewCount
}
}''' % (uploader_id, vod_id)}).encode())['data']['pastBroadcast']
title = broadcast['title']
formats = self._extract_m3u8_formats(
broadcast['playbackUrl'], vod_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
return {
'id': vod_id,
'title': title,
'uploader_id': uploader_id,
'formats': formats,
'description': broadcast.get('content'),
'thumbnail': broadcast.get('thumbnailUrl'),
'timestamp': int_or_none(broadcast.get('createdAt'), 1000),
'view_count': int_or_none(broadcast.get('viewCount')),
}
class DLiveStreamIE(InfoExtractor):
IE_NAME = 'dlive:stream'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/(?!p/)(?P<id>[\w.-]+)'
def _real_extract(self, url):
display_name = self._match_id(url)
user = self._download_json(
'https://graphigo.prd.dlive.tv/', display_name,
data=json.dumps({'query': '''query {
userByDisplayName(displayname:"%s") {
livestream {
content
createdAt
title
thumbnailUrl
watchingCount
}
username
}
}''' % display_name}).encode())['data']['userByDisplayName']
livestream = user['livestream']
title = livestream['title']
username = user['username']
formats = self._extract_m3u8_formats(
'https://live.prd.dlive.tv/hls/live/%s.m3u8' % username,
display_name, 'mp4')
self._sort_formats(formats)
return {
'id': display_name,
'title': self._live_title(title),
'uploader': display_name,
'uploader_id': username,
'formats': formats,
'description': livestream.get('content'),
'thumbnail': livestream.get('thumbnailUrl'),
'is_live': True,
'timestamp': int_or_none(livestream.get('createdAt'), 1000),
'view_count': int_or_none(livestream.get('watchingCount')),
}

View File

@@ -24,7 +24,7 @@ from ..utils import (
class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio/ondemand)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['DK']
IE_NAME = 'drtv'
@@ -80,6 +80,9 @@ class DRTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dr.dk/radio/p4kbh/regionale-nyheder-kh4/p4-nyheder-2019-06-26-17-30-9',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
@@ -18,7 +19,7 @@ from ..utils import (
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
@@ -32,6 +33,12 @@ class EinthusanIE(InfoExtractor):
}, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
}, {
'url': 'https://einthusan.com/movie/watch/9097/',
'only_matching': True,
}, {
'url': 'https://einthusan.ca/movie/watch/4E9n/?lang=hindi',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
@@ -41,7 +48,9 @@ class EinthusanIE(InfoExtractor):
)).decode('utf-8'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@@ -53,7 +62,7 @@ class EinthusanIE(InfoExtractor):
page_id = self._html_search_regex(
'<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
video_data = self._download_json(
'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
'https://%s/ajax/movie/watch/%s/' % (host, video_id), video_id,
data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({

View File

@@ -216,17 +216,14 @@ class FiveThirtyEightIE(InfoExtractor):
_TEST = {
'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/',
'info_dict': {
'id': '21846851',
'ext': 'mp4',
'id': '56032156',
'ext': 'flv',
'title': 'FiveThirtyEight: The Raiders can still make the playoffs',
'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.',
'timestamp': 1513960621,
'upload_date': '20171222',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}
def _real_extract(self, url):
@@ -234,9 +231,8 @@ class FiveThirtyEightIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'data-video-id=["\'](?P<id>\d+)',
webpage, 'video id', group='id')
embed_url = self._search_regex(
r'<iframe[^>]+src=["\'](https?://fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/\d+)',
webpage, 'embed url')
return self.url_result(
'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())
return self.url_result(embed_url, 'AbcNewsVideo')

View File

@@ -58,17 +58,8 @@ from .ard import (
ARDMediathekIE,
)
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
TheOperaPlatformIE,
ArteTVPlaylistIE,
)
from .asiancrush import (
@@ -113,6 +104,8 @@ from .bild import BildIE
from .bilibili import (
BiliBiliIE,
BiliBiliBangumiIE,
BilibiliAudioIE,
BilibiliAudioAlbumIE,
)
from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
@@ -173,6 +166,7 @@ from .cbs import CBSIE
from .cbslocal import CBSLocalIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsEmbedIE,
CBSNewsIE,
CBSNewsLiveVideoIE,
)
@@ -240,7 +234,6 @@ from .condenast import CondeNastIE
from .corus import CorusIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
CrunchyrollIE,
@@ -404,11 +397,7 @@ from .frontendmasters import (
FrontendMastersCourseIE
)
from .funimation import FunimationIE
from .funk import (
FunkMixIE,
FunkChannelIE,
)
from .funnyordie import FunnyOrDieIE
from .funk import FunkIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
@@ -592,6 +581,7 @@ from .linkedin import (
)
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
@@ -745,7 +735,6 @@ from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .nfb import NFBIE
from .nfl import NFLIE
from .nhk import NhkVodIE
from .nhl import NHLIE
@@ -772,13 +761,6 @@ from .nova import (
NovaEmbedIE,
NovaIE,
)
from .novamov import (
AuroraVidIE,
CloudTimeIE,
NowVideoIE,
VideoWeedIE,
WholeCloudIE,
)
from .nowness import (
NownessIE,
NownessPlaylistIE,
@@ -896,12 +878,12 @@ from .polskieradio import (
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornflip import PornFlipIE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
PornHubPlaylistIE,
PornHubUserVideosIE,
PornHubUserIE,
PornHubPagedVideoListIE,
PornHubUserVideosUploadIE,
)
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
@@ -911,7 +893,6 @@ from .puhutv import (
PuhuTVSerieIE,
)
from .presstv import PressTVIE
from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
@@ -946,7 +927,10 @@ from .raywenderlich import (
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import RedBullTVIE
from .redbulltv import (
RedBullTVIE,
RedBullTVRrnContentIE,
)
from .reddit import (
RedditIE,
RedditRIE,
@@ -985,7 +969,6 @@ from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE
from .rtvs import RTVSIE
from .rudo import RudoIE
from .ruhd import RUHDIE
from .rutube import (
RutubeIE,
@@ -1145,6 +1128,7 @@ from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import (
TeleQuebecIE,
TeleQuebecSquatIE,
TeleQuebecEmissionIE,
TeleQuebecLiveIE,
)
@@ -1273,6 +1257,10 @@ from .udn import UDNEmbedIE
from .ufctv import UFCTVIE
from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE
from .dlive import (
DLiveVODIE,
DLiveStreamIE,
)
from .umg import UMGDeIE
from .unistra import UnistraIE
from .unity import UnityIE
@@ -1294,7 +1282,6 @@ from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import (
VevoIE,
@@ -1425,7 +1412,6 @@ from .weibo import (
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wsj import (
@@ -1439,6 +1425,7 @@ from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
XHamsterUserIE,
)
from .xiami import (
XiamiSongIE,
@@ -1462,6 +1449,7 @@ from .yahoo import (
YahooSearchIE,
YahooGyaOPlayerIE,
YahooGyaOIE,
YahooJapanNewsIE,
)
from .yandexdisk import YandexDiskIE
from .yandexmusic import (

View File

@@ -428,7 +428,7 @@ class FacebookIE(InfoExtractor):
timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None))
thumbnail = self._og_search_thumbnail(webpage)
thumbnail = self._html_search_meta(['og:image', 'twitter:image'], webpage)
view_count = parse_count(self._search_regex(
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',

View File

@@ -9,7 +9,7 @@ from ..utils import int_or_none
class FiveTVIE(InfoExtractor):
_VALID_URL = r'''(?x)
http://
https?://
(?:www\.)?5-tv\.ru/
(?:
(?:[^/]+/)+(?P<id>\d+)|
@@ -39,6 +39,7 @@ class FiveTVIE(InfoExtractor):
'duration': 180,
},
}, {
# redirect to https://www.5-tv.ru/projects/1000095/izvestia-glavnoe/
'url': 'http://www.5-tv.ru/glavnoe/#itemDetails',
'info_dict': {
'id': 'glavnoe',
@@ -46,6 +47,7 @@ class FiveTVIE(InfoExtractor):
'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'redirect to «Известия. Главное» project page',
}, {
'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
'only_matching': True,
@@ -70,7 +72,7 @@ class FiveTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
[r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"',
[r'<div[^>]+?class="(?:flow)?player[^>]+?data-href="([^"]+)"',
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
webpage, 'video url')

View File

@@ -1,89 +1,21 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
from .nexx import NexxIE
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
str_or_none,
)
class FunkBaseIE(InfoExtractor):
_HEADERS = {
'Accept': '*/*',
'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4',
}
_AUTH = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4'
@staticmethod
def _make_headers(referer):
headers = FunkBaseIE._HEADERS.copy()
headers['Referer'] = referer
return headers
def _make_url_result(self, video):
return {
'_type': 'url_transparent',
'url': 'nexx:741:%s' % video['sourceId'],
'ie_key': NexxIE.ie_key(),
'id': video['sourceId'],
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'season_number': int_or_none(video.get('seasonNr')),
'episode_number': int_or_none(video.get('episodeNr')),
}
class FunkMixIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
class FunkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten',
'md5': '8edf617c2f2b7c9847dfda313f199009',
'info_dict': {
'id': '123748',
'ext': 'mp4',
'title': '"Die realste Kifferdoku aller Zeiten"',
'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
'timestamp': 1490274721,
'upload_date': '20170323',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mix_id = mobj.group('id')
alias = mobj.group('alias')
lists = self._download_json(
'https://www.funk.net/api/v3.1/curation/curatedLists/',
mix_id, headers=self._make_headers(url), query={
'size': 100,
})['_embedded']['curatedListList']
metas = next(
l for l in lists
if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
video = next(
meta['videoDataDelegate']
for meta in metas
if try_get(
meta, lambda x: x['videoDataDelegate']['alias'],
compat_str) == alias)
return self._make_url_result(video)
class FunkChannelIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
'url': 'https://www.funk.net/channel/ba-793/die-lustigsten-instrumente-aus-dem-internet-teil-2-1155821',
'md5': '8dd9d9ab59b4aa4173b3197f2ea48e81',
'info_dict': {
'id': '1155821',
'ext': 'mp4',
@@ -92,83 +24,26 @@ class FunkChannelIE(FunkBaseIE):
'timestamp': 1514507395,
'upload_date': '20171229',
},
'params': {
'skip_download': True,
},
}, {
# only available via byIdList API
'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
'info_dict': {
'id': '205067',
'ext': 'mp4',
'title': 'Martin Sonneborn erklärt die EU',
'description': 'md5:050f74626e4ed87edf4626d2024210c0',
'timestamp': 1494424042,
'upload_date': '20170510',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'url': 'https://www.funk.net/playlist/neuesteVideos/kameras-auf-dem-fusion-festival-1618699',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
alias = mobj.group('alias')
headers = self._make_headers(url)
video = None
# Id-based channels are currently broken on their side: webplayer
# tries to process them via byChannelAlias endpoint and fails
# predictably.
for page_num in itertools.count():
by_channel_alias = self._download_json(
'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
% channel_id,
'Downloading byChannelAlias JSON page %d' % (page_num + 1),
headers=headers, query={
'filterFsk': 'false',
'sort': 'creationDate,desc',
'size': 100,
'page': page_num,
}, fatal=False)
if not by_channel_alias:
break
video_list = try_get(
by_channel_alias, lambda x: x['_embedded']['videoList'], list)
if not video_list:
break
try:
video = next(r for r in video_list if r.get('alias') == alias)
break
except StopIteration:
pass
if not try_get(
by_channel_alias, lambda x: x['_links']['next']):
break
if not video:
by_id_list = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/byIdList',
channel_id, 'Downloading byIdList JSON', headers=headers,
query={
'ids': alias,
}, fatal=False)
if by_id_list:
video = try_get(by_id_list, lambda x: x['result'][0], dict)
if not video:
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter',
channel_id, 'Downloading filter JSON', headers=headers, query={
'channelId': channel_id,
'size': 100,
})['result']
video = next(r for r in results if r.get('alias') == alias)
return self._make_url_result(video)
display_id, nexx_id = re.match(self._VALID_URL, url).groups()
video = self._download_json(
'https://www.funk.net/api/v4.0/videos/' + nexx_id, nexx_id)
return {
'_type': 'url_transparent',
'url': 'nexx:741:' + nexx_id,
'ie_key': NexxIE.ie_key(),
'id': nexx_id,
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'channel_id': str_or_none(video.get('channelId')),
'display_id': display_id,
'tags': video.get('tags'),
'thumbnail': video.get('imageUrlLandscape'),
}

View File

@@ -1,162 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
unified_timestamp,
)
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|articles|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TESTS = [{
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
'info_dict': {
'id': '0732f586d7',
'ext': 'mp4',
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': r're:^http:.*\.jpg$',
'uploader': 'DASjr',
'timestamp': 1317904928,
'upload_date': '20111006',
'duration': 318.3,
},
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
'info_dict': {
'id': 'e402820827',
'ext': 'mp4',
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': r're:^http:.*\.jpg$',
'timestamp': 1398988800,
'upload_date': '20140502',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
links = re.findall(r'<source src="([^"]+/v)[^"]+\.([^"]+)" type=\'video', webpage)
if not links:
raise ExtractorError('No media links available for %s' % video_id)
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', group='url')
formats = []
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
source_formats = list(filter(
lambda f: f.get('vcodec') != 'none', m3u8_formats))
bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)]
bitrates.sort()
if source_formats:
self._sort_formats(source_formats)
for bitrate, f in zip(bitrates, source_formats or [{}] * len(bitrates)):
for path, ext in links:
ff = f.copy()
if ff:
if ext != 'mp4':
ff = dict(
[(k, v) for k, v in ff.items()
if k in ('height', 'width', 'format_id')])
ff.update({
'format_id': ff['format_id'].replace('hls', ext),
'ext': ext,
'protocol': 'http',
})
else:
ff.update({
'format_id': '%s-%d' % (ext, bitrate),
'vbr': bitrate,
})
ff['url'] = self._proto_relative_url(
'%s%d.%s' % (path, bitrate, ext))
formats.append(ff)
self._check_formats(formats, video_id)
formats.extend(m3u8_formats)
self._sort_formats(
formats, field_preference=('height', 'width', 'tbr', 'format_id'))
subtitles = {}
for src, src_lang in re.findall(r'<track kind="captions" src="([^"]+)" srclang="([^"]+)"', webpage):
subtitles[src_lang] = [{
'ext': src.split('/')[-1],
'url': 'http://www.funnyordie.com%s' % src,
}]
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp', default=None))
uploader = self._html_search_regex(
r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
webpage, 'uploader', default=None)
title, description, thumbnail, duration = [None] * 4
medium = self._parse_json(
self._search_regex(
r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
default='{}'),
video_id, fatal=False)
if medium:
title = medium.get('title')
duration = float_or_none(medium.get('duration'))
if not timestamp:
timestamp = unified_timestamp(medium.get('publishDate'))
post = self._parse_json(
self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
default='{}'),
video_id, fatal=False)
if post:
if not title:
title = post.get('name')
description = post.get('description')
thumbnail = post.get('picture')
if not title:
title = self._og_search_title(webpage)
if not description:
description = self._og_search_description(webpage)
if not duration:
duration = int_or_none(self._html_search_meta(
('video:duration', 'duration'), webpage, 'duration', default=False))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -1,35 +1,84 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import (
determine_ext,
int_or_none,
mimetype2ext,
parse_iso8601,
)
class FusionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fusion\.(?:net|tv)/video/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?fusion\.(?:net|tv)/(?:video/|show/.+?\bvideo=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://fusion.tv/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'info_dict': {
'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
'id': '3145868',
'ext': 'mp4',
'title': 'U.S. and Panamanian forces work together to stop a vessel smuggling drugs',
'description': 'md5:0cc84a9943c064c0f46b128b41b1b0d7',
'duration': 140.0,
'timestamp': 1442589635,
'uploader': 'UNIVISON',
'upload_date': '20150918',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
'add_ie': ['Anvato'],
}, {
'url': 'http://fusion.tv/video/201781',
'only_matching': True,
}, {
'url': 'https://fusion.tv/show/food-exposed-with-nelufar-hedayat/?ancla=full-episodes&video=588644',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._match_id(url)
video = self._download_json(
'https://platform.fusion.net/wp-json/fusiondotnet/v1/video/' + video_id, video_id)
ooyala_code = self._search_regex(
r'data-ooyala-id=(["\'])(?P<code>(?:(?!\1).)+)\1',
webpage, 'ooyala code', group='code')
info = {
'id': video_id,
'title': video['title'],
'description': video.get('excerpt'),
'timestamp': parse_iso8601(video.get('published')),
'series': video.get('show'),
}
return OoyalaIE._build_url_result(ooyala_code)
formats = []
src = video.get('src') or {}
for f_id, f in src.items():
for q_id, q in f.items():
q_url = q.get('url')
if not q_url:
continue
ext = determine_ext(q_url, mimetype2ext(q.get('type')))
if ext == 'smil':
formats.extend(self._extract_smil_formats(q_url, video_id, fatal=False))
elif f_id == 'm3u8-variant' or (ext == 'm3u8' and q_id == 'Variant'):
formats.extend(self._extract_m3u8_formats(
q_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '-'.join([f_id, q_id]),
'url': q_url,
'width': int_or_none(q.get('width')),
'height': int_or_none(q.get('height')),
'tbr': int_or_none(self._search_regex(r'_(\d+)\.m(?:p4|3u8)', q_url, 'bitrate')),
'ext': 'mp4' if ext == 'm3u8' else ext,
'protocol': 'm3u8_native' if ext == 'm3u8' else 'https',
})
if formats:
self._sort_formats(formats)
info['formats'] = formats
else:
info.update({
'_type': 'url',
'url': 'anvato:uni:' + video['video_ids']['anvato'],
'ie_key': 'Anvato',
})
return info

View File

@@ -1,12 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_class,
get_element_by_id,
)
class GameInformerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>[^.?&#]+)'
_TESTS = [{
# normal Brightcove embed code extracted with BrightcoveNewIE._extract_url
'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
'md5': '292f26da1ab4beb4c9099f1304d2b071',
'info_dict': {
@@ -18,16 +25,25 @@ class GameInformerIE(InfoExtractor):
'upload_date': '20150928',
'uploader_id': '694940074001',
},
}
}, {
# Brightcove id inside unique element with field--name-field-brightcove-video-id class
'url': 'https://www.gameinformer.com/video-feature/new-gameplay-today/2019/07/09/new-gameplay-today-streets-of-rogue',
'info_dict': {
'id': '6057111913001',
'ext': 'mp4',
'title': 'New Gameplay Today Streets Of Rogue',
'timestamp': 1562699001,
'upload_date': '20190709',
'uploader_id': '694940074001',
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(
url, display_id, headers=self.geo_verification_headers())
brightcove_id = self._search_regex(
[r'<[^>]+\bid=["\']bc_(\d+)', r"getVideo\('[^']+video_id=(\d+)"],
webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
brightcove_id)
brightcove_id = clean_html(get_element_by_class('field--name-field-brightcove-video-id', webpage) or get_element_by_id('video-source-content', webpage))
brightcove_url = self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id if brightcove_id else BrightcoveNewIE._extract_url(self, webpage)
return self.url_result(brightcove_url, 'BrightcoveNew', brightcove_id)

View File

@@ -77,7 +77,6 @@ from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .vessel import VesselIE
from .kaltura import KalturaIE
from .eagleplatform import EaglePlatformIE
from .facebook import FacebookIE
@@ -2075,6 +2074,22 @@ class GenericIE(InfoExtractor):
},
'playlist_count': 6,
},
{
# Squarespace video embed, 2019-08-28
'url': 'http://ootboxford.com',
'info_dict': {
'id': 'Tc7b_JGdZfw',
'title': 'Out of the Blue, at Childish Things 10',
'ext': 'mp4',
'description': 'md5:a83d0026666cf5ee970f8bd1cfd69c7f',
'uploader_id': 'helendouglashouse',
'uploader': 'Helen & Douglas House',
'upload_date': '20140328',
},
'params': {
'skip_download': True,
},
},
{
# Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
@@ -2104,6 +2119,23 @@ class GenericIE(InfoExtractor):
},
'expected_warnings': ['Failed to download MPD manifest'],
},
{
# DailyMotion embed with DM.player
'url': 'https://www.beinsports.com/us/copa-del-rey/video/the-locker-room-valencia-beat-barca-in-copa/1203804',
'info_dict': {
'id': 'k6aKkGHd9FJs4mtJN39',
'ext': 'mp4',
'title': 'The Locker Room: Valencia Beat Barca In Copa del Rey Final',
'description': 'This video is private.',
'uploader_id': 'x1jf30l',
'uploader': 'beIN SPORTS USA',
'upload_date': '20190528',
'timestamp': 1559062971,
},
'params': {
'skip_download': True,
},
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -2209,7 +2241,7 @@ class GenericIE(InfoExtractor):
default_search = 'fixup_error'
if default_search in ('auto', 'auto_warning', 'fixup_error'):
if '/' in url:
if re.match(r'^[^\s/]+\.[^\s/]+/', url):
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
elif default_search != 'fixup_error':
@@ -2378,6 +2410,12 @@ class GenericIE(InfoExtractor):
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/youtube-dl/issues/21294
webpage = re.sub(
r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
lambda x: unescapeHTML(x.group(0)), webpage)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
@@ -2452,11 +2490,6 @@ class GenericIE(InfoExtractor):
if tp_urls:
return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
# Look for Vessel embeds
vessel_urls = VesselIE._extract_urls(webpage)
if vessel_urls:
return self.playlist_from_matches(vessel_urls, video_id, video_title, ie=VesselIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@@ -2583,19 +2616,6 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group(1), 'Mpora')
# Look for embedded NovaMov-based player
mobj = re.search(
r'''(?x)<(?:pagespeed_)?iframe[^>]+?src=(["\'])
(?P<url>http://(?:(?:embed|www)\.)?
(?:novamov\.com|
nowvideo\.(?:ch|sx|eu|at|ag|co)|
videoweed\.(?:es|com)|
movshare\.(?:net|sx|ag)|
divxstage\.(?:eu|net|ch|co|at|ag))
/embed\.php.+?)\1''', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
facebook_urls = FacebookIE._extract_urls(webpage)
if facebook_urls:

View File

@@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_VALID_URL = r'https?://(?:(?:www|giant|thumbs)\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#\.]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
@@ -44,12 +44,21 @@ class GfycatIE(InfoExtractor):
'categories': list,
'age_limit': 0,
}
}, {
'url': 'https://gfycat.com/ru/RemarkableDrearyAmurstarfish',
'only_matching': True
}, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True
}, {
'url': 'https://gfycat.com/acceptablehappygoluckyharborporpoise-baseball',
'only_matching': True
}, {
'url': 'https://thumbs.gfycat.com/acceptablehappygoluckyharborporpoise-size_restricted.gif',
'only_matching': True
}, {
'url': 'https://giant.gfycat.com/acceptablehappygoluckyharborporpoise.mp4',
'only_matching': True
}]
def _real_extract(self, url):

View File

@@ -96,21 +96,31 @@ class GloboIE(InfoExtractor):
video = self._download_json(
'http://api.globovideos.com/videos/%s/playlist' % video_id,
video_id)['videos'][0]
if video.get('encrypted') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
title = video['title']
formats = []
subtitles = {}
for resource in video['resources']:
resource_id = resource.get('_id')
resource_url = resource.get('url')
if not resource_id or not resource_url:
resource_type = resource.get('type')
if not resource_url or (resource_type == 'media' and not resource_id) or resource_type not in ('subtitle', 'media'):
continue
if resource_type == 'subtitle':
subtitles.setdefault(resource.get('language') or 'por', []).append({
'url': resource_url,
})
continue
security = self._download_json(
'http://security.video.globo.com/videos/%s/hash' % video_id,
video_id, 'Downloading security hash for %s' % resource_id, query={
'player': 'flash',
'version': '17.0.0.132',
'player': 'desktop',
'version': '5.19.1',
'resource_id': resource_id,
})
@@ -122,19 +132,18 @@ class GloboIE(InfoExtractor):
'%s returned error: %s' % (self.IE_NAME, message), expected=True)
continue
hash_code = security_hash[:2]
received_time = security_hash[2:12]
received_random = security_hash[12:22]
received_md5 = security_hash[22:]
assert security_hash[:2] in ('04', '14')
received_time = security_hash[3:13]
received_md5 = security_hash[24:]
sign_time = compat_str(int(received_time) + 86400)
padding = '%010d' % random.randint(1, 10000000000)
md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
md5_data = (received_md5 + sign_time + padding + '0xAC10FD').encode()
signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5
signed_hash = security_hash[:23] + sign_time + padding + signed_md5
signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
signed_url = '%s?h=%s&k=html5&a=%s&u=%s' % (resource_url, signed_hash, 'F' if video.get('subscriber_only') else 'A', security.get('user') or '')
if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats(
signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
@@ -164,7 +173,8 @@ class GloboIE(InfoExtractor):
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -34,9 +34,13 @@ class GoIE(AdobePassIE):
'watchdisneyxd': {
'brand': '009',
'resource_id': 'DisneyXD',
},
'disneynow': {
'brand': '011',
'resource_id': 'Disney',
}
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
_VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
@@ -71,6 +75,9 @@ class GoIE(AdobePassIE):
# brand 008
'url': 'http://disneynow.go.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}, {
'url': 'https://disneynow.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}]
def _extract_videos(self, brand, video_id='-1', show_id='-1'):
@@ -80,7 +87,9 @@ class GoIE(AdobePassIE):
display_id)['video']
def _real_extract(self, url):
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
mobj = re.match(self._VALID_URL, url)
sub_domain = mobj.group('sub_domain') or mobj.group('sub_domain_2')
video_id, display_id = mobj.group('id', 'display_id')
site_info = self._SITE_INFO.get(sub_domain, {})
brand = site_info.get('brand')
if not video_id or not site_info:
@@ -89,7 +98,7 @@ class GoIE(AdobePassIE):
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id',
default=None)
default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',

View File

@@ -105,8 +105,7 @@ class HeiseIE(InfoExtractor):
webpage, default=None) or self._html_search_meta(
'description', webpage)
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
def _make_kaltura_result(kaltura_url):
return {
'_type': 'url_transparent',
'url': smuggle_url(kaltura_url, {'source_url': url}),
@@ -115,6 +114,16 @@ class HeiseIE(InfoExtractor):
'description': description,
}
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
return _make_kaltura_result(kaltura_url)
kaltura_id = self._search_regex(
r'entry-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'kaltura id',
default=None, group='id')
if kaltura_id:
return _make_kaltura_result('kaltura:2238431:%s' % kaltura_id)
yt_urls = YoutubeIE._extract_urls(webpage)
if yt_urls:
return self.playlist_from_matches(

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import hashlib
import hmac
import re
import time
import uuid
@@ -126,6 +127,8 @@ class HotStarIE(HotStarBaseIE):
format_url = url_or_none(playback_set.get('playbackUrl'))
if not format_url:
continue
format_url = re.sub(
r'(?<=//staragvod)(\d)', r'web\1', format_url)
tags = str_or_none(playback_set.get('tagsCombination')) or ''
if tags and 'encryption:plain' not in tags:
continue
@@ -133,7 +136,8 @@ class HotStarIE(HotStarBaseIE):
try:
if 'package:hls' in tags or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls'))
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls'))
elif 'package:dash' in tags or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash'))

View File

@@ -22,7 +22,7 @@ from ..utils import (
class InstagramIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
@@ -92,6 +92,9 @@ class InstagramIE(InfoExtractor):
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}, {
'url': 'https://www.instagram.com/tv/aye83DjauH/',
'only_matching': True,
}]
@staticmethod

View File

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325',

View File

@@ -103,6 +103,11 @@ class KalturaIE(InfoExtractor):
{
'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
'only_matching': True,
},
{
# unavailable source format
'url': 'kaltura:513551:1_66x4rg7o',
'only_matching': True,
}
]
@@ -146,14 +151,15 @@ class KalturaIE(InfoExtractor):
if mobj:
embed_info = mobj.groupdict()
for k, v in embed_info.items():
embed_info[k] = v.strip()
if v:
embed_info[k] = v.strip()
url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
escaped_pid = re.escape(embed_info['partner_id'])
service_url = re.search(
r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
service_mobj = re.search(
r'<script[^>]+src=(["\'])(?P<id>(?:https?:)?//(?:(?!\1).)+)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
webpage)
if service_url:
url = smuggle_url(url, {'service_url': service_url.group(1)})
if service_mobj:
url = smuggle_url(url, {'service_url': service_mobj.group('id')})
return url
def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
@@ -306,12 +312,17 @@ class KalturaIE(InfoExtractor):
f['fileExt'] = 'mp4'
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
format_id = '%(fileExt)s-%(bitrate)s' % f
# Source format may not be available (e.g. kaltura:513551:1_66x4rg7o)
if f.get('isOriginal') is True and not self._is_valid_url(
video_url, entry_id, format_id):
continue
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
# -f mp4-56)
vcodec = 'none' if 'videoCodecId' not in f and f.get(
'frameRate') == 0 else f.get('videoCodecId')
formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f,
'format_id': format_id,
'ext': f.get('fileExt'),
'tbr': int_or_none(f['bitrate']),
'fps': int_or_none(f.get('frameRate')),

View File

@@ -6,8 +6,8 @@ import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
determine_ext,
extract_attributes,
ExtractorError,
float_or_none,
int_or_none,
@@ -19,6 +19,7 @@ from ..utils import (
class LecturioBaseIE(InfoExtractor):
_API_BASE_URL = 'https://app.lecturio.com/api/en/latest/html5/'
_LOGIN_URL = 'https://app.lecturio.com/en/login'
_NETRC_MACHINE = 'lecturio'
@@ -67,51 +68,56 @@ class LecturioIE(LecturioBaseIE):
_VALID_URL = r'''(?x)
https://
(?:
app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
(?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
app\.lecturio\.com/([^/]+/(?P<nt>[^/?#&]+)\.lecture|(?:\#/)?lecture/c/\d+/(?P<id>\d+))|
(?:www\.)?lecturio\.de/[^/]+/(?P<nt_de>[^/?#&]+)\.vortrag
)
'''
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
'md5': '9a42cf1d8282a6311bf7211bbde26fde',
'info_dict': {
'id': '39634',
'ext': 'mp4',
'title': 'Important Concepts and Terms Introduction to Microbiology',
'title': 'Important Concepts and Terms Introduction to Microbiology',
},
'skip': 'Requires lecturio account credentials',
}, {
'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
'only_matching': True,
}, {
'url': 'https://app.lecturio.com/#/lecture/c/6434/39634',
'only_matching': True,
}]
_CC_LANGS = {
'Arabic': 'ar',
'Bulgarian': 'bg',
'German': 'de',
'English': 'en',
'Spanish': 'es',
'Persian': 'fa',
'French': 'fr',
'Japanese': 'ja',
'Polish': 'pl',
'Pashto': 'ps',
'Russian': 'ru',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') or mobj.group('id_de')
webpage = self._download_webpage(
'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
display_id)
lecture_id = self._search_regex(
r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
api_url = self._search_regex(
r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'api url', group='url')
video = self._download_json(api_url, display_id)
nt = mobj.group('nt') or mobj.group('nt_de')
lecture_id = mobj.group('id')
display_id = nt or lecture_id
api_path = 'lectures/' + lecture_id if lecture_id else 'lecture/' + nt + '.json'
video = self._download_json(
self._API_BASE_URL + api_path, display_id)
title = video['title'].strip()
if not lecture_id:
pid = video.get('productId') or video.get('uid')
if pid:
spid = pid.split('_')
if spid and len(spid) == 2:
lecture_id = spid[1]
formats = []
for format_ in video['content']['media']:
@@ -129,24 +135,30 @@ class LecturioIE(LecturioBaseIE):
continue
label = str_or_none(format_.get('label'))
filesize = int_or_none(format_.get('fileSize'))
formats.append({
f = {
'url': file_url,
'format_id': label,
'filesize': float_or_none(filesize, invscale=1000)
})
}
if label:
mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
if mobj:
f.update({
'format_id': mobj.group(2),
'height': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
subtitles = {}
automatic_captions = {}
cc = self._parse_json(
self._search_regex(
r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
default='{}'), display_id, fatal=False)
for cc_label, cc_url in cc.items():
cc_url = url_or_none(cc_url)
captions = video.get('captions') or []
for cc in captions:
cc_url = cc.get('url')
if not cc_url:
continue
lang = self._search_regex(
cc_label = cc.get('translatedCode')
lang = cc.get('languageCode') or self._search_regex(
r'/([a-z]{2})_', cc_url, 'lang',
default=cc_label.split()[0] if cc_label else 'en')
original_lang = self._search_regex(
@@ -160,7 +172,7 @@ class LecturioIE(LecturioBaseIE):
})
return {
'id': lecture_id,
'id': lecture_id or nt,
'title': title,
'formats': formats,
'subtitles': subtitles,
@@ -169,37 +181,40 @@ class LecturioIE(LecturioBaseIE):
class LecturioCourseIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
_TEST = {
_VALID_URL = r'https://app\.lecturio\.com/(?:[^/]+/(?P<nt>[^/?#&]+)\.course|(?:#/)?course/c/(?P<id>\d+))'
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
'info_dict': {
'id': 'microbiology-introduction',
'title': 'Microbiology: Introduction',
'description': 'md5:13da8500c25880c6016ae1e6d78c386a',
},
'playlist_count': 45,
'skip': 'Requires lecturio account credentials',
}
}, {
'url': 'https://app.lecturio.com/#/course/c/6434',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
nt, course_id = re.match(self._VALID_URL, url).groups()
display_id = nt or course_id
api_path = 'courses/' + course_id if course_id else 'course/content/' + nt + '.json'
course = self._download_json(
self._API_BASE_URL + api_path, display_id)
entries = []
for mobj in re.finditer(
r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
webpage):
params = extract_attributes(mobj.group(0))
lecture_url = urljoin(url, params.get('data-url'))
lecture_id = params.get('data-id')
for lecture in course.get('lectures', []):
lecture_id = str_or_none(lecture.get('id'))
lecture_url = lecture.get('url')
if lecture_url:
lecture_url = urljoin(url, lecture_url)
else:
lecture_url = 'https://app.lecturio.com/#/lecture/c/%s/%s' % (course_id, lecture_id)
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
title = self._search_regex(
r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
'title', default=None)
return self.playlist_result(entries, display_id, title)
return self.playlist_result(
entries, display_id, course.get('title'),
clean_html(course.get('description')))
class LecturioDeCourseIE(LecturioBaseIE):

View File

@@ -326,7 +326,7 @@ class LetvCloudIE(InfoExtractor):
elif play_json.get('code'):
raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
else:
raise ExtractorError('Letv cloud returned an unknwon error')
raise ExtractorError('Letv cloud returned an unknown error')
def b64decode(s):
return compat_b64decode(s).decode('utf-8')

View File

@@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
class LiveJournalIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?livejournal\.com/video/album/\d+.+?\bid=(?P<id>\d+)'
_TEST = {
'url': 'https://andrei-bt.livejournal.com/video/album/407/?mode=view&id=51272',
'md5': 'adaf018388572ced8a6f301ace49d4b2',
'info_dict': {
'id': '1263729',
'ext': 'mp4',
'title': 'Истребители против БПЛА',
'upload_date': '20190624',
'timestamp': 1561406715,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
record = self._parse_json(self._search_regex(
r'Site\.page\s*=\s*({.+?});', webpage,
'page data'), video_id)['video']['record']
storage_id = compat_str(record['storageid'])
title = record.get('name')
if title:
# remove filename extension(.mp4, .mov, etc...)
title = title.rsplit('.', 1)[0]
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'thumbnail': record.get('thumbnail'),
'timestamp': int_or_none(record.get('timecreate')),
'url': 'eagleplatform:vc.videos.livejournal.com:' + storage_id,
'ie_key': 'EaglePlatform',
}

View File

@@ -82,6 +82,10 @@ class LiveLeakIE(InfoExtractor):
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}, {
# No original video
'url': 'https://www.liveleak.com/view?t=C26ZZ_1558612804',
'only_matching': True,
}]
@staticmethod
@@ -134,11 +138,13 @@ class LiveLeakIE(InfoExtractor):
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
formats.append({
'format_id': 'original' + ('-' + format_id if format_id else ''),
'url': orig_url,
'preference': 1,
})
format_id = 'original' + ('-' + format_id if format_id else '')
if self._is_valid_url(orig_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats

View File

@@ -117,6 +117,10 @@ class LyndaIE(LyndaBaseIE):
}, {
'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
'only_matching': True,
}, {
# Status="NotFound", Message="Transcript not found"
'url': 'https://www.lynda.com/ASP-NET-tutorials/What-you-should-know/5034180/2811512-4.html',
'only_matching': True,
}]
def _raise_unavailable(self, video_id):
@@ -247,12 +251,17 @@ class LyndaIE(LyndaBaseIE):
def _get_subtitles(self, video_id):
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False)
subs = self._download_webpage(
url, video_id, 'Downloading subtitles JSON', fatal=False)
if not subs or 'Status="NotFound"' in subs:
return {}
subs = self._parse_json(subs, video_id, fatal=False)
if not subs:
return {}
fixed_subs = self._fix_subtitles(subs)
if fixed_subs:
return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
else:
return {}
return {}
class LyndaCourseIE(LyndaBaseIE):

View File

@@ -79,6 +79,10 @@ class MGTVIE(InfoExtractor):
'ext': 'mp4',
'tbr': tbr,
'protocol': 'm3u8_native',
'http_headers': {
'Referer': url,
},
'format_note': stream.get('name'),
})
self._sort_formats(formats)

View File

@@ -164,7 +164,7 @@ class MixcloudIE(InfoExtractor):
def decrypt_url(f_url):
for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
decrypted_url = self._decrypt_xor_cipher(k, f_url)
if re.search(r'^https?://[0-9a-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
if re.search(r'^https?://[0-9A-Za-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
return decrypted_url
for url_key in ('url', 'hlsUrl', 'dashUrl'):

View File

@@ -10,7 +10,6 @@ from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote
from ..utils import (
smuggle_url,
try_get,
update_url_query,
int_or_none,
)
@@ -85,27 +84,41 @@ class NBCIE(AdobePassIE):
permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + compat_urllib_parse_unquote(permalink)
response = self._download_json(
'https://api.nbc.com/v3/videos', video_id, query={
'filter[permalink]': permalink,
'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
'fields[shows]': 'shortTitle',
'include': 'show.shortTitle',
'https://friendship.nbc.co/v2/graphql', video_id, query={
'query': '''{
page(name: "%s", platform: web, type: VIDEO, userId: "0") {
data {
... on VideoPageData {
description
episodeNumber
keywords
locked
mpxAccountId
mpxGuid
rating
seasonNumber
secondaryTitle
seriesShortTitle
}
}
}
}''' % permalink,
})
video_data = response['data'][0]['attributes']
video_data = response['data']['page']['data']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
video_id = video_data['guid']
title = video_data['title']
if video_data.get('entitlement') == 'auth':
video_id = video_data['mpxGuid']
title = video_data['secondaryTitle']
if video_data.get('locked'):
resource = self._get_mvpd_resource(
'nbcentertainment', title, video_id,
video_data.get('vChipRating'))
video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query(
'http://link.theplatform.com/s/NnzsPC/media/guid/2410887629/' + video_id,
'http://link.theplatform.com/s/NnzsPC/media/guid/%s/%s' % (video_data.get('mpxAccountId') or '2410887629', video_id),
query), {'force_smil_url': True})
return {
'_type': 'url_transparent',
@@ -117,7 +130,7 @@ class NBCIE(AdobePassIE):
'season_number': int_or_none(video_data.get('seasonNumber')),
'episode_number': int_or_none(video_data.get('episodeNumber')),
'episode': title,
'series': try_get(response, lambda x: x['included'][0]['attributes']['shortTitle']),
'series': video_data.get('seriesShortTitle'),
'ie_key': 'ThePlatform',
}

View File

@@ -295,13 +295,23 @@ class NexxIE(InfoExtractor):
video = None
def find_video(result):
if isinstance(result, dict):
return result
elif isinstance(result, list):
vid = int(video_id)
for v in result:
if try_get(v, lambda x: x['general']['ID'], int) == vid:
return v
return None
response = self._download_json(
'https://arc.nexx.cloud/api/video/%s.json' % video_id,
video_id, fatal=False)
if response and isinstance(response, dict):
result = response.get('result')
if result and isinstance(result, dict):
video = result
if result:
video = find_video(result)
# not all videos work via arc, e.g. nexx:741:1269984
if not video:
@@ -348,7 +358,7 @@ class NexxIE(InfoExtractor):
request_token = hashlib.md5(
''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
video = self._call_api(
result = self._call_api(
domain_id, 'videos/%s/%s' % (op, video_id), video_id, data={
'additionalfields': 'language,channel,actors,studio,licenseby,slug,subtitle,teaser,description',
'addInteractionOptions': '1',
@@ -363,6 +373,7 @@ class NexxIE(InfoExtractor):
'X-Request-CID': cid,
'X-Request-Token': request_token,
})
video = find_video(result)
general = video['general']
title = general['title']

View File

@@ -1,112 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
int_or_none,
qualities,
urlencode_postdata,
xpath_text,
)
class NFBIE(InfoExtractor):
IE_NAME = 'nfb'
IE_DESC = 'National Film Board of Canada'
_VALID_URL = r'https?://(?:www\.)?(?:nfb|onf)\.ca/film/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'https://www.nfb.ca/film/qallunaat_why_white_people_are_funny',
'info_dict': {
'id': 'qallunaat_why_white_people_are_funny',
'ext': 'flv',
'title': 'Qallunaat! Why White People Are Funny ',
'description': 'md5:6b8e32dde3abf91e58857b174916620c',
'duration': 3128,
'creator': 'Mark Sandiford',
'uploader': 'Mark Sandiford',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
config = self._download_xml(
'https://www.nfb.ca/film/%s/player_config' % video_id,
video_id, 'Downloading player config XML',
data=urlencode_postdata({'getConfig': 'true'}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'X-NFB-Referer': 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf'
})
title, description, thumbnail, duration, uploader, author = [None] * 6
thumbnails, formats = [[]] * 2
subtitles = {}
for media in config.findall('./player/stream/media'):
if media.get('type') == 'posterImage':
quality_key = qualities(('low', 'high'))
thumbnails = []
for asset in media.findall('assets/asset'):
asset_url = xpath_text(asset, 'default/url', default=None)
if not asset_url:
continue
quality = asset.get('quality')
thumbnails.append({
'url': asset_url,
'id': quality,
'preference': quality_key(quality),
})
elif media.get('type') == 'video':
title = xpath_text(media, 'title', fatal=True)
for asset in media.findall('assets/asset'):
quality = asset.get('quality')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', quality or '', 'height', default=None))
for node in asset:
streamer = xpath_text(node, 'streamerURI', default=None)
if not streamer:
continue
play_path = xpath_text(node, 'url', default=None)
if not play_path:
continue
formats.append({
'url': streamer,
'app': streamer.split('/', 3)[3],
'play_path': play_path,
'rtmp_live': False,
'ext': 'flv',
'format_id': '%s-%s' % (node.tag, quality) if quality else node.tag,
'height': height,
})
self._sort_formats(formats)
description = clean_html(xpath_text(media, 'description'))
uploader = xpath_text(media, 'author')
duration = int_or_none(media.get('duration'))
for subtitle in media.findall('./subtitles/subtitle'):
subtitle_url = xpath_text(subtitle, 'url', default=None)
if not subtitle_url:
continue
lang = xpath_text(subtitle, 'lang', default='en')
subtitles.setdefault(lang, []).append({
'url': subtitle_url,
'ext': (subtitle.get('format') or determine_ext(subtitle_url)).lower(),
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnails': thumbnails,
'duration': duration,
'creator': uploader,
'uploader': uploader,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -10,6 +10,18 @@ class NhkVodIE(InfoExtractor):
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
# clip
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999011/',
'md5': '256a1be14f48d960a7e61e2532d95ec3',
'info_dict': {
'id': 'a95j5iza',
'ext': 'mp4',
'title': "Dining with the Chef - Chef Saito's Family recipe: MENCHI-KATSU",
'description': 'md5:5aee4a9f9d81c26281862382103b0ea5',
'timestamp': 1565965194,
'upload_date': '20190816',
},
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}, {
@@ -19,7 +31,7 @@ class NhkVodIE(InfoExtractor):
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}]
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sodesdlist/v7/episode/%s/%s/all%s.json'
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json'
def _real_extract(self, url):
lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
@@ -28,7 +40,10 @@ class NhkVodIE(InfoExtractor):
is_video = m_type == 'video'
episode = self._download_json(
self._API_URL_TEMPLATE % ('v' if is_video else 'r', episode_id, lang, '/all' if is_video else ''),
self._API_URL_TEMPLATE % (
'v' if is_video else 'r',
'clip' if episode_id[:4] == '9999' else 'esd',
episode_id, lang, '/all' if is_video else ''),
episode_id, query={'apikey': 'EJfK8jdS57GqlupFgAfAAwr573q01y6k'})['data']['episodes'][0]
title = episode.get('sub_title_clean') or episode['sub_title']
@@ -60,8 +75,8 @@ class NhkVodIE(InfoExtractor):
if is_video:
info.update({
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:' + episode['vod_id'],
'ie_key': 'Piksel',
'url': 'https://player.piksel.com/v/refid/nhkworld/prefid/' + episode['vod_id'],
})
else:
audio = episode['audio']

View File

@@ -85,7 +85,8 @@ class NickBrIE(MTVServicesInfoExtractor):
https?://
(?:
(?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br|
(?:www\.)?nickjr\.[a-z]{2}
(?:www\.)?nickjr\.[a-z]{2}|
(?:www\.)?nickelodeonjunior\.fr
)
/(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+)
'''
@@ -101,6 +102,9 @@ class NickBrIE(MTVServicesInfoExtractor):
}, {
'url': 'http://www.nickjr.de/blaze-und-die-monster-maschinen/videos/f6caaf8f-e4e8-4cc1-b489-9380d6dcd059/',
'only_matching': True,
}, {
'url': 'http://www.nickelodeonjunior.fr/paw-patrol-la-pat-patrouille/videos/episode-401-entier-paw-patrol/',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -45,7 +45,11 @@ class NineNowIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
page_data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.*?});', webpage,
'page data'), display_id)
'page data', default='{}'), display_id, fatal=False)
if not page_data:
page_data = self._parse_json(self._parse_json(self._search_regex(
r'window\.__data\s*=\s*JSON\.parse\s*\(\s*(".+?")\s*\)\s*;',
webpage, 'page data'), display_id), display_id)
for kind in ('episode', 'clip'):
current_key = page_data.get(kind, {}).get(

View File

@@ -25,9 +25,14 @@ class NonkTubeIE(NuevoBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._extract_nuevo(
'https://www.nonktube.com/media/nuevo/econfig.php?key=%s'
% video_id, video_id)
webpage = self._download_webpage(url, video_id)
info['age_limit'] = 18
title = self._og_search_title(webpage)
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
info.update({
'id': video_id,
'title': title,
'age_limit': 18,
})
return info

View File

@@ -1,212 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
NO_DEFAULT,
sanitized_Request,
urlencode_postdata,
)
class NovaMovIE(InfoExtractor):
IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL_TEMPLATE = r'''(?x)
http://
(?:
(?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
(?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
)
(?P<id>[a-z\d]{13})
'''
_VALID_URL = _VALID_URL_TEMPLATE % {'host': r'novamov\.com'}
_HOST = 'www.novamov.com'
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!</h2>'
_FILEKEY_REGEX = r'flashvars\.filekey=(?P<filekey>"?[^"]+"?);'
_TITLE_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>([^<]+)</h3>'
_DESCRIPTION_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>'
_URL_TEMPLATE = 'http://%s/video/%s'
_TEST = None
def _check_existence(self, webpage, video_id):
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
def _real_extract(self, url):
video_id = self._match_id(url)
url = self._URL_TEMPLATE % (self._HOST, video_id)
webpage = self._download_webpage(
url, video_id, 'Downloading video page')
self._check_existence(webpage, video_id)
def extract_filekey(default=NO_DEFAULT):
filekey = self._search_regex(
self._FILEKEY_REGEX, webpage, 'filekey', default=default)
if filekey is not default and (filekey[0] != '"' or filekey[-1] != '"'):
return self._search_regex(
r'var\s+%s\s*=\s*"([^"]+)"' % re.escape(filekey), webpage, 'filekey', default=default)
else:
return filekey
filekey = extract_filekey(default=None)
if not filekey:
fields = self._hidden_inputs(webpage)
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', webpage,
'post url', default=url, group='url')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(url, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(fields))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
request.add_header('Referer', post_url)
webpage = self._download_webpage(
request, video_id, 'Downloading continue to the video page')
self._check_existence(webpage, video_id)
filekey = extract_filekey()
title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title')
description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
api_response = self._download_webpage(
'http://%s/api/player.api.php?key=%s&file=%s' % (self._HOST, filekey, video_id), video_id,
'Downloading video api response')
response = compat_urlparse.parse_qs(api_response)
if 'error_msg' in response:
raise ExtractorError('%s returned error: %s' % (self.IE_NAME, response['error_msg'][0]), expected=True)
video_url = response['url'][0]
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description
}
class WholeCloudIE(NovaMovIE):
IE_NAME = 'wholecloud'
IE_DESC = 'WholeCloud'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
_HOST = 'www.wholecloud.net'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<strong>Title:</strong> ([^<]+)</p>'
_DESCRIPTION_REGEX = r'<strong>Description:</strong> ([^<]+)</p>'
_TEST = {
'url': 'http://www.wholecloud.net/video/559e28be54d96',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': '559e28be54d96',
'ext': 'flv',
'title': 'dissapeared image',
'description': 'optical illusion dissapeared image magic illusion',
}
}
class NowVideoIE(NovaMovIE):
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
_HOST = 'www.nowvideo.to'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<h4>([^<]+)</h4>'
_DESCRIPTION_REGEX = r'</h4>\s*<p>([^<]+)</p>'
_TEST = {
'url': 'http://www.nowvideo.sx/video/f1d6fce9a968b',
'md5': '12c82cad4f2084881d8bc60ee29df092',
'info_dict': {
'id': 'f1d6fce9a968b',
'ext': 'flv',
'title': 'youtubedl test video BaWjenozKc',
'description': 'Description',
},
}
class VideoWeedIE(NovaMovIE):
IE_NAME = 'videoweed'
IE_DESC = 'VideoWeed'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'videoweed\.(?:es|com)'}
_HOST = 'www.videoweed.es'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<h1 class="text_shadow">([^<]+)</h1>'
_URL_TEMPLATE = 'http://%s/file/%s'
_TEST = {
'url': 'http://www.videoweed.es/file/b42178afbea14',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': 'b42178afbea14',
'ext': 'flv',
'title': 'optical illusion dissapeared image magic illusion',
'description': ''
},
}
class CloudTimeIE(NovaMovIE):
IE_NAME = 'cloudtime'
IE_DESC = 'CloudTime'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'cloudtime\.to'}
_HOST = 'www.cloudtime.to'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<div[^>]+class=["\']video_det["\'][^>]*>\s*<strong>([^<]+)</strong>'
_TEST = None
class AuroraVidIE(NovaMovIE):
IE_NAME = 'auroravid'
IE_DESC = 'AuroraVid'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'auroravid\.to'}
_HOST = 'www.auroravid.to'
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!<'
_TESTS = [{
'url': 'http://www.auroravid.to/video/4rurhn9x446jj',
'md5': '7205f346a52bbeba427603ba10d4b935',
'info_dict': {
'id': '4rurhn9x446jj',
'ext': 'flv',
'title': 'search engine optimization',
'description': 'search engine optimization is used to rank the web page in the google search engine'
},
'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
}, {
'url': 'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
'only_matching': True,
}]

View File

@@ -406,7 +406,7 @@ class NRKTVSerieBaseIE(InfoExtractor):
def _extract_series(self, webpage, display_id, fatal=True):
config = self._parse_json(
self._search_regex(
(r'INITIAL_DATA_*\s*=\s*({.+?})\s*;',
(r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
display_id, fatal=False)

File diff suppressed because it is too large Load Diff

View File

@@ -86,12 +86,13 @@ class ORFTVthekIE(InfoExtractor):
if value:
format_id_list.append(value)
format_id = '-'.join(format_id_list)
if determine_ext(fd['src']) == 'm3u8':
ext = determine_ext(src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
fd['src'], video_id, 'mp4', m3u8_id=format_id))
elif determine_ext(fd['src']) == 'f4m':
src, video_id, 'mp4', m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
fd['src'], video_id, f4m_id=format_id))
src, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'format_id': format_id,

View File

@@ -5,26 +5,27 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_str,
# compat_str,
compat_HTTPError,
)
from ..utils import (
clean_html,
ExtractorError,
remove_end,
# remove_end,
str_or_none,
strip_or_none,
unified_timestamp,
urljoin,
# urljoin,
)
class PacktPubBaseIE(InfoExtractor):
_PACKT_BASE = 'https://www.packtpub.com'
_MAPT_REST = '%s/mapt-rest' % _PACKT_BASE
# _PACKT_BASE = 'https://www.packtpub.com'
_STATIC_PRODUCTS_BASE = 'https://static.packt-cdn.com/products/'
class PacktPubIE(PacktPubBaseIE):
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>[^/]+)/(?P<id>[^/]+)(?:/(?P<display_id>[^/?&#]+))?'
_TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
@@ -40,6 +41,9 @@ class PacktPubIE(PacktPubBaseIE):
}, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
'only_matching': True,
}, {
'url': 'https://subscription.packtpub.com/video/programming/9781838988906/p1/video1_1/business-card-project',
'only_matching': True,
}]
_NETRC_MACHINE = 'packtpub'
_TOKEN = None
@@ -50,9 +54,9 @@ class PacktPubIE(PacktPubBaseIE):
return
try:
self._TOKEN = self._download_json(
self._MAPT_REST + '/users/tokens', None,
'https://services.packtpub.com/auth-v1/users/tokens', None,
'Downloading Authorization Token', data=json.dumps({
'email': username,
'username': username,
'password': password,
}).encode())['data']['access']
except ExtractorError as e:
@@ -61,54 +65,40 @@ class PacktPubIE(PacktPubBaseIE):
raise ExtractorError(message, expected=True)
raise
def _handle_error(self, response):
if response.get('status') != 'success':
raise ExtractorError(
'% said: %s' % (self.IE_NAME, response['message']),
expected=True)
def _download_json(self, *args, **kwargs):
response = super(PacktPubIE, self)._download_json(*args, **kwargs)
self._handle_error(response)
return response
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
course_id, chapter_id, video_id = mobj.group(
'course_id', 'chapter_id', 'id')
course_id, chapter_id, video_id, display_id = re.match(self._VALID_URL, url).groups()
headers = {}
if self._TOKEN:
headers['Authorization'] = 'Bearer ' + self._TOKEN
video = self._download_json(
'%s/users/me/products/%s/chapters/%s/sections/%s'
% (self._MAPT_REST, course_id, chapter_id, video_id), video_id,
'Downloading JSON video', headers=headers)['data']
try:
video_url = self._download_json(
'https://services.packtpub.com/products-v1/products/%s/%s/%s' % (course_id, chapter_id, video_id), video_id,
'Downloading JSON video', headers=headers)['data']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
self.raise_login_required('This video is locked')
raise
content = video.get('content')
if not content:
self.raise_login_required('This video is locked')
# TODO: find a better way to avoid duplicating course requests
# metadata = self._download_json(
# '%s/products/%s/chapters/%s/sections/%s/metadata'
# % (self._MAPT_REST, course_id, chapter_id, video_id),
# video_id)['data']
video_url = content['file']
metadata = self._download_json(
'%s/products/%s/chapters/%s/sections/%s/metadata'
% (self._MAPT_REST, course_id, chapter_id, video_id),
video_id)['data']
title = metadata['pageTitle']
course_title = metadata.get('title')
if course_title:
title = remove_end(title, ' - %s' % course_title)
timestamp = unified_timestamp(metadata.get('publicationDate'))
thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
# title = metadata['pageTitle']
# course_title = metadata.get('title')
# if course_title:
# title = remove_end(title, ' - %s' % course_title)
# timestamp = unified_timestamp(metadata.get('publicationDate'))
# thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'timestamp': timestamp,
'title': display_id or video_id, # title,
# 'thumbnail': thumbnail,
# 'timestamp': timestamp,
}
@@ -119,6 +109,7 @@ class PacktPubCourseIE(PacktPubBaseIE):
'info_dict': {
'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]',
'description': 'md5:489da8d953f416e51927b60a1c7db0aa',
},
'playlist_count': 90,
}, {
@@ -136,35 +127,38 @@ class PacktPubCourseIE(PacktPubBaseIE):
url, course_id = mobj.group('url', 'id')
course = self._download_json(
'%s/products/%s/metadata' % (self._MAPT_REST, course_id),
course_id)['data']
self._STATIC_PRODUCTS_BASE + '%s/toc' % course_id, course_id)
metadata = self._download_json(
self._STATIC_PRODUCTS_BASE + '%s/summary' % course_id,
course_id, fatal=False) or {}
entries = []
for chapter_num, chapter in enumerate(course['tableOfContents'], 1):
if chapter.get('type') != 'chapter':
continue
children = chapter.get('children')
if not isinstance(children, list):
for chapter_num, chapter in enumerate(course['chapters'], 1):
chapter_id = str_or_none(chapter.get('id'))
sections = chapter.get('sections')
if not chapter_id or not isinstance(sections, list):
continue
chapter_info = {
'chapter': chapter.get('title'),
'chapter_number': chapter_num,
'chapter_id': chapter.get('id'),
'chapter_id': chapter_id,
}
for section in children:
if section.get('type') != 'section':
continue
section_url = section.get('seoUrl')
if not isinstance(section_url, compat_str):
for section in sections:
section_id = str_or_none(section.get('id'))
if not section_id or section.get('contentType') != 'video':
continue
entry = {
'_type': 'url_transparent',
'url': urljoin(url + '/', section_url),
'url': '/'.join([url, chapter_id, section_id]),
'title': strip_or_none(section.get('title')),
'description': clean_html(section.get('summary')),
'thumbnail': metadata.get('coverImage'),
'timestamp': unified_timestamp(metadata.get('publicationDate')),
'ie_key': PacktPubIE.ie_key(),
}
entry.update(chapter_info)
entries.append(entry)
return self.playlist_result(entries, course_id, course.get('title'))
return self.playlist_result(
entries, course_id, metadata.get('title'),
clean_html(metadata.get('about')))

View File

@@ -18,81 +18,385 @@ from ..utils import (
class PeerTubeIE(InfoExtractor):
_INSTANCES_RE = r'''(?:
# Taken from https://instances.joinpeertube.org/instances
peertube\.rainbowswingers\.net|
tube\.stanisic\.nl|
peer\.suiri\.us|
medias\.libox\.fr|
videomensoif\.ynh\.fr|
peertube\.travelpandas\.eu|
peertube\.rachetjay\.fr|
peertube\.montecsys\.fr|
tube\.eskuero\.me|
peer\.tube|
peertube\.umeahackerspace\.se|
tube\.nx-pod\.de|
video\.monsieurbidouille\.fr|
tube\.openalgeria\.org|
peertube\.pointsecu\.fr|
vid\.lelux\.fi|
video\.anormallostpod\.ovh|
tube\.crapaud-fou\.org|
peertube\.stemy\.me|
lostpod\.space|
exode\.me|
peertube\.snargol\.com|
vis\.ion\.ovh|
videosdulib\.re|
v\.mbius\.io|
videos\.judrey\.eu|
peertube\.osureplayviewer\.xyz|
peertube\.mathieufamily\.ovh|
www\.videos-libr\.es|
fightforinfo\.com|
peertube\.fediverse\.ru|
peertube\.oiseauroch\.fr|
video\.nesven\.eu|
v\.bearvideo\.win|
video\.qoto\.org|
justporn\.cc|
video\.vny\.fr|
peervideo\.club|
tube\.taker\.fr|
peertube\.chantierlibre\.org|
tube\.ipfixe\.info|
tube\.kicou\.info|
tube\.dodsorf\.as|
videobit\.cc|
video\.yukari\.moe|
videos\.elbinario\.net|
hkvideo\.live|
pt\.tux\.tf|
www\.hkvideo\.live|
FIGHTFORINFO\.com|
pt\.765racing\.com|
peertube\.gnumeria\.eu\.org|
nordenmedia\.com|
peertube\.co\.uk|
tube\.darfweb\.eu|
tube\.kalah-france\.org|
0ch\.in|
vod\.mochi\.academy|
film\.node9\.org|
peertube\.hatthieves\.es|
video\.fitchfamily\.org|
peertube\.ddns\.net|
video\.ifuncle\.kr|
video\.fdlibre\.eu|
tube\.22decembre\.eu|
peertube\.harmoniescreatives\.com|
tube\.fabrigli\.fr|
video\.thedwyers\.co|
video\.bruitbruit\.com|
peertube\.foxfam\.club|
peer\.philoxweb\.be|
videos\.bugs\.social|
peertube\.malbert\.xyz|
peertube\.bilange\.ca|
libretube\.net|
diytelevision\.com|
peertube\.fedilab\.app|
libre\.video|
video\.mstddntfdn\.online|
us\.tv|
peertube\.sl-network\.fr|
peertube\.dynlinux\.io|
peertube\.david\.durieux\.family|
peertube\.linuxrocks\.online|
peerwatch\.xyz|
v\.kretschmann\.social|
tube\.otter\.sh|
yt\.is\.nota\.live|
tube\.dragonpsi\.xyz|
peertube\.boneheadmedia\.com|
videos\.funkwhale\.audio|
watch\.44con\.com|
peertube\.gcaillaut\.fr|
peertube\.icu|
pony\.tube|
spacepub\.space|
tube\.stbr\.io|
v\.mom-gay\.faith|
tube\.port0\.xyz|
peertube\.simounet\.net|
play\.jergefelt\.se|
peertube\.zeteo\.me|
tube\.danq\.me|
peertube\.kerenon\.com|
tube\.fab-l3\.org|
tube\.calculate\.social|
peertube\.mckillop\.org|
tube\.netzspielplatz\.de|
vod\.ksite\.de|
peertube\.laas\.fr|
tube\.govital\.net|
peertube\.stephenson\.cc|
bistule\.nohost\.me|
peertube\.kajalinifi\.de|
video\.ploud\.jp|
video\.omniatv\.com|
peertube\.ffs2play\.fr|
peertube\.leboulaire\.ovh|
peertube\.tronic-studio\.com|
peertube\.public\.cat|
peertube\.metalbanana\.net|
video\.1000i100\.fr|
peertube\.alter-nativ-voll\.de|
tube\.pasa\.tf|
tube\.worldofhauru\.xyz|
pt\.kamp\.site|
peertube\.teleassist\.fr|
videos\.mleduc\.xyz|
conf\.tube|
media\.privacyinternational\.org|
pt\.forty-two\.nl|
video\.halle-leaks\.de|
video\.grosskopfgames\.de|
peertube\.schaeferit\.de|
peertube\.jackbot\.fr|
tube\.extinctionrebellion\.fr|
peertube\.f-si\.org|
video\.subak\.ovh|
videos\.koweb\.fr|
peertube\.zergy\.net|
peertube\.roflcopter\.fr|
peertube\.floss-marketing-school\.com|
vloggers\.social|
peertube\.iriseden\.eu|
videos\.ubuntu-paris\.org|
peertube\.mastodon\.host|
armstube\.com|
peertube\.s2s\.video|
peertube\.lol|
tube\.open-plug\.eu|
open\.tube|
peertube\.ch|
peertube\.normandie-libre\.fr|
peertube\.slat\.org|
video\.lacaveatonton\.ovh|
peertube\.uno|
peertube\.servebeer\.com|
peertube\.fedi\.quebec|
tube\.h3z\.jp|
tube\.plus200\.com|
peertube\.eric\.ovh|
tube\.metadocs\.cc|
tube\.unmondemeilleur\.eu|
gouttedeau\.space|
video\.antirep\.net|
nrop\.cant\.at|
tube\.ksl-bmx\.de|
tube\.plaf\.fr|
tube\.tchncs\.de|
video\.devinberg\.com|
hitchtube\.fr|
peertube\.kosebamse\.com|
yunopeertube\.myddns\.me|
peertube\.varney\.fr|
peertube\.anon-kenkai\.com|
tube\.maiti\.info|
tubee\.fr|
videos\.dinofly\.com|
toobnix\.org|
videotape\.me|
voca\.tube|
video\.heromuster\.com|
video\.lemediatv\.fr|
video\.up\.edu\.ph|
balafon\.video|
video\.ivel\.fr|
thickrips\.cloud|
pt\.laurentkruger\.fr|
video\.monarch-pass\.net|
peertube\.artica\.center|
video\.alternanet\.fr|
indymotion\.fr|
fanvid\.stopthatimp\.net|
video\.farci\.org|
v\.lesterpig\.com|
video\.okaris\.de|
tube\.pawelko\.net|
peertube\.mablr\.org|
tube\.fede\.re|
pytu\.be|
evertron\.tv|
devtube\.dev-wiki\.de|
raptube\.antipub\.org|
video\.selea\.se|
peertube\.mygaia\.org|
video\.oh14\.de|
peertube\.livingutopia\.org|
peertube\.the-penguin\.de|
tube\.thechangebook\.org|
tube\.anjara\.eu|
pt\.pube\.tk|
video\.samedi\.pm|
mplayer\.demouliere\.eu|
widemus\.de|
peertube\.me|
peertube\.zapashcanon\.fr|
video\.latavernedejohnjohn\.fr|
peertube\.pcservice46\.fr|
peertube\.mazzonetto\.eu|
video\.irem\.univ-paris-diderot\.fr|
video\.livecchi\.cloud|
alttube\.fr|
video\.coop\.tools|
video\.cabane-libre\.org|
peertube\.openstreetmap\.fr|
videos\.alolise\.org|
irrsinn\.video|
video\.antopie\.org|
scitech\.video|
tube2\.nemsia\.org|
video\.amic37\.fr|
peertube\.freeforge\.eu|
video\.arbitrarion\.com|
video\.datsemultimedia\.com|
stoptrackingus\.tv|
peertube\.ricostrongxxx\.com|
docker\.videos\.lecygnenoir\.info|
peertube\.togart\.de|
tube\.postblue\.info|
videos\.domainepublic\.net|
peertube\.cyber-tribal\.com|
video\.gresille\.org|
peertube\.dsmouse\.net|
cinema\.yunohost\.support|
tube\.theocevaer\.fr|
repro\.video|
tube\.4aem\.com|
quaziinc\.com|
peertube\.metawurst\.space|
videos\.wakapo\.com|
video\.ploud\.fr|
video\.freeradical\.zone|
tube\.valinor\.fr|
refuznik\.video|
pt\.kircheneuenburg\.de|
peertube\.asrun\.eu|
peertube\.lagob\.fr|
videos\.side-ways\.net|
91video\.online|
video\.valme\.io|
video\.taboulisme\.com|
videos-libr\.es|
tv\.mooh\.fr|
nuage\.acostey\.fr|
video\.monsieur-a\.fr|
peertube\.librelois\.fr|
videos\.pair2jeux\.tube|
videos\.pueseso\.club|
peer\.mathdacloud\.ovh|
media\.assassinate-you\.net|
vidcommons\.org|
ptube\.rousset\.nom\.fr|
tube\.cyano\.at|
videos\.squat\.net|
video\.iphodase\.fr|
peertube\.makotoworkshop\.org|
peertube\.serveur\.slv-valbonne\.fr|
vault\.mle\.party|
hostyour\.tv|
videos\.hack2g2\.fr|
libre\.tube|
pire\.artisanlogiciel\.net|
videos\.numerique-en-commun\.fr|
video\.netsyms\.com|
video\.die-partei\.social|
video\.writeas\.org|
peertube\.swarm\.solvingmaz\.es|
tube\.pericoloso\.ovh|
watching\.cypherpunk\.observer|
videos\.adhocmusic\.com|
tube\.rfc1149\.net|
peertube\.librelabucm\.org|
videos\.numericoop\.fr|
peertube\.koehn\.com|
peertube\.anarchmusicall\.net|
tube\.kampftoast\.de|
vid\.y-y\.li|
peertube\.xtenz\.xyz|
diode\.zone|
tube\.egf\.mn|
peertube\.nomagic\.uk|
visionon\.tv|
videos\.koumoul\.com|
video\.rastapuls\.com|
video\.mantlepro\.com|
video\.deadsuperhero\.com|
peertube\.musicstudio\.pro|
peertube\.we-keys\.fr|
artitube\.artifaille\.fr|
peertube\.ethernia\.net|
tube\.midov\.pl|
peertube\.fr|
watch\.snoot\.tube|
peertube\.donnadieu\.fr|
argos\.aquilenet\.fr|
tube\.nemsia\.org|
tube\.bruniau\.net|
videos\.darckoune\.moe|
tube\.traydent\.info|
dev\.videos\.lecygnenoir\.info|
peertube\.nayya\.org|
peertube\.live|
peertube\.mofgao\.space|
video\.lequerrec\.eu|
peertube\.amicale\.net|
aperi\.tube|
tube\.ac-lyon\.fr|
video\.lw1\.at|
www\.yiny\.org|
videos\.pofilo\.fr|
tube\.lou\.lt|
choob\.h\.etbus\.ch|
tube\.hoga\.fr|
peertube\.heberge\.fr|
video\.obermui\.de|
videos\.cloudfrancois\.fr|
betamax\.video|
video\.typica\.us|
tube\.piweb\.be|
video\.blender\.org|
peertube\.cat|
tube\.kdy\.ch|
pe\.ertu\.be|
peertube\.social|
videos\.lescommuns\.org|
tv\.datamol\.org|
videonaute\.fr|
dialup\.express|
peertube\.nogafa\.org|
peertube\.pl|
megatube\.lilomoino\.fr|
peertube\.tamanoir\.foucry\.net|
peertube\.inapurna\.org|
peertube\.netzspielplatz\.de|
video\.deadsuperhero\.com|
peertube\.devosi\.org|
peertube\.1312\.media|
tube\.worldofhauru\.xyz|
tube\.bootlicker\.party|
skeptikon\.fr|
peertube\.geekshell\.fr|
tube\.opportunis\.me|
peertube\.peshane\.net|
video\.blueline\.mg|
tube\.homecomputing\.fr|
videos\.cloudfrancois\.fr|
peertube\.viviers-fibre\.net|
tube\.ouahpiti\.info|
video\.tedomum\.net|
video\.g3l\.org|
fontube\.fr|
peertube\.gaialabs\.ch|
peertube\.extremely\.online|
peertube\.public-infrastructure\.eu|
tube\.kher\.nl|
peertube\.qtg\.fr|
tube\.22decembre\.eu|
facegirl\.me|
video\.migennes\.net|
janny\.moe|
tube\.p2p\.legal|
video\.atlanti\.se|
troll\.tv|
peertube\.geekael\.fr|
vid\.leotindall\.com|
video\.anormallostpod\.ovh|
p-tube\.h3z\.jp|
tube\.darfweb\.eu|
videos\.iut-orsay\.fr|
peertube\.solidev\.net|
videos\.symphonie-of-code\.fr|
testtube\.ortg\.de|
videos\.cemea\.org|
peertube\.gwendalavir\.eu|
video\.passageenseine\.fr|
videos\.festivalparminous\.org|
peertube\.touhoppai\.moe|
peertube\.duckdns\.org|
sikke\.fi|
peertube\.mastodon\.host|
firedragonvideos\.com|
vidz\.dou\.bet|
peertube\.koehn\.com|
peer\.hostux\.social|
share\.tube|
peertube\.walkingmountains\.fr|
medias\.libox\.fr|
peertube\.moe|
peertube\.xyz|
jp\.peertube\.network|
videos\.benpro\.fr|
tube\.otter\.sh|
peertube\.angristan\.xyz|
peertube\.parleur\.net|
peer\.ecutsa\.fr|
peertube\.heraut\.eu|
peertube\.tifox\.fr|
peertube\.maly\.io|
vod\.mochi\.academy|
exode\.me|
coste\.video|
tube\.aquilenet\.fr|
peertube\.gegeweb\.eu|
framatube\.org|
@@ -100,18 +404,11 @@ class PeerTubeIE(InfoExtractor):
tube\.conferences-gesticulees\.net|
peertube\.datagueule\.tv|
video\.lqdn\.fr|
meilleurtube\.delire\.party|
tube\.mochi\.academy|
peertube\.dav\.li|
media\.zat\.im|
pytu\.be|
peertube\.valvin\.fr|
peertube\.nsa\.ovh|
video\.colibris-outilslibres\.org|
video\.hispagatos\.org|
tube\.svnet\.fr|
peertube\.video|
videos\.lecygnenoir\.info|
peertube3\.cpy\.re|
peertube2\.cpy\.re|
videos\.tcit\.fr|
@@ -126,7 +423,7 @@ class PeerTubeIE(InfoExtractor):
(?P<id>%s)
''' % (_INSTANCES_RE, _UUID_RE)
_TESTS = [{
'url': 'https://peertube.moe/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
'md5': '80f24ff364cc9d333529506a263e7feb',
'info_dict': {
'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
@@ -168,7 +465,7 @@ class PeerTubeIE(InfoExtractor):
@staticmethod
def _extract_peertube_url(webpage, source_url):
mobj = re.match(
r'https?://(?P<host>[^/]+)/videos/watch/(?P<id>%s)'
r'https?://(?P<host>[^/]+)/videos/(?:watch|embed)/(?P<id>%s)'
% PeerTubeIE._UUID_RE, source_url)
if mobj and any(p in webpage for p in (
'<title>PeerTube<',

View File

@@ -14,7 +14,7 @@ class PhilharmonieDeParisIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|embed(?:app)?/|misc/Playlist\.ashx\?id=)|
pad\.philharmoniedeparis\.fr/doc/CIMU/
)
(?P<id>\d+)
@@ -40,6 +40,12 @@ class PhilharmonieDeParisIE(InfoExtractor):
}, {
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True,
}, {
'url': 'https://live.philharmoniedeparis.fr/embedapp/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
'only_matching': True,
}, {
'url': 'https://live.philharmoniedeparis.fr/embed/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
'only_matching': True,
}]
_LIVE_URL = 'https://live.philharmoniedeparis.fr'

View File

@@ -15,18 +15,17 @@ from ..utils import (
class PikselIE(InfoExtractor):
_VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
_VALID_URL = r'https?://player\.piksel\.com/v/(?:refid/[^/]+/prefid/)?(?P<id>[a-z0-9_]+)'
_TESTS = [
{
'url': 'http://player.piksel.com/v/nv60p12f',
'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
'url': 'http://player.piksel.com/v/ums2867l',
'md5': '34e34c8d89dc2559976a6079db531e85',
'info_dict': {
'id': 'nv60p12f',
'id': 'ums2867l',
'ext': 'mp4',
'title': 'فن الحياة - الحلقة 1',
'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
'timestamp': 1465231790,
'upload_date': '20160606',
'title': 'GX-005 with Caption',
'timestamp': 1481335659,
'upload_date': '20161210'
}
},
{
@@ -39,8 +38,13 @@ class PikselIE(InfoExtractor):
'title': 'WAW- State of Washington vs. Donald J. Trump, et al',
'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.',
'timestamp': 1486171129,
'upload_date': '20170204',
'upload_date': '20170204'
}
},
{
# https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2019240/
'url': 'http://player.piksel.com/v/refid/nhkworld/prefid/nw_vod_v_en_2019_240_20190823233000_02_1566873477',
'only_matching': True,
}
]
@@ -53,8 +57,11 @@ class PikselIE(InfoExtractor):
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-de-program-uuid=[\'"]([a-z0-9]+)',
webpage, 'program uuid', default=display_id)
app_token = self._search_regex([
r'clientAPI\s*:\s*"([^"]+)"',
r'data-de-api-key\s*=\s*"([^"]+)"'
@@ -113,6 +120,13 @@ class PikselIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', []):
caption_url = caption.get('url')
if caption_url:
subtitles.setdefault(caption.get('locale', 'en'), []).append({
'url': caption_url})
return {
'id': video_id,
'title': title,
@@ -120,4 +134,5 @@ class PikselIE(InfoExtractor):
'thumbnail': video_data.get('thumbnailUrl'),
'timestamp': parse_iso8601(video_data.get('dateadd')),
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -18,43 +18,10 @@ from ..utils import (
)
class PlatziIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
platzi\.com/clases| # es version
courses\.platzi\.com/classes # en version
)/[^/]+/(?P<id>\d+)-[^/?\#&]+
'''
class PlatziBaseIE(InfoExtractor):
_LOGIN_URL = 'https://platzi.com/login/'
_NETRC_MACHINE = 'platzi'
_TESTS = [{
'url': 'https://platzi.com/clases/1311-next-js/12074-creando-nuestra-primera-pagina/',
'md5': '8f56448241005b561c10f11a595b37e3',
'info_dict': {
'id': '12074',
'ext': 'mp4',
'title': 'Creando nuestra primera página',
'description': 'md5:4c866e45034fc76412fbf6e60ae008bc',
'duration': 420,
},
'skip': 'Requires platzi account credentials',
}, {
'url': 'https://courses.platzi.com/classes/1367-communication-codestream/13430-background/',
'info_dict': {
'id': '13430',
'ext': 'mp4',
'title': 'Background',
'description': 'md5:49c83c09404b15e6e71defaf87f6b305',
'duration': 360,
},
'skip': 'Requires platzi account credentials',
'params': {
'skip_download': True,
},
}]
def _real_initialize(self):
self._login()
@@ -97,6 +64,42 @@ class PlatziIE(InfoExtractor):
'Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
class PlatziIE(PlatziBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
platzi\.com/clases| # es version
courses\.platzi\.com/classes # en version
)/[^/]+/(?P<id>\d+)-[^/?\#&]+
'''
_TESTS = [{
'url': 'https://platzi.com/clases/1311-next-js/12074-creando-nuestra-primera-pagina/',
'md5': '8f56448241005b561c10f11a595b37e3',
'info_dict': {
'id': '12074',
'ext': 'mp4',
'title': 'Creando nuestra primera página',
'description': 'md5:4c866e45034fc76412fbf6e60ae008bc',
'duration': 420,
},
'skip': 'Requires platzi account credentials',
}, {
'url': 'https://courses.platzi.com/classes/1367-communication-codestream/13430-background/',
'info_dict': {
'id': '13430',
'ext': 'mp4',
'title': 'Background',
'description': 'md5:49c83c09404b15e6e71defaf87f6b305',
'duration': 360,
},
'skip': 'Requires platzi account credentials',
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
lecture_id = self._match_id(url)
@@ -104,7 +107,11 @@ class PlatziIE(InfoExtractor):
data = self._parse_json(
self._search_regex(
r'client_data\s*=\s*({.+?})\s*;', webpage, 'client data'),
# client_data may contain "};" so that we have to try more
# strict regex first
(r'client_data\s*=\s*({.+?})\s*;\s*\n',
r'client_data\s*=\s*({.+?})\s*;'),
webpage, 'client data'),
lecture_id)
material = data['initialState']['material']
@@ -146,7 +153,7 @@ class PlatziIE(InfoExtractor):
}
class PlatziCourseIE(InfoExtractor):
class PlatziCourseIE(PlatziBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:

View File

@@ -39,7 +39,12 @@ class Porn91IE(InfoExtractor):
r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title')
title = title.replace('\n', '')
info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
video_link_url = self._search_regex(
r'<textarea[^>]+id=["\']fm-video_link[^>]+>([^<]+)</textarea>',
webpage, 'video link')
videopage = self._download_webpage(video_link_url, video_id)
info_dict = self._parse_html5_media_entries(url, videopage, video_id)[0]
duration = parse_duration(self._search_regex(
r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))

View File

@@ -1,101 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
int_or_none,
try_get,
unified_timestamp,
)
class PornFlipIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.pornflip.com/v/wz7DfNhMmep',
'md5': '98c46639849145ae1fd77af532a9278c',
'info_dict': {
'id': 'wz7DfNhMmep',
'ext': 'mp4',
'title': '2 Amateurs swallow make his dream cumshots true',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 112,
'timestamp': 1481655502,
'upload_date': '20161213',
'uploader_id': '106786',
'uploader': 'figifoto',
'view_count': int,
'age_limit': 18,
}
}, {
'url': 'https://www.pornflip.com/embed/wz7DfNhMmep',
'only_matching': True,
}, {
'url': 'https://www.pornflip.com/v/EkRD6-vS2-s',
'only_matching': True,
}, {
'url': 'https://www.pornflip.com/embed/EkRD6-vS2-s',
'only_matching': True,
}, {
'url': 'https://www.pornflip.com/v/NG9q6Pb_iK8',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.pornflip.com/v/%s' % video_id, video_id)
flashvars = compat_parse_qs(self._search_regex(
r'<embed[^>]+flashvars=(["\'])(?P<flashvars>(?:(?!\1).)+)\1',
webpage, 'flashvars', group='flashvars'))
title = flashvars['video_vars[title]'][0]
def flashvar(kind):
return try_get(
flashvars, lambda x: x['video_vars[%s]' % kind][0], compat_str)
formats = []
for key, value in flashvars.items():
if not (value and isinstance(value, list)):
continue
format_url = value[0]
if key == 'video_vars[hds_manifest]':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
continue
height = self._search_regex(
r'video_vars\[video_urls\]\[(\d+)', key, 'height', default=None)
if not height:
continue
formats.append({
'url': format_url,
'format_id': 'http-%s' % height,
'height': int_or_none(height),
})
self._sort_formats(formats)
uploader = self._html_search_regex(
(r'<span[^>]+class="name"[^>]*>\s*<a[^>]+>\s*<strong>(?P<uploader>[^<]+)',
r'<meta[^>]+content=(["\'])[^>]*\buploaded by (?P<uploader>.+?)\1'),
webpage, 'uploader', fatal=False, group='uploader')
return {
'id': video_id,
'formats': formats,
'title': title,
'thumbnail': flashvar('big_thumb'),
'duration': int_or_none(flashvar('duration')),
'timestamp': unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp')),
'uploader_id': flashvar('author_id'),
'uploader': uploader,
'view_count': int_or_none(flashvar('views')),
'age_limit': 18,
}

View File

@@ -170,7 +170,7 @@ class PornHubIE(PornHubBaseIE):
def dl_webpage(platform):
self._set_cookie(host, 'platform', platform)
return self._download_webpage(
'http://www.%s/view_video.php?viewkey=%s' % (host, video_id),
'https://www.%s/view_video.php?viewkey=%s' % (host, video_id),
video_id, 'Downloading %s webpage' % platform)
webpage = dl_webpage('pc')
@@ -372,37 +372,99 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
entries, playlist_id, title, playlist.get('description'))
class PornHubPlaylistIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/playlist/(?P<id>\d+)'
class PornHubUserIE(PornHubPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_TESTS = [{
'url': 'http://www.pornhub.com/playlist/4667351',
'info_dict': {
'id': '4667351',
'title': 'Nataly Hot',
},
'playlist_mincount': 2,
'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118,
}, {
'url': 'https://de.pornhub.com/playlist/4667351',
'url': 'https://www.pornhub.com/pornstar/liz-vicious',
'info_dict': {
'id': 'liz-vicious',
},
'playlist_mincount': 118,
}, {
'url': 'https://www.pornhub.com/users/russianveet69',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/channels/povd',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/model/zoe_ph?abc=1',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('id')
return self.url_result(
'%s/videos' % mobj.group('url'), ie=PornHubPagedVideoListIE.ie_key(),
video_id=user_id)
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos'
class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
@staticmethod
def _has_more(webpage):
return re.search(
r'''(?x)
<li[^>]+\bclass=["\']page_next|
<link[^>]+\brel=["\']next|
<button[^>]+\bid=["\']moreDataBtn
''', webpage) is not None
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
item_id = mobj.group('id')
page = int_or_none(self._search_regex(
r'\bpage=(\d+)', url, 'page', default=None))
entries = []
for page_num in (page, ) if page is not None else itertools.count(1):
try:
webpage = self._download_webpage(
url, item_id, 'Downloading page %d' % page_num,
query={'page': page_num})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
raise
page_entries = self._extract_entries(webpage, host)
if not page_entries:
break
entries.extend(page_entries)
if not self._has_more(webpage):
break
return self.playlist_result(orderedSet(entries), item_id)
class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': {
'id': 'zoe_ph',
},
'playlist_mincount': 171,
'url': 'https://www.pornhub.com/model/zoe_ph/videos',
'only_matching': True,
}, {
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos',
'info_dict': {
'id': 'pornstar/jenny-blighe/videos',
},
'playlist_mincount': 149,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos?page=3',
'info_dict': {
'id': 'pornstar/jenny-blighe/videos',
},
'playlist_mincount': 40,
}, {
# default sorting as Top Rated Videos
'url': 'https://www.pornhub.com/channels/povd/videos',
'info_dict': {
'id': 'povd',
'id': 'channels/povd/videos',
},
'playlist_mincount': 293,
}, {
@@ -421,31 +483,87 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/model/jayndrea/videos/upload',
# Most Viewed Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=mv',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
# Top Rated Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=tr',
'only_matching': True,
}, {
# Longest Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=lg',
'only_matching': True,
}, {
# Newest Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=cm',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos/paid',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos/fanonly',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video?page=3',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video/search?search=123',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/categories/teen',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/categories/teen?page=3',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/hd',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/hd?page=3',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/described-video',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/described-video?page=2',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video/incategories/60fps-1/hd-porn',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/playlist/44121572',
'info_dict': {
'id': 'playlist/44121572',
},
'playlist_mincount': 132,
}, {
'url': 'https://www.pornhub.com/playlist/4667351',
'only_matching': True,
}, {
'url': 'https://de.pornhub.com/playlist/4667351',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
user_id = mobj.group('id')
@classmethod
def suitable(cls, url):
return (False
if PornHubIE.suitable(url) or PornHubUserIE.suitable(url) or PornHubUserVideosUploadIE.suitable(url)
else super(PornHubPagedVideoListIE, cls).suitable(url))
entries = []
for page_num in itertools.count(1):
try:
webpage = self._download_webpage(
url, user_id, 'Downloading page %d' % page_num,
query={'page': page_num})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
raise
page_entries = self._extract_entries(webpage, host)
if not page_entries:
break
entries.extend(page_entries)
return self.playlist_result(entries, user_id)
class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
_TESTS = [{
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'info_dict': {
'id': 'jenny-blighe',
},
'playlist_mincount': 129,
}, {
'url': 'https://www.pornhub.com/model/zoe_ph/videos/upload',
'only_matching': True,
}]

View File

@@ -1,70 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
urlencode_postdata,
)
class PromptFileIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?promptfile\.com/l/(?P<id>[0-9A-Z\-]+)'
_TEST = {
'url': 'http://www.promptfile.com/l/86D1CE8462-576CAAE416',
'md5': '5a7e285a26e0d66d9a263fae91bc92ce',
'info_dict': {
'id': '86D1CE8462-576CAAE416',
'ext': 'mp4',
'title': 'oceans.mp4',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if re.search(r'<div.+id="not_found_msg".+>(?!We are).+</div>[^-]', webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
chash = self._search_regex(
r'val\("([^"]*)"\s*\+\s*\$\("#chash"\)', webpage, 'chash')
fields = self._hidden_inputs(webpage)
keys = list(fields.keys())
chash_key = keys[0] if len(keys) == 1 else next(
key for key in keys if key.startswith('cha'))
fields[chash_key] = chash + fields[chash_key]
webpage = self._download_webpage(
url, video_id, 'Downloading video page',
data=urlencode_postdata(fields),
headers={'Content-type': 'application/x-www-form-urlencoded'})
video_url = self._search_regex(
(r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*Download File',
r'<a[^>]+href=(["\'])(?P<url>https?://(?:www\.)?promptfile\.com/file/(?:(?!\1).)+)\1'),
webpage, 'video url', group='url')
title = self._html_search_regex(
r'<span.+title="([^"]+)">', webpage, 'title')
thumbnail = self._html_search_regex(
r'<div id="player_overlay">.*button>.*?<img src="([^"]+)"',
webpage, 'thumbnail', fatal=False, flags=re.DOTALL)
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': determine_ext(title),
}]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -16,6 +16,11 @@ from ..utils import (
class ProSiebenSat1BaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
_ACCESS_ID = None
_SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
_V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
def _extract_video_info(self, url, clip_id):
client_location = url
@@ -31,93 +36,128 @@ class ProSiebenSat1BaseIE(InfoExtractor):
if video.get('is_protected') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
duration = float_or_none(video.get('duration'))
source_ids = [compat_str(source['id']) for source in video['sources']]
client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
clip_id, 'Downloading sources JSON', query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
})
server_id = sources['server_id']
def fix_bitrate(bitrate):
bitrate = int_or_none(bitrate)
if not bitrate:
return None
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
formats = []
for source_id in source_ids:
client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={
if self._ACCESS_ID:
raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
server_token = (self._download_json(
self._V4_BASE_URL + 'protocols', clip_id,
'Downloading protocols JSON',
headers=self.geo_verification_headers(), query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct).encode()).hexdigest(),
'video_id': clip_id,
}, fatal=False) or {}).get('server_token')
if server_token:
urls = (self._download_json(
self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct + server_token + self._SUPPORTED_PROTOCOLS).encode()).hexdigest(),
'protocols': self._SUPPORTED_PROTOCOLS,
'server_token': server_token,
'video_id': clip_id,
}, fatal=False) or {}).get('urls') or {}
for protocol, variant in urls.items():
source_url = variant.get('clear', {}).get('url')
if not source_url:
continue
if protocol == 'dash':
formats.extend(self._extract_mpd_formats(
source_url, clip_id, mpd_id=protocol, fatal=False))
elif protocol == 'hls':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id=protocol, fatal=False))
else:
formats.append({
'url': source_url,
'format_id': protocol,
})
if not formats:
source_ids = [compat_str(source['id']) for source in video['sources']]
client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
clip_id, 'Downloading sources JSON', query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
'server_id': server_id,
'source_ids': source_id,
})
if not urls:
continue
if urls.get('status_code') != 0:
raise ExtractorError('This video is unavailable', expected=True)
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
for source in urls_sources:
source_url = source.get('url')
if not source_url:
server_id = sources['server_id']
def fix_bitrate(bitrate):
bitrate = int_or_none(bitrate)
if not bitrate:
return None
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
for source_id in source_ids:
client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
'server_id': server_id,
'source_ids': source_id,
})
if not urls:
continue
protocol = source.get('protocol')
mimetype = source.get('mimetype')
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, clip_id, f4m_id='hds', fatal=False))
elif mimetype == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif mimetype == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
source_url, clip_id, mpd_id='dash', fatal=False))
else:
tbr = fix_bitrate(source['bitrate'])
if protocol in ('rtmp', 'rtmpe'):
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
mp4colon_index = path.rfind('mp4:')
app = path[:mp4colon_index]
play_path = path[mp4colon_index:]
formats.append({
'url': '%s/%s' % (mobj.group('url'), app),
'app': app,
'play_path': play_path,
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
})
if urls.get('status_code') != 0:
raise ExtractorError('This video is unavailable', expected=True)
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
for source in urls_sources:
source_url = source.get('url')
if not source_url:
continue
protocol = source.get('protocol')
mimetype = source.get('mimetype')
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, clip_id, f4m_id='hds', fatal=False))
elif mimetype == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif mimetype == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
source_url, clip_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
})
tbr = fix_bitrate(source['bitrate'])
if protocol in ('rtmp', 'rtmpe'):
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
mp4colon_index = path.rfind('mp4:')
app = path[:mp4colon_index]
play_path = path[mp4colon_index:]
formats.append({
'url': '%s/%s' % (mobj.group('url'), app),
'app': app,
'play_path': play_path,
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
})
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
})
self._sort_formats(formats)
return {
'duration': duration,
'duration': float_or_none(video.get('duration')),
'formats': formats,
}
@@ -344,6 +384,11 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
_TOKEN = 'prosieben'
_SALT = '01!8d8F_)r9]4s[qeuXfP%'
_CLIENT_NAME = 'kolibri-2.0.19-splec4'
_ACCESS_ID = 'x_prosiebenmaxx-de'
_ENCRYPTION_KEY = 'Eeyeey9oquahthainoofashoyoikosag'
_IV = 'Aeluchoc6aevechuipiexeeboowedaok'
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',

View File

@@ -104,3 +104,25 @@ class RedBullTVIE(InfoExtractor):
'formats': formats,
'subtitles': subtitles,
}
class RedBullTVRrnContentIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)/(?:video|live)/rrn:content:[^:]+:(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:live-videos:e3e6feb4-e95f-50b7-962a-c70f8fd13c73/mens-dh-finals-fort-william',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:videos:a36a0f36-ff1b-5db8-a69d-ee11a14bf48b/tn-ts-style?playlist=rrn:content:event-profiles:83f05926-5de8-5389-b5e4-9bb312d715e8:extras',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_url = self._og_search_url(webpage)
return self.url_result(
video_url, ie=RedBullTVIE.ie_key(),
video_id=RedBullTVIE._match_id(video_url))

View File

@@ -6,6 +6,7 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
merge_dicts,
str_to_int,
unified_strdate,
url_or_none,
@@ -45,11 +46,14 @@ class RedTubeIE(InfoExtractor):
if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
raise ExtractorError('Video %s has been removed' % video_id, expected=True)
title = self._html_search_regex(
(r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
webpage, 'title', group='title',
default=None) or self._og_search_title(webpage)
info = self._search_json_ld(webpage, video_id, default={})
if not info.get('title'):
info['title'] = self._html_search_regex(
(r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
webpage, 'title', group='title',
default=None) or self._og_search_title(webpage)
formats = []
sources = self._parse_json(
@@ -88,28 +92,28 @@ class RedTubeIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._search_regex(
r'<span[^>]+>ADDED ([^<]+)<',
webpage, 'upload date', fatal=False))
r'<span[^>]+>(?:ADDED|Published on) ([^<]+)<',
webpage, 'upload date', default=None))
duration = int_or_none(self._og_search_property(
'video:duration', webpage, default=None) or self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', default=None))
view_count = str_to_int(self._search_regex(
(r'<div[^>]*>Views</div>\s*<div[^>]*>\s*([\d,.]+)',
r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)'),
webpage, 'view count', fatal=False))
r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)',
r'<span[^>]+\bclass=["\']video_view_count[^>]*>\s*([\d,.]+)'),
webpage, 'view count', default=None))
# No self-labeling, but they describe themselves as
# "Home of Videos Porno"
age_limit = 18
return {
return merge_dicts(info, {
'id': video_id,
'ext': 'mp4',
'title': title,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'age_limit': age_limit,
'formats': formats,
}
})

View File

@@ -4,32 +4,34 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
strip_or_none,
unescapeHTML,
str_or_none,
urlencode_postdata,
)
class RoosterTeethIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/episode/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/(?:episode|watch)/(?P<id>[^/?#&]+)'
_LOGIN_URL = 'https://roosterteeth.com/login'
_NETRC_MACHINE = 'roosterteeth'
_TESTS = [{
'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'md5': 'e2bd7764732d785ef797700a2489f212',
'info_dict': {
'id': '26576',
'id': '9156',
'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'ext': 'mp4',
'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
'title': 'Million Dollars, But... The Game Announcement',
'description': 'md5:168a54b40e228e79f4ddb141e89fe4f5',
'thumbnail': r're:^https?://.*\.png$',
'series': 'Million Dollars, But...',
'episode': 'Million Dollars, But... The Game Announcement',
'comment_count': int,
},
}, {
'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
@@ -47,6 +49,9 @@ class RoosterTeethIE(InfoExtractor):
# only available for FIRST members
'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one',
'only_matching': True,
}, {
'url': 'https://roosterteeth.com/watch/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'only_matching': True,
}]
def _login(self):
@@ -89,60 +94,55 @@ class RoosterTeethIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
api_episode_url = 'https://svod-be.roosterteeth.com/api/v1/episodes/%s' % display_id
webpage = self._download_webpage(url, display_id)
episode = strip_or_none(unescapeHTML(self._search_regex(
(r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<title>(?P<title>[^<]+)</title>'), webpage, 'title',
default=None, group='title')))
title = strip_or_none(self._og_search_title(
webpage, default=None)) or episode
m3u8_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
webpage, 'm3u8 url', default=None, group='url')
if not m3u8_url:
if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
self.raise_login_required(
'%s is only available for FIRST members' % display_id)
if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
self.raise_login_required('%s is not available yet' % display_id)
raise ExtractorError('Unable to extract m3u8 URL')
try:
m3u8_url = self._download_json(
api_episode_url + '/videos', display_id,
'Downloading video JSON metadata')['data'][0]['attributes']['url']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if self._parse_json(e.cause.read().decode(), display_id).get('access') is False:
self.raise_login_required(
'%s is only available for FIRST members' % display_id)
raise
formats = self._extract_m3u8_formats(
m3u8_url, display_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
m3u8_url, display_id, 'mp4', 'm3u8_native', m3u8_id='hls')
self._sort_formats(formats)
description = strip_or_none(self._og_search_description(webpage))
thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage))
episode = self._download_json(
api_episode_url, display_id,
'Downloading episode JSON metadata')['data'][0]
attributes = episode['attributes']
title = attributes.get('title') or attributes['display_title']
video_id = compat_str(episode['id'])
series = self._search_regex(
(r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'),
webpage, 'series', fatal=False)
comment_count = int_or_none(self._search_regex(
r'>Comments \((\d+)\)<', webpage,
'comment count', fatal=False))
video_id = self._search_regex(
(r'containerId\s*=\s*["\']episode-(\d+)\1',
r'<div[^<]+id=["\']episode-(\d+)'), webpage,
'video id', default=display_id)
thumbnails = []
for image in episode.get('included', {}).get('images', []):
if image.get('type') == 'episode_image':
img_attributes = image.get('attributes') or {}
for k in ('thumb', 'small', 'medium', 'large'):
img_url = img_attributes.get(k)
if img_url:
thumbnails.append({
'id': k,
'url': img_url,
})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'series': series,
'episode': episode,
'comment_count': comment_count,
'description': attributes.get('description') or attributes.get('caption'),
'thumbnails': thumbnails,
'series': attributes.get('show_title'),
'season_number': int_or_none(attributes.get('season_number')),
'season_id': attributes.get('season_id'),
'episode': title,
'episode_number': int_or_none(attributes.get('number')),
'episode_id': str_or_none(episode.get('uuid')),
'formats': formats,
'channel_id': attributes.get('channel_id'),
'duration': int_or_none(attributes.get('length')),
}

View File

@@ -32,7 +32,7 @@ class RtlNlIE(InfoExtractor):
'duration': 1167.96,
},
}, {
# best format avaialble a3t
# best format available a3t
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
'md5': 'dea7474214af1271d91ef332fb8be7ea',
'info_dict': {

View File

@@ -1,9 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
)
class RTPIE(InfoExtractor):
@@ -18,10 +20,6 @@ class RTPIE(InfoExtractor):
'description': 'As paixões musicais de António Cartaxo e António Macedo',
'thumbnail': r're:^https?://.*\.jpg',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
'only_matching': True,
@@ -33,57 +31,36 @@ class RTPIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'twitter:title', webpage, display_name='title', fatal=True)
description = self._html_search_meta('description', webpage)
thumbnail = self._og_search_thumbnail(webpage)
player_config = self._search_regex(
r'(?s)RTPPLAY\.player\.newPlayer\(\s*(\{.*?\})\s*\)', webpage, 'player config')
config = self._parse_json(player_config, video_id)
path, ext = config.get('file').rsplit('.', 1)
formats = [{
'format_id': 'rtmp',
'ext': ext,
'vcodec': config.get('type') == 'audio' and 'none' or None,
'preference': -2,
'url': 'rtmp://{streamer:s}/{application:s}'.format(**config),
'app': config.get('application'),
'play_path': '{ext:s}:{path:s}'.format(ext=ext, path=path),
'page_url': url,
'rtmp_live': config.get('live', False),
'player_url': 'http://programas.rtp.pt/play/player.swf?v3',
'rtmp_real_time': True,
}]
# Construct regular HTTP download URLs
replacements = {
'audio': {
'format_id': 'mp3',
'pattern': r'^nas2\.share/wavrss/',
'repl': 'http://rsspod.rtp.pt/podcasts/',
'vcodec': 'none',
},
'video': {
'format_id': 'mp4_h264',
'pattern': r'^nas2\.share/h264/',
'repl': 'http://rsspod.rtp.pt/videocasts/',
'vcodec': 'h264',
},
}
r = replacements[config['type']]
if re.match(r['pattern'], config['file']) is not None:
formats.append({
'format_id': r['format_id'],
'url': re.sub(r['pattern'], r['repl'], config['file']),
'vcodec': r['vcodec'],
})
self._sort_formats(formats)
config = self._parse_json(self._search_regex(
r'(?s)RTPPlayer\(({.+?})\);', webpage,
'player config'), video_id, js_to_json)
file_url = config['file']
ext = determine_ext(file_url)
if ext == 'm3u8':
file_key = config.get('fileKey')
formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=file_key)
if file_key:
formats.append({
'url': 'https://cdn-ondemand.rtp.pt' + file_key,
'preference': 1,
})
self._sort_formats(formats)
else:
formats = [{
'url': file_url,
'ext': ext,
}]
if config.get('mediaType') == 'audio':
for f in formats:
f['vcodec'] = 'none'
return {
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'description': self._html_search_meta(['description', 'twitter:description'], webpage),
'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
}

View File

@@ -1,53 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
get_element_by_class,
unified_strdate,
)
class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://rudo.video/vod/oTzw0MGnyG',
'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
'info_dict': {
'id': 'oTzw0MGnyG',
'ext': 'mp4',
'title': 'Comentario Tomás Mosciatti',
'upload_date': '20160617',
},
}
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
jwplayer_data = self._parse_json(self._search_regex(
r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
info_dict.update({
'title': self._og_search_title(webpage),
'upload_date': unified_strdate(get_element_by_class('date', webpage)),
})
return info_dict

View File

@@ -1,15 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
sanitized_Request,
std_headers,
urlencode_postdata,
update_url_query,
)
@@ -31,44 +34,53 @@ class SafariBaseIE(InfoExtractor):
if username is None:
return
headers = std_headers.copy()
if 'Referer' not in headers:
headers['Referer'] = self._LOGIN_URL
_, urlh = self._download_webpage_handle(
'https://learning.oreilly.com/accounts/login-check/', None,
'Downloading login page')
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login form', headers=headers)
def is_logged(urlh):
return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())
def is_logged(webpage):
return any(re.search(p, webpage) for p in (
r'href=["\']/accounts/logout/', r'>Sign Out<'))
if is_logged(login_page):
if is_logged(urlh):
self.LOGGED_IN = True
return
csrf = self._html_search_regex(
r"name='csrfmiddlewaretoken'\s+value='([^']+)'",
login_page, 'csrf token')
redirect_url = compat_str(urlh.geturl())
parsed_url = compat_urlparse.urlparse(redirect_url)
qs = compat_parse_qs(parsed_url.query)
next_uri = compat_urlparse.urljoin(
'https://api.oreilly.com', qs['next'][0])
login_form = {
'csrfmiddlewaretoken': csrf,
'email': username,
'password1': password,
'login': 'Sign In',
'next': '',
}
auth, urlh = self._download_json_handle(
'https://www.oreilly.com/member/auth/login/', None, 'Logging in',
data=json.dumps({
'email': username,
'password': password,
'redirect_uri': next_uri,
}).encode(), headers={
'Content-Type': 'application/json',
'Referer': redirect_url,
}, expected_status=400)
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form), headers=headers)
login_page = self._download_webpage(
request, None, 'Logging in')
if not is_logged(login_page):
credentials = auth.get('credentials')
if (not auth.get('logged_in') and not auth.get('redirect_uri')
and credentials):
raise ExtractorError(
'Login failed; make sure your credentials are correct and try again.',
expected=True)
'Unable to login: %s' % credentials, expected=True)
self.LOGGED_IN = True
# oreilly serves two same instances of the following cookies
# in Set-Cookie header and expects first one to be actually set
for cookie in ('groot_sessionid', 'orm-jwt', 'orm-rt'):
self._apply_first_set_cookie_header(urlh, cookie)
_, urlh = self._download_webpage_handle(
auth.get('redirect_uri') or next_uri, None, 'Completing login',)
if is_logged(urlh):
self.LOGGED_IN = True
return
raise ExtractorError('Unable to log in')
class SafariIE(SafariBaseIE):
@@ -76,7 +88,7 @@ class SafariIE(SafariBaseIE):
IE_DESC = 'safaribooksonline.com online video'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/
(?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/
(?:
library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?\#&]+)\.html|
videos/[^/]+/[^/]+/(?P<reference_id>[^-]+-[^/?\#&]+)
@@ -107,6 +119,9 @@ class SafariIE(SafariBaseIE):
}, {
'url': 'https://learning.oreilly.com/videos/hadoop-fundamentals-livelessons/9780133392838/9780133392838-00_SeriesIntro',
'only_matching': True,
}, {
'url': 'https://www.oreilly.com/library/view/hadoop-fundamentals-livelessons/9780133392838/00_SeriesIntro.html',
'only_matching': True,
}]
_PARTNER_ID = '1926081'
@@ -163,7 +178,7 @@ class SafariIE(SafariBaseIE):
class SafariApiIE(SafariBaseIE):
IE_NAME = 'safari:api'
_VALID_URL = r'https?://(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
_VALID_URL = r'https?://(?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
_TESTS = [{
'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
@@ -188,7 +203,7 @@ class SafariCourseIE(SafariBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/
(?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/
(?:
library/view/[^/]+|
api/v1/book|
@@ -219,6 +234,9 @@ class SafariCourseIE(SafariBaseIE):
}, {
'url': 'https://learning.oreilly.com/videos/hadoop-fundamentals-livelessons/9780133392838',
'only_matching': True,
}, {
'url': 'https://www.oreilly.com/library/view/hadoop-fundamentals-livelessons/9780133392838/',
'only_matching': True,
}]
@classmethod

Some files were not shown because too many files have changed in this diff Show More