More string operators for --match-filter

This commit is contained in:
Max Teegen 2021-06-13 16:25:19 +02:00
parent c2350cac24
commit 3b74d490e0
2 changed files with 102 additions and 23 deletions

115
README.md
View File

@ -128,6 +128,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
CIDR notation
## Video Selection:
<<<<<<< HEAD
--playlist-start NUMBER Playlist video to start at (default is
1)
--playlist-end NUMBER Playlist video to end at (default is
@ -160,27 +161,33 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
than COUNT views
--max-views COUNT Do not download any videos with more
than COUNT views
--match-filter FILTER Generic video filter. Specify any key
(see the "OUTPUT TEMPLATE" for a list
of available keys) to match if the key
is present, !key to check if the key is
not present, key > NUMBER (like
"comment_count > 12", also works with
>=, <, <=, !=, =) to compare against a
number, key = 'LITERAL' (like "uploader
= 'Mike Smith'", also works with !=) to
match against a string literal and & to
require multiple matches. Values which
are not known are excluded unless you
put a question mark (?) after the
operator. For example, to only match
videos that have been liked more than
100 times and disliked less than 50
times (or the dislike functionality is
not available at the given service),
but who also have a description, use
--match-filter "like_count > 100 &
--match-filter FILTER Generic video filter. Specify any key (see
the "OUTPUT TEMPLATE" for a list of
available keys) to match if the key is
present, !key to check if the key is not
present, key > NUMBER (like "comment_count
> 12", also works with >=, <, <=, !=, =) to
compare against a number, key = 'LITERAL'
(like "uploader = 'Mike Smith'", also works
with !=) to match against a string literal
and & to require multiple matches. Values
which are not known are excluded unless you
put a question mark (?) after the operator.
For example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description" .
For matching strings, the oparators ~= and
!~= check for string containment and
exclusion. The operators *= and !*= search
for a regular expression.
For example, to only match videos which
have neither 'sponsored' nor 'Sponsored' in
the title, use --match-filter "title !*=
'[Ss]ponsored'".
--no-playlist Download only the video, if the URL
refers to a video and a playlist.
--yes-playlist Download the playlist, if the URL
@ -192,6 +199,74 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
downloaded videos in it.
--include-ads Download advertisements as well
(experimental)
=======
--playlist-start NUMBER Playlist video to start at (default is 1)
--playlist-end NUMBER Playlist video to end at (default is last)
--playlist-items ITEM_SPEC Playlist video items to download. Specify
indices of the videos in the playlist
separated by commas like: "--playlist-items
1,2,5,8" if you want to download videos
indexed 1, 2, 5, 8 in the playlist. You can
specify range: "--playlist-items
1-3,7,10-13", it will download the videos
at index 1, 2, 3, 7, 10, 11, 12 and 13.
--match-title REGEX Download only matching titles (regex or
caseless sub-string)
--reject-title REGEX Skip download for matching titles (regex or
caseless sub-string)
--max-downloads NUMBER Abort after downloading NUMBER files
--min-filesize SIZE Do not download any videos smaller than
SIZE (e.g. 50k or 44.6m)
--max-filesize SIZE Do not download any videos larger than SIZE
(e.g. 50k or 44.6m)
--date DATE Download only videos uploaded in this date
--datebefore DATE Download only videos uploaded on or before
this date (i.e. inclusive)
--dateafter DATE Download only videos uploaded on or after
this date (i.e. inclusive)
--min-views COUNT Do not download any videos with less than
COUNT views
--max-views COUNT Do not download any videos with more than
COUNT views
--match-filter FILTER Generic video filter. Specify any key (see
the "OUTPUT TEMPLATE" for a list of
available keys) to match if the key is
present, !key to check if the key is not
present, key > NUMBER (like "comment_count
> 12", also works with >=, <, <=, !=, =) to
compare against a number, key = 'LITERAL'
(like "uploader = 'Mike Smith'", also works
with !=) to match against a string literal
and & to require multiple matches. Values
which are not known are excluded unless you
put a question mark (?) after the operator.
For example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description" .
For matching strings, the oparators ~= and
!~= check for string containment and
exclusion. The operators *= and !*= search
for a regular expression.
For example, to only match videos which
have neither 'sponsored' nor 'Sponsored' in
the title, use --match-filter "title !*=
'[Ss]ponsored'"
--no-playlist Download only the video, if the URL refers
to a video and a playlist.
--yes-playlist Download the playlist, if the URL refers to
a video and a playlist.
--age-limit YEARS Download only videos suitable for the given
age
--download-archive FILE Download only videos not listed in the
archive file. Record the IDs of all
downloaded videos in it.
--include-ads Download advertisements as well
(experimental)
>>>>>>> dd954e809 (More string operators for --match-filter)
## Download Options:
-r, --limit-rate RATE Maximum download rate in bytes per

View File

@ -4369,6 +4369,10 @@ def _match_one(filter_part, dct):
'>=': operator.ge,
'=': operator.eq,
'!=': operator.ne,
'~=': operator.contains,
'!~=': lambda left, right: not operator.contains(left, right),
'*=': lambda left, right: bool(re.search(right, left)),
'!*=': lambda left, right: not bool(re.search(right, left)),
}
operator_rex = re.compile(r'''(?x)\s*
(?P<key>[a-z_]+)
@ -4392,14 +4396,14 @@ def _match_one(filter_part, dct):
# https://github.com/ytdl-org/youtube-dl/issues/11082).
or actual_value is not None and m.group('intval') is not None
and isinstance(actual_value, compat_str)):
if m.group('op') not in ('=', '!='):
raise ValueError(
'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else:
if m.group('op') in ('~=', '!~=', '*=', '!*='):
raise ValueError(
'Operator %s only supports string values!' % m.group('op'))
try:
comparison_value = int(m.group('intval'))
except ValueError: