{<script(.(?!</script>))*?(doubleclick|flycast|burstnet|spylog)+?.*?</script>}
is invalid: it contains a constraint (the lookahead)
followed by a quantifier, which is expressly forbidden in
the man page for re_syntax.
I think the author actually wanted something like
{<script
(?=.*</script>).*?(doubleclick|flycast|burstnet|spylog).*?</script>}
which runs OK (albeit slowly; there is probably a better way
to express this).
OTOH, the regexp engine should complain about the bad syntax
and return an error, instead of getting into an infinite
loop.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
>The regular expression
>{<script(.(?!</script>))*?
>(doubleclick|flycast|burstnet|spylog)+?.*?</script>}
>is invalid: it contains a constraint (the lookahead)
>followed by a quantifier, which is expressly forbidden in
>the man page for re_syntax.
You aren't right. If you follow lookahead constraint
directly (NB!) with quantifier, interpreter reports an
error. And my construction works fine (interpreter doesn't
report any error), because lookahead atom doesn't follow
any quantifier. But works quickly only on small buffer.
Also, if you wait a long-long time (you can decrease text
size between <script> and </script> in example),
interpreter un-hungs, and return correct result.
>I think the author actually wanted something like
>{<script(?=.*</script>).*?
>(doubleclick|flycast|burstnet|spylog).*?&lt;/script>}
>which runs OK (albeit slowly; there is probably a
>better way to express this).
You aren't right again. :)
Try:
# my expression
set my {<script(.(?!</script>))*?(123)+?.*?</script>}
# your expression
set mo {<script(?=.*</script>).*?(123).*?</script>}
# sample text
set text {<script> 123 </script> <script> 345 </script>
<script> 123 </script>}
# Correct result
regsub -all $my $text --- res
2
set res
--- <script> 345 </script> ---
# Not correct result:
regsub -all $mo $text --- res
2
set res
--- ---
Btw, if you know, how to get first result without using of
constraint lookahead with non-greedy quantifier, let me
know. :)
>OTOH, the regexp engine should complain about the
>bad syntax
>and return an error, instead of getting into an infinite
>loop.
IMHO, itsn't infinite loop, but only very slow work with
lookahead constraints.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The "capturing parens" are very costly in this case;
replacing them with the "non-capturing" variant produces the
desired result. The regex becomes then
Logged In: YES
user_id=148712
Bug confirmed: tclsh seems to get in an infinite loop.
Attaching the file now
Logged In: YES
user_id=148712
The regular expression
{<script(.(?!</script>))*?(doubleclick|flycast|burstnet|spylog)+?.*?</script>}
is invalid: it contains a constraint (the lookahead)
followed by a quantifier, which is expressly forbidden in
the man page for re_syntax.
I think the author actually wanted something like
{<script
(?=.*</script>).*?(doubleclick|flycast|burstnet|spylog).*?</script>}
which runs OK (albeit slowly; there is probably a better way
to express this).
OTOH, the regexp engine should complain about the bad syntax
and return an error, instead of getting into an infinite
loop.
Logged In: YES
user_id=286083
>The regular expression
>{<script(.(?!</script>))*?
>(doubleclick|flycast|burstnet|spylog)+?.*?</script>}
>is invalid: it contains a constraint (the lookahead)
>followed by a quantifier, which is expressly forbidden in
>the man page for re_syntax.
You aren't right. If you follow lookahead constraint
directly (NB!) with quantifier, interpreter reports an
error. And my construction works fine (interpreter doesn't
report any error), because lookahead atom doesn't follow
any quantifier. But works quickly only on small buffer.
Also, if you wait a long-long time (you can decrease text
size between <script> and </script> in example),
interpreter un-hungs, and return correct result.
>I think the author actually wanted something like
>{<script(?=.*</script>).*?
>(doubleclick|flycast|burstnet|spylog).*?&lt;/script>}
>which runs OK (albeit slowly; there is probably a
>better way to express this).
You aren't right again. :)
Try:
# my expression
set my {<script(.(?!</script>))*?(123)+?.*?</script>}
# your expression
set mo {<script(?=.*</script>).*?(123).*?</script>}
# sample text
set text {<script> 123 </script> <script> 345 </script>
<script> 123 </script>}
# Correct result
regsub -all $my $text --- res
2
set res
--- <script> 345 </script> ---
# Not correct result:
regsub -all $mo $text --- res
2
set res
--- ---
Btw, if you know, how to get first result without using of
constraint lookahead with non-greedy quantifier, let me
know. :)
>OTOH, the regexp engine should complain about the
>bad syntax
>and return an error, instead of getting into an infinite
>loop.
IMHO, itsn't infinite loop, but only very slow work with
lookahead constraints.
Logged In: YES
user_id=148712
Indeed, I am both wrong and sorry.
The "capturing parens" are very costly in this case;
replacing them with the "non-capturing" variant produces the
desired result. The regex becomes then
{<script(?:.(?!</script>))*?(?:doubleclick|flycast|burstnet|spylog)+?.*?</script>}
Logged In: YES
user_id=148712
The originally submitted script does perform as intended,
only very slowly. The use of non-capturing parens solves the
issue.
It now seems clear that the bug is really a performance and
not a functionality issue.
I am therefore changing the summary description and reducing
the priority.