#2608 lsearch -regexp error introduced in 8.4.2

obsolete: 8.4.5
closed-rejected
5
2004-03-01
2004-02-28
Rick Macdonald
No

When a list containing an embedded list element
starting with a number is used as a regexp in lsearch,
a parsing error occurs:

couldn't compile regular expression pattern: invalid repetition count(s)

If the first element is a text string (eg {zxcv 1234})
the error does not occur.

I hit this problem when moving from 8.3.3 to 8.4.5. I
tried every 8.4 release (downloaded directly from SF
and built fresh) and determined it first broke in
8.4.2. I hope this helps.

/usr/local/src/tcltk/tcl8.4.1/unix$ gcc --version
gcc (GCC) 3.3.3 20040125 (prerelease) (Debian)

The details below apply to tests run on Solaris 5.8 and
Linux (Debian sid). Platform/OS does not seem to be an
issue.

It is the {1234 zxcv} element in the following list
that triggers the problem:

% list "" 2134 qwer "1234 zxcv" 2345 asdf
{} 2134 qwer {1234 zxcv} 2345 asdf

In the examples below I'm searching an empty list but
this is just to simplify the examples and isn't an issue.

/usr/local/src/tcltk/tcl8.4.2/unix$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./tclsh
% info patchlevel
8.4.2
% lsearch -regexp {} ^[list "" 2134 qwer "1234 zxcv" 2345 asdf]
couldn't compile regular expression pattern: invalid repetition count(s)
% lsearch -regexp {} ^[list "" 2134 qwer "zxcv 1234" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "asdf zxcv" 2345 asdf]
-1
% exit

/usr/local/src/tcltk/tcl8.4.1/unix$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./tclsh
% info patchlevel
8.4.1
% lsearch -regexp {} ^[list "" 2134 qwer "1234 zxcv" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "zxcv 1234" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "asdf zxcv" 2345 asdf]
-1
% exit

Discussion

  • Don Porter
    2004-03-01

    Logged In: YES
    user_id=80530

    So [lsearch] has nothing to do with this,
    right? It's just a question of which
    strings form a legal regexp?

    % regexp "^{} 2134 qwer {1234 zxcv} 2345 asdf" {}
    couldn't compile regular expression pattern: invalid repetition count(s)

     
  • Don Porter
    2004-03-01

    Logged In: YES
    user_id=80530

    hmmm... no.

    % lsearch -regexp {} {^{} 2134 qwer {1234 zxcv} 2345 asdf}
    couldn't compile regular expression pattern: invalid repetition count(s)
    % info patch
    8.4.6

    % lsearch -regexp {} {^{} 2134 qwer {1234 zxcv} 2345 asdf}
    -1
    % info patch
    8.3.5
    % regexp {^{} 2134 qwer {1234 zxcv} 2345 asdf} {}
    couldn't compile regular expression pattern: invalid repetition count(s)

     
  • Don Porter
    2004-03-01

    • assigned_to: pvgoran --> dkf
     
  • Don Porter
    2004-03-01

    Logged In: YES
    user_id=80530

    It looks like in earlier releases, when
    [lsearch] was given an empty list, it did
    not bother to compile the regular expression,
    because there was no comparing to be done,
    so invalid regexps were not reported. Is it
    that slight change in behavior that's
    causing you trouble?

    Probably you should correct your code so
    it does not construct an invalid regexp for
    passing in.

    Passing to dkf for another opinion on
    whether this error behavior is something
    we needed to preserve.

     
  • Logged In: YES
    user_id=79902

    Well, *I* don't think we need to keep that (lack of) failure
    mode. Especially as if the code was ever so unfortunate as
    to search against a non-empty list, it'd fail for sure. A
    failure that is masked under some non-obvious circumstances
    is trouble waiting to happen IMHO.

    Closing this, though if anyone has a good argument why
    that's wrong and this bug should be fixed instead, I'd love
    to hear it.

     
    • status: open --> closed-rejected
     
  • Rick Macdonald
    2004-03-02

    Logged In: YES
    user_id=493198

    I hadn't realized that searching an empty list in my example
    made a difference.

    I looked again, and with a closer example to what my app is
    doing I see that it worked in tcl8.0 but fails as of tcl8.2.
    I don't have 8.1 handy.

    I've restated an example here to make it more clear that
    what I am doing is searching a list of lists to match with a
    list element. I'm using -regexp to anchor the match at the
    beginning. In my mind, "^myelement" is a simple regexp of
    the anchor "^" followed by some data that is "unfortunately"
    being construed as part of the regexp itself.

    I can certainly accept it if your answer is that tcl is
    behaving as desired and expected, but if you don't mind
    please confirm this in light of the revised example here. I
    do understand that after parsing, all my examples are
    probably seen the same by tcl. I wouldn't have thought this
    a bug if it had originally failed years ago when the code
    was first written.

    If there is no bug here I have some refactoring to do. In
    some cases I can use -exact, in others -glob, but in some I'll
    have to loop through and compare each element manually. Or,
    can anybody see any clever quoting that could be done to the
    lsearch below so that $myelement is not seen as part of the
    regexp?

    tclsh8.0
    % set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
    {{3 5} 1 3} {{4 6} 4 6}
    % set myelement [list [list 4 6]]
    {4 6}
    % lsearch -regexp $mylist ^$myelement
    1
    %

    tclsh8.4
    % set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
    {{3 5} 1 3} {{4 6} 4 6}
    % set myelement [list [list 4 6]]
    {4 6}
    % lsearch -regexp $mylist ^$myelement
    couldn't compile regular expression pattern: quantifier operand invalid

    "-glob" works for this particular case:
    % lsearch -glob $mylist $myelement*
    1

     
  • Don Porter
    2004-03-02

    Logged In: YES
    user_id=80530

    that makes more sense; thanks
    for following up.

    From Tcl 8.0 -> 8.2, Tcl was extended
    to support Unicode. This included a
    new [regexp] engine capable of scanning
    Unicode strings, and also extended to
    recognize so-called Advanced Regular
    Expressions (AREs).

    Looks like your examples include regexps
    that mean something different (and invalid)
    when parsed as AREs than they did in
    8.0, when only Basic REs were known.

    See

    http://tmml.sourceforge.net/doc/tcl/re_syntax.html

    or the corresponding part of your local
    Tcl documentation for details on the new
    regexps available in Tcl 8.1 and later.
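
The difference described above can be checked with a small sketch, assuming a Tcl 8.1 or later interpreter. The helper name isValidRegexp is hypothetical, not part of Tcl. As far as the re_syntax page describes it, in an ARE a "{" followed by a digit starts a bound such as {2,5}, while a "{" followed by any other character is an ordinary character. That would explain why {1234 zxcv} fails to compile and {zxcv 1234} does not.

```tcl
# Hypothetical helper: returns 1 if the pattern compiles, 0 otherwise.
# Matching against an empty string forces [regexp] to compile the pattern.
proc isValidRegexp {pat} {
    expr {![catch {regexp -- $pat ""}]}
}

foreach pat [list {^{1234 zxcv}} {^{zxcv 1234}} {(?b)^{1234 zxcv}}] {
    if {[isValidRegexp $pat]} {
        puts "compiles: $pat"
    } else {
        puts "rejected: $pat"
    }
}
```

Only the first pattern should be rejected: the second starts its brace group with a non-digit, and the third uses the (?b) director to request Basic RE parsing, where an unescaped brace is literal.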

     
  • Rick Macdonald
    2004-03-02

    Logged In: YES
    user_id=493198

    Sorry, I totally missed that regexp had undergone such
    expansion. All I need to do is force the Basic RE behaviour
    by adding (?b):

    tclsh
    % info patchlevel
    8.4.2
    % set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
    {{3 5} 1 3} {{4 6} 4 6}
    % set myelement [list [list 4 6]]
    {4 6}
    % lsearch -regexp $mylist ^$myelement
    couldn't compile regular expression pattern: quantifier operand invalid
    % lsearch -regexp $mylist (?b)^$myelement
    1
    %
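
The (?b) prefix works for the data shown here, but Basic REs still give special meaning to characters such as ".", "*", "[" and "\", so an element containing those could still be misread. One possible alternative, sketched below under that assumption, is to backslash-escape every non-word character in the element before anchoring it. The proc name reQuote is hypothetical and not part of Tcl; it relies only on [regsub] and the ARE rule that a backslash before a non-alphanumeric character makes that character ordinary.

```tcl
# Hypothetical helper: escape every non-word character so the string
# is matched literally when embedded in an ARE.
proc reQuote {str} {
    # \W matches any non-word character; in the substitution spec,
    # \\ is a literal backslash and & is the matched character.
    regsub -all {\W} $str {\\&}
}

set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
set myelement [list [list 4 6]]

# The anchor stays a regexp operator; the element is matched as plain text.
puts [lsearch -regexp $mylist ^[reQuote $myelement]]
```

The search is still anchored at the start of each list element, and no regexp director is needed.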