From: <no...@so...> - 2002-07-29 10:56:51
Bugs item #578363, was opened at 2002-07-07 15:31
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=578363&group_id=10894

Category: 41. Regexp
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 6
Submitted By: Pavel Goran (pvgoran)
Assigned to: Donal K. Fellows (dkf)
Summary: [:xdigit:] makes RE to behave strange

Initial Comment:
Tcl Version: 8.4a3
Platform: Windows

Code sample:

set str {2:::DebugWin32}
set re {([[:xdigit:]])([[:space:]]*)}
puts "[regexp $re $str match xdigit spaces]"
puts "match=$match"
puts "xdigit=$xdigit"
puts "spaces=$spaces"

This gives:

1
match=2:::DebugWin32
xdigit=2
spaces=:::DebugWin32

Observed behaviour: "spaces=:::DebugWin32"
Desired behaviour: "spaces="

Comment:
It looks like the [[:xdigit:]] bracket expression causes the
[[:space:]] bracket expression to match any symbol. If [[:xdigit:]] is
replaced, for example, by [[:digit:]], or [[:space:]] is replaced by
[[:alpha:]], all is going right.

(Initially, I noticed this problem with \s instead of [[:space:]].)

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2002-07-29 11:44

Message:
Logged In: YES
user_id=79902

Reviewing your second pair of patches, I've decided to go instead with
specifying the number of ranges as 3 because hex-digits are understood
to only be done using the standard western digit characters (plus the
six alphas in both cases, of course.) Unless there's a good reason for
matching the number characters used in other alphabet systems, but
then there'll also be a need for a locale-specific version of 'A-F',
yes? :^)

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2002-07-29 11:06

Message:
Logged In: YES
user_id=79902

You've found the fault in the RE engine? I'm impressed; that code is
non-trivial. Do you want to become a maintainer of this section?
(For future reference, single patches rooted at the top of the CVS
tree are easiest to work with by far.)

I'll now be able to have a look at fixing this problem (with my
general wherever-its-needed maintainer hat on.)

----------------------------------------------------------------------

Comment By: Pavel Goran (pvgoran)
Date: 2002-07-28 17:42

Message:
Logged In: YES
user_id=383758

Yes, I definitely had to attach the files, since inserting them into
the comment text gives very strange formatting. Is this a bug in
SourceForge.net software, or is it caused by my Opera browser? :)

----------------------------------------------------------------------

Comment By: Pavel Goran (pvgoran)
Date: 2002-07-28 17:31

Message:
Logged In: YES
user_id=383758

This bug is caused by an error in generic/regc_cvec.c.

Patch for: File "generic/regc_cvec.c", Branch "MAIN", Revision 1.4

--- regc_cvec.c Sun Jul 28 22:34:17 2002
+++ regc_cvec.c.new Sun Jul 28 23:15:34 2002
@@ -50,7 +50,7 @@
     cv = (struct cvec *)MALLOC(n);
     if (cv == NULL)
         return NULL;
-    cv->chrspace = nc;
+    cv->chrspace = nchrs;
     cv->chrs = (chr *)&cv->mcces[nmcces]; /* chrs just after MCCE ptrs */
     cv->mccespace = nmcces;
     cv->ranges = cv->chrs + nchrs + nmcces*(MAXMCCE+1);

It's strange that such a serious error was not noticed yet.

I also found an inconsistency in generic/regc_locale.c. The existing
code should work without problems, but it is not correct. It can be
fixed in two ways.
The first one:

Patch for: File "generic/regc_locale.c", Branch "MAIN", Revision 1.8

--- regc_locale.c Sun Jul 28 22:33:28 2002
+++ regc_locale.c.new-1 Sun Jul 28 22:38:02 2002
@@ -842,7 +842,10 @@
     case CC_XDIGIT:
         cv = getcvec(v, 0, NUM_DIGIT_RANGE+2, 0);
         if (cv) {
-            addrange(cv, '0', '9');
+            for (i = 0; i < NUM_DIGIT_RANGE; i++) {
+                addrange(cv, digitRangeTable[i].start,
+                        digitRangeTable[i].end);
+            }
             addrange(cv, 'a', 'f');
             addrange(cv, 'A', 'F');
         }

The second one:

Patch for: File "generic/regc_locale.c", Branch "MAIN", Revision 1.8

--- regc_locale.c Sun Jul 28 22:33:28 2002
+++ regc_locale.c.new-2 Sun Jul 28 23:15:03 2002
@@ -840,7 +840,7 @@
         }
         break;
     case CC_XDIGIT:
-        cv = getcvec(v, 0, NUM_DIGIT_RANGE+2, 0);
+        cv = getcvec(v, 0, 3, 0);
         if (cv) {
             addrange(cv, '0', '9');
             addrange(cv, 'a', 'f');

The first way is IMO preferable.

P.S. Maybe it would have been better to attach three diff files,
instead of inserting them in the text?

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2002-07-08 10:49

Message:
Logged In: YES
user_id=79902

Strange indeed! If only the RE engine was less cryptic...

Suggested workaround: replace [[:xdigit:]] with [0-9a-fA-F] which
works.

----------------------------------------------------------------------
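
Editorial sketch (not part of the original thread): the one-line
regc_cvec.c patch above changes which value is recorded as the
vector's character capacity. The C fragment below is a simplified
illustration of why that may matter, assuming the engine reuses an
already allocated vector whenever its recorded capacities look large
enough. The thread does not show that reuse check, so it is an
assumption here, and the names toy_cvec, toy_newcvec, toy_can_reuse
and toy_chrs_overlap_ranges are hypothetical, not Tcl's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the engine's character vector.
 * Single characters are stored first; range endpoints follow them. */
typedef struct {
    int chrspace;    /* recorded capacity for single characters */
    int rangespace;  /* recorded capacity for ranges */
    int *chrs;       /* start of the single-character area */
    int *ranges;     /* start of the range area, right after chrs */
    int storage[];   /* backing store shared by both areas */
} toy_cvec;

/* Allocate room for nchrs characters and nranges ranges.
 * buggy != 0 records the total slot count as chrspace (the pre-patch
 * assignment); buggy == 0 records nchrs (the patched assignment). */
toy_cvec *toy_newcvec(int nchrs, int nranges, int buggy)
{
    size_t nc;
    toy_cvec *cv;

    assert(nchrs >= 0 && nranges >= 0);
    nc = (size_t)nchrs + (size_t)nranges * 2;
    cv = malloc(sizeof *cv + nc * sizeof(int));
    if (cv == NULL)
        return NULL;
    cv->chrspace = buggy ? (int)nc : nchrs;
    cv->rangespace = nranges;
    cv->chrs = cv->storage;
    cv->ranges = cv->storage + nchrs;
    return cv;
}

/* Assumed reuse check: accept a request if the recorded capacities fit. */
int toy_can_reuse(const toy_cvec *cv, int nchrs, int nranges)
{
    return nchrs <= cv->chrspace && nranges <= cv->rangespace;
}

/* Would storing nchrs characters run into the range area? */
int toy_chrs_overlap_ranges(const toy_cvec *cv, int nchrs)
{
    return nchrs > (int)(cv->ranges - cv->chrs);
}
```

With a vector first sized for 0 characters and 3 ranges (the shape of
the CC_XDIGIT request after the second regc_locale.c patch), a later
request for, say, 2 characters and 1 range passes the reuse check only
in the buggy variant, and those characters would then be written on
top of the range area. That would be consistent with a later bracket
expression matching unexpected characters, but it is a reading of the
patch, not something the thread states.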