ADDENDUM: While testing the replacement of '\b' with '(?:[\s.\,!\?]|$)' we noticed the following weird behaviour. We copy/pasted the regex from line 713 of bombre.txt to the testRe file: \bR+\s?\S?\s?\W?[e\xE3\xE8-\xEB]+\s?\S?\s?\W?P+\s?\S?\s?\W?[L!|1]+\s?\S?\s?\W?[I1!|lt\xEC-\xEF]+\s?\S?\s?\W?C+\s?\S?\s?\W?[a\xA4\xE0-\xE6@]\b # REPLICA The test word in the ASSP Analyze GUI is: replicação. After pressing the "ANALYZE" button we see 2 DIFFERENT results simultaneously: "testRe" catches "repl" while "bombRe"...
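To make the substitution discussed above concrete, here is a minimal sketch of swapping the trailing '\b' for the explicit delimiter group. The plain word "replica" stands in for the much longer bombre.txt line, and the sample strings are hypothetical; none of this is taken from the ASSP code itself.

```perl
use strict;
use warnings;

# Explicit end-of-word delimiter from the report, used in place of the trailing \b.
my $end = qr/(?:[\s.\,!\?]|$)/;

# Simplified stand-in for the bombre.txt line (assumption: plain "replica").
my $re = qr/\breplica$end/i;

# Hypothetical samples; \xE7 and \xE3 are the Latin-1 code points of "ç" and "ã".
my @samples = (
    [ 'bare word',      "replica" ],
    [ 'with full stop', "replica." ],
    [ 'inside phrase',  "cheap replica watches" ],
    [ 'accented word',  "replica\xE7\xE3o" ],
);

my %hit;
for my $sample (@samples) {
    my ($label, $text) = @$sample;
    $hit{$label} = ($text =~ $re) ? 1 : 0;
    print "$label: ", ($hit{$label} ? "match" : "no match"), "\n";
}
```

Because the delimiter group lists the allowed terminators explicitly, the result for the accented word does not depend on whether the engine treats "ç" as a word character.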
At first, because of its behaviour, I interpreted it as a hidden bug inside ASSP. I now understand that it is related to the way Perl implements regexes: a mix of Perl and ASSP. At this moment, I still believe that it is possible to work on ASSP to avoid it (again, I am not a Perl programmer and thus I may be grossly wrong). Also, even if it is possible to fix this in the ASSP code, it may (or may not) be terribly difficult to implement. Thus, my way to contribute to the ASSP development is to watch...
Thomas, I believe that it is better to show you where I initially got the problem from. The Perl regex that originated this report is supplied by ASSP inside the file ASSP_2.6.1_18128_install.zip in the following path: \assp\files\bombre.txt Line 713: \bR+\s?\S?\s?\W?[e\xE3\xE8-\xEB]+\s?\S?\s?\W?P+\s?\S?\s?\W?[L!|1]+\s?\S?\s?\W?[I1!|lt\xEC-\xEF]+\s?\S?\s?\W?C+\s?\S?\s?\W?[a\xA4\xE0-\xE6@]\b# REPLICA If you are using the original bombre.txt file on your server, just enter the word replicação in the...
I am running ASSP on Perl 5.24. I always test my regexes using "RegexBuddy" (Jan Goyvaerts). It has settings to emulate Perl. It did not present the problems I saw in ASSP. I read the "effectiveperlprogramming.com" link you proposed: Change those \b to \b{wb}: use v5.22.1; my $string = "fred and barney's lodge v2.0"; while( $string =~ m/\b{wb}(\w.*?)\b{wb}/g ) { say $1; } Now barney's and v2.0 stick together: fred and barney's lodge v2.0 I am not a Perl expert, but the above example appears to hint that...
Regexes do not work correctly with accented letters. The bug stands.
Of course, in ASSP the regex was correct: \breplica\b. Sorry for the typo in the report. The bug stands.
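For anyone who wants to reproduce the effect outside ASSP, the following sketch shows how the same \breplica\b pattern can give two different answers for "replicação" depending on which character rules Perl applies. The /a and /u modifiers are used here only to force each behaviour deterministically. Whether ASSP's testRe and bombRe paths actually differ in this way is an assumption and is not confirmed anywhere in this thread.

```perl
use strict;
use warnings;

# "replicação" written with Latin-1 code points: \xE7 is "ç", \xE3 is "ã".
my $word = "replica\xE7\xE3o";

# ASCII rules (/a): "ç" is not a word character, so \b finds a boundary after "replica".
my $ascii_rules = ($word =~ /\breplica\b/a) ? 1 : 0;

# Unicode rules (/u): "ç" is a letter, so there is no boundary after "replica".
my $unicode_rules = ($word =~ /\breplica\b/u) ? 1 : 0;

print "ASCII rules (/a):   ", ($ascii_rules   ? "match" : "no match"), "\n";
print "Unicode rules (/u): ", ($unicode_rules ? "match" : "no match"), "\n";
```

If the two code paths in ASSP compile or apply the pattern under different rules, a discrepancy like the one reported would be expected, but that would need to be checked against the ASSP source.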