#58 captureComponents on empty string inconsistent

Milestone: RegexKitLite 4.0
Status: open-accepted
Priority: 5
Updated: 2014-08-14
Created: 2010-12-30
Private: No

When captureComponentsMatchedByRegex: is called on an empty string with a regex that can match zero characters, the result should be an array of empty strings (one for the entire match, plus one for each capture group). However, the following code reports that out of 1000 iterations, the capture array is empty 500 times and has two elements the other 500 times.

int iterations = 1000;
int empties = 0, twos = 0;
for (int i = 0; i < iterations; i++) {
    NSString *exp = @"";
    NSString *regex = @"( *)";
    NSArray *matches = [exp captureComponentsMatchedByRegex:regex];
    if ([matches count] == 0) {
        empties++;
    } else if ([matches count] == 2) {
        twos++;
    }
}
NSLog(@"%d empties, %d twos out of %d iterations", empties, twos, iterations);
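
For comparison, here is a minimal sketch of the expected behaviour using the POSIX regex API as a stand-in. It is plain C (which also compiles as Objective-C) and uses neither ICU nor RegexKitLite, so it only illustrates what a consistent result looks like; it says nothing about where the bug is. The helper name capture_spans is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of RegexKitLite): runs a POSIX extended
 * regex against `subject` and stores the span of the whole match and of
 * each capture group in `spans`.
 * Returns the number of spans filled in (whole match + groups, capped at
 * max_spans) when the regex matches, 0 when it does not match or matching
 * fails, and -1 if the pattern does not compile. */
static int capture_spans(const char *pattern, const char *subject,
                         regmatch_t *spans, size_t max_spans)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    size_t wanted = re.re_nsub + 1;   /* whole match + each capture group */
    if (wanted > max_spans)
        wanted = max_spans;

    int rc = regexec(&re, subject, wanted, spans, 0);
    regfree(&re);
    return rc == 0 ? (int)wanted : 0;
}
```

With the pattern "( *)" and the subject "", this helper returns 2 on every call, and both spans are zero-length at offset 0. That is the stable two-element result the report expects from captureComponentsMatchedByRegex:.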

Discussion

  • John Engelhart - 2010-12-30

    Yep, it's a bug. It has to do with the way the ICU library and its API report that the regex has finished matching. At least in older versions of the library, this information wasn't easily available (but don't quote me, it's been a long time). I thought I had nailed all the corner cases long ago; this is the first bug of this type in years. The problem is triggered when the string being searched has zero length but the regex still matches it.

    I have a temporary fix, but I'm going to spend some time making sure there aren't other weird corner cases. Expect a fix in the next day or so.

     
  • John Engelhart - 2010-12-30
    • status: open --> open-accepted
     
