captureComponents on empty string inconsistent
Brought to you by: jengelhart
When captureComponentsMatchedByRegex: is called on an empty string with a regex that can match it, the result should be an array of empty strings (one for the entire match, plus one for each capture group). However, the following code reports that out of 1000 iterations, the capture array is empty 500 times and has two elements the other 500 times.
    int iterations = 1000;
    int empties = 0, twos = 0;
    for (int i = 0; i < iterations; i++) {
        NSString *exp = @"";          // empty subject string
        NSString *regex = @"( *)";    // one capture group; can match zero characters
        NSArray *matches = [exp captureComponentsMatchedByRegex:regex];
        if ([matches count] == 0) {
            empties++;                // no match reported
        } else if ([matches count] == 2) {
            twos++;                   // whole match plus one capture group
        }
    }
    NSLog(@"%d empties, %d twos out of %d iterations", empties, twos, iterations);
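For comparison, here is a minimal sketch of the result one would expect, using Foundation's NSRegularExpression (which uses ICU regular expression syntax) as a reference. The helper name ExpectedCaptureComponents is hypothetical and is not part of RegexKitLite. It only illustrates the "whole match plus one entry per capture group" shape described above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of RegexKitLite): returns the whole match
// followed by each capture group for the first match, or an empty array
// when the pattern is invalid or does not match at all.
static NSArray<NSString *> *ExpectedCaptureComponents(NSString *subject, NSString *pattern) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        return @[];
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:subject options:0 range:NSMakeRange(0, subject.length)];
    if (match == nil) {
        return @[];
    }

    NSMutableArray<NSString *> *components = [NSMutableArray array];
    for (NSUInteger i = 0; i < match.numberOfRanges; i++) {
        NSRange r = [match rangeAtIndex:i];
        // A group that did not participate in the match reports NSNotFound.
        [components addObject:(r.location == NSNotFound) ? @"" : [subject substringWithRange:r]];
    }
    return components;
}
```

Calling ExpectedCaptureComponents(@"", @"( *)") repeatedly should give the same two-element array every time, which is the consistency the loop above checks for.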
Yep, it's a bug. It has to do with the way the ICU library and its API report that the regex has finished matching. At least in older versions of the library, this information wasn't easily available (but don't quote me, it's been a long time). I thought I had nailed all the corner cases long ago; this is the first bug of this type in years. The trigger is a zero-length subject string that the regex can still match.
I have a temporary fix, but I'm going to spend some time making sure there aren't other weird corner cases. Expect a fix in the next day or so.