I'm using RegexKit to match a list of names against a user-supplied regular expression.
The following code results in an infinite loop when used with the expression ".*":
RKEnumerator *matchEnum = [name matchEnumeratorWithRegex:regEx];
if (matchEnum) {
    while ((captureRanges = [matchEnum nextRanges]) != NULL) {
        // Do Stuff...
    }
}
This is the technique given in the documentation (http://regexkit.sourceforge.net/Documentation/RegexKitProgrammingGuide.html#NSStringAdditions), so unless I messed up somewhere it looks like a bug!
By the way, the final NSRange it keeps returning over and over is {<length of string>, 0}
Thanks!
Yea, this is a bug. :( There's actually a bug open on this already: https://sourceforge.net/tracker/index.php?func=detail&aid=1958025&group_id=204582&atid=990188
It's due to the fact that a match length of zero is not handled correctly. A common case where this can happen is the regex '^(.*)$' searching a multiline string that has "\n\n" somewhere in it (i.e., a blank line). The blank line has a match length of zero, and the (buggy) update logic fails to handle this case properly: it never advances past it, it just adds the length (0) to the current position... which puts you right back where you started the next time around. :(
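To make the stall concrete, here is a minimal sketch in plain C. It is not RegexKit's code: range_t, find_line_match and count_line_matches are made-up names, and the matcher is a toy that only imitates '^(.*)$' on a multiline string (it ignores the edge case of a trailing newline). The point is just the position update: with no extra step after an empty match the scan revisits the same spot, and with a one-position step it finishes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a match range; not RegexKit's own type. */
typedef struct {
    size_t location;
    size_t length;
} range_t;

/* Toy imitation of '^(.*)$' in multiline mode: finds the first line start
 * at or after `from` and reports that line (without its '\n').
 * Returns 1 on a match, 0 when nothing is left to match. */
int find_line_match(const char *text, size_t len, size_t from, range_t *out)
{
    for (size_t i = from; i < len; i++) {
        if (i == 0 || text[i - 1] == '\n') {
            size_t end = i;
            while (end < len && text[end] != '\n') {
                end++;
            }
            out->location = i;
            out->length = end - i;
            return 1;
        }
    }
    return 0;
}

/* Counts matches, stepping `bump_on_empty` extra positions after an empty
 * match. Returns -1 as soon as the position fails to advance, which is
 * where a real loop would spin forever. */
int count_line_matches(const char *text, size_t bump_on_empty)
{
    assert(text != NULL);
    size_t len = strlen(text);
    size_t pos = 0;
    int count = 0;
    range_t m;

    while (find_line_match(text, len, pos, &m)) {
        size_t next = m.location + m.length;
        count++;
        if (m.length == 0) {
            next += bump_on_empty;
        }
        if (next <= pos) {
            return -1; /* no progress: the same match would come back again */
        }
        pos = next;
    }
    return count;
}
```

Calling count_line_matches("abc\n\ndef", 0) reports the stall (-1) at the blank line, while count_line_matches("abc\n\ndef", 1) walks all three lines.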
If you want to fix the bug yourself, you can take a look at RKEnumerator.m and patch up the method _updateToNextMatch (near the very bottom of the file). The line
atBufferLocation = (resultUTF8Ranges[0].location + resultUTF8Ranges[0].length);
should be updated to something like:
atBufferLocation = (resultUTF8Ranges[0].location + resultUTF8Ranges[0].length + ((resultUTF8Ranges[0].length == 0) ? 1 : 0));
I haven't actually tested that, mind you, but I suspect that you get the general gist.
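One thing to watch when typing in a line like that: in C (and so in Objective-C) the ?: operator binds more loosely than +, so the "? 1 : 0" part needs its own pair of parentheses. Without them the whole sum becomes the condition and the result is only ever 1 or 0. A small sketch, with made-up function names and size_t standing in for the real index type:

```c
#include <assert.h>
#include <stddef.h>

/* How "location + length + (length == 0) ? 1 : 0" is parsed when the
 * conditional has no parentheses of its own: the sum is the condition. */
size_t next_as_parsed(size_t location, size_t length)
{
    return (location + length + (length == 0)) ? 1 : 0;
}

/* The intended update: step one extra position only after an empty match. */
size_t next_parenthesised(size_t location, size_t length)
{
    return location + length + ((length == 0) ? 1 : 0);
}

/* Self-check using an empty match at offset 4 and a 3-character match at 0. */
void check_examples(void)
{
    assert(next_as_parsed(4, 0) == 1);
    assert(next_parenthesised(4, 0) == 5);
    assert(next_as_parsed(0, 3) == 1);
    assert(next_parenthesised(0, 3) == 3);
}
```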
The bug also exists in the string replacement methods. In the file NSString.m, in the function RKStringByMatchingAndExpanding, if you search for the line:
searchIndex = matchRanges[0].location + matchRanges[0].length;
it needs the same fix, basically something like:
searchIndex = matchRanges[0].location + matchRanges[0].length + ((matchRanges[0].length == 0) ? 1 : 0);
Again, I haven't tested that fix.
Thanks for the detailed response!
In the meantime I've been using a workaround: simply check the match length. I can't think of a real-world situation in my program where a match length of 0 would be returned, so if the enumerator returns {..., 0} I treat that as NULL.
(This program is working with single-line strings that are guaranteed to be at least 1 character in length.)
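Roughly, the check looks like this when sketched in plain C rather than against the framework. Everything here is a stand-in: capture_range_t plays the part of NSRange, and fake_next_ranges is a made-up substitute for the enumerator that returns the whole string once and then {<length of string>, 0} forever, which is the behaviour I'm seeing. The only real idea is the extra length test in the loop condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NSRange. */
typedef struct {
    size_t location;
    size_t length;
} capture_range_t;

/* Hypothetical stand-in for the enumerator state. */
typedef struct {
    size_t text_length;
    int calls;
    capture_range_t current;
} fake_enum_t;

/* Imitates the reported behaviour for ".*": the first call yields the whole
 * string, every later call yields an empty range at the end of the string. */
const capture_range_t *fake_next_ranges(fake_enum_t *e)
{
    if (e->calls == 0) {
        e->current.location = 0;
        e->current.length = e->text_length;
    } else {
        e->current.location = e->text_length;
        e->current.length = 0;
    }
    e->calls++;
    return &e->current;
}

/* The workaround: stop on NULL *or* on an empty overall match (index 0). */
int count_usable_matches(fake_enum_t *e)
{
    assert(e != NULL);
    int count = 0;
    const capture_range_t *ranges;

    while ((ranges = fake_next_ranges(e)) != NULL && ranges[0].length > 0) {
        count++; /* "Do Stuff..." would go here */
    }
    return count;
}
```

This only makes sense because an empty match is never a meaningful result for my input; with multiline text it would cut the loop short at the first blank line.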
If that ends up not being enough, I'll take a look through the code. I haven't downloaded the source yet; I'm using the precompiled framework for now.
Thanks again,
Richard