Hi; can anyone spot what I'm doing wrong in the first of these match/replaces? Or, if the answer is right, why my expectation is wrong? I would expect the answer to be "mMmMm", but it is "MmMmMm". I threw in some additional tests; all except #3 give the result I expect. #3, as you can see is very similar to #1 but with "." for the replacement rather than "M".
Are you sure your #1 example is correct? I would expect 'mMmMm', not 'MmMmMm' as you gave as the result.
I'll take a look and see what I can find. Of course, it's always the corner cases that seems to get you in the end. Off the top of my head, I don't think I even included any tests for zero length strings in the unit tests. :(
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
the first fails with:
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/collectionAdditions.m:893: error: -[collectionAdditions testStringMatchAndReplace] : "[searchAndReplacedString isEqualToString:@"mMmMm"]" should be true. String: MmMmMm
the second succeeds.
I stepped through the execution and I can see vaguely what is happening in RKStringByMatchingAndExpanding: because the ref string is empty, no instructions are put in the referenceInstructionsBuffer, so nothing happens when it does RKApplyReferenceInstructions so the copyInstructionsBuffer is empty and it just returns the searchString. I think when the replace count is greater than 1, some copy instructions (for the matched but unreplaced bits) are put in the copyInstructionsBuffer, so the unmatched bits do get copied (and the matched bits get left behind!). I hesitate to blast a "fix" in because I'm likely to fix this problem and break 23 working things, also the correct fix is probably as much a matter of taste as technique. One did occur to me which was to have a sort of noop instruction and when the ref string is empty chuck the noop in the list of instructions. It might also be worthwhile to consider what should happen if the ref string is nil as opposed to empty - currently I think nil is returned which may or may not be consistent? It could equally return the unmatched bits with nothing inserted between them?
btw- when I run the unit tests, I get 369 failures, 365 look like:
error: -[multithreading testSimpleMultiThreading] : "number1 != nil" should be true.
error: -[multithreading testSimpleMultiThreading] : "number2 != nil" should be true.
they are in various locations: line 1215, 1223, 1236 and 1244
the remaining four are:
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:371: error: -[stringConversion testStringParseNamedConvertToNSNumberAllFormats] : "number1 != nil" should be true.
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:379: error: -[stringConversion testStringParseNamedConvertToNSNumberAllFormats] : "number2 != nil" should be true.
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:351: error: -[stringConversion testStringParseNumericConvertToNSNumberAllFormats] : "number1 != nil" should be true.
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:359: error: -[stringConversion testStringParseNumericConvertToNSNumberAllFormats] : "number2 != nil" should be true.
Are these failures expected? I'm running on ppc with 1 CPU if that matters.
Doug Dickinson --dd
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I've yet to dig in to this problem, but I just wanted to comment on the unit tests failures..
No, it's not normal. While working on the 0.6 release the other day, I had the unit tests kick back an awful lot of failures when I modified the compiler optimization setting. I haven't yet tracked down the root cause of it, but changing the prototype of one the internal functions (RKSRegexFromStringOrRegex) seems to make the problem come and go. Your report makes me wonder if there isn't a problem in the latest released version as well.
Since it comes and go with compiler optimization changes and prototype changes, it's hard to say where the bug is, or when it started happening. If it's a compiler bug, it's also a lot hard to track down since what you think it is you're debugging might not be what's actually executing, and you need to get down in to the assembly code, which is never fun and takes forever to piece together what's happening.
This has all the signs of spending an enormous amount of time to find a simple, single character type-o kind of bug. :(
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Well, with a little bit of juggling, I used the in-development unit test code for 0.6 that shows failures with the in-development 0.6 framework (with the header prototypes and optimization flags changes that produce errors) against a copy of the released 0.4 framework I had handy and it spit out:
Test Suite 'build/Debug/Unit Tests.octest(Tests)' finished at 2007-12-20 15:20:00 -0500.
Executed 72 tests, with 1 failure (1 unexpected) in 0.706 (2.208) seconds
The one failure has to do with the start of a unit test for a feature in 0.6, but not present in 0.4, so it's expected. I also had to comment out the NSData tests because those extensions aren't present in 0.4 either, but again, all failures accounted for.
So, that's good news. In the process of doing all this, I think I tracked the error down. It has to do with adding the 'pure' attribute to RKRegexFromStringOrRegex, and there is a case that I had obviously not considered or realized that is not pure, namely when there is some kind of error in the regex, or anything that causes an NSException to be thrown. It's hard to predict exactly how this interacts with everything because it's a hint to the optimizer, which by definition is making changes to the way code executes that are not explicitly expressed in the source code.
I still haven't looked at the original replacement issue as I'm knee deep in 0.6, but I will definitely look in to it before 0.6 is released. 0.6 will bring some API changes that make use of the standard error: NSError reporting method, all strings used be the framework prepped for internationalization (though none provided initially), and accelerated multithreaded matching of regular expressions in a collection (i.e, [@"Match test string" isMatchedByAnyRegexInSet:setOfRegexes], there's also array methods) which is taking up all of my time right now. It's pretty neat, it keeps track of the regexes in a collection and the number of times that regex was a match. It then keeps that list in sorted order, so the regexes with the highest hit count are tried first, and in parallel (number of threads == number of cores). This was inspired by the way Safari AdBlock uses RegexKit, which has to do bulk comparisons of a URL against a collection of regular expressions to determine if the URL should be blocked.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The incorrect search and replace behavior is fixed in the upcoming 0.6 release. The change is in NSString.m, function RKStringByMatchingAndExpanding(), near the very end of the function. This is the bug fix in case anyone needs it, and I think it would be obvious which chunk of code this replaces (very similar):
In short, 'search and replace' operations which required no copies from a replacement string were incorrectly determined to be unedited, and therefore the original string could be returned instead. This fix also checks the range of the final string against the original search string to make sure catch these 'no copy instructions created, but still edited' conditions are handled.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi; can anyone spot what I'm doing wrong in the first of these match/replaces? Or, if the answer is right, why my expectation is wrong? I would expect the answer to be "mMmMm", but it is "MmMmMm". I threw in some additional tests; all except #3 give the result I expect. #3, as you can see is very similar to #1 but with "." for the replacement rather than "M".
r1 = [@"MmMmMm" stringByMatching:@"M" replace:1 withReferenceString:@""];
==> 1 MmMmMm
r1 = [@"MmMmMm" stringByMatching:@"m" replace:1 withReferenceString:@""];
==> 2 MMmMm
r1 = [@"MmMmMm" stringByMatching:@"." replace:1 withReferenceString:@""];
==> 3 MmMmMm
r1 = [@"MmMmMm" stringByMatching:@"M" replace:2 withReferenceString:@""];
==> 4 mmMm
r1 = [@"MmMmMm" stringByMatching:@"M" replace:1 withReferenceString:nil];
==> 5 (null)
r1 = [@"MmMmMm" stringByMatching:@"M" replace:1 withReferenceString:@"B"];
==> 6 BmMmMm
r1 = [@"MmMmMm" stringByMatching:@"M" replace:2 withReferenceString:@"B"];
==> 7 BmBmMm
Thanks
Doug Dickinson --dd
Are you sure your #1 example is correct? I would expect 'mMmMm', not 'MmMmMm' as you gave as the result.
I'll take a look and see what I can find. Of course, it's always the corner cases that seems to get you in the end. Off the top of my head, I don't think I even included any tests for zero length strings in the unit tests. :(
Here's a few lines I stuck in the collectionAdditions.m, testStringMatchAndReplace method
STAssertNoThrow(searchAndReplacedString = [@"MmMmMm" stringByMatching:@"M" replace:1 withReferenceString:@""], NULL);
STAssertTrue([searchAndReplacedString isEqualToString:@"mMmMm"], @"String: %@", searchAndReplacedString);
STAssertNoThrow(searchAndReplacedString = [@"MmMmMm" stringByMatching:@"M" replace:2 withReferenceString:@""], NULL);
STAssertTrue([searchAndReplacedString isEqualToString:@"mmMm"], @"String: %@", searchAndReplacedString);
the first fails with:
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/collectionAdditions.m:893: error: -[collectionAdditions testStringMatchAndReplace] : "[searchAndReplacedString isEqualToString:@"mMmMm"]" should be true. String: MmMmMm
the second succeeds.
I stepped through the execution and I can see vaguely what is happening in RKStringByMatchingAndExpanding: because the ref string is empty, no instructions are put in the referenceInstructionsBuffer, so nothing happens when it does RKApplyReferenceInstructions so the copyInstructionsBuffer is empty and it just returns the searchString. I think when the replace count is greater than 1, some copy instructions (for the matched but unreplaced bits) are put in the copyInstructionsBuffer, so the unmatched bits do get copied (and the matched bits get left behind!). I hesitate to blast a "fix" in because I'm likely to fix this problem and break 23 working things, also the correct fix is probably as much a matter of taste as technique. One did occur to me which was to have a sort of noop instruction and when the ref string is empty chuck the noop in the list of instructions. It might also be worthwhile to consider what should happen if the ref string is nil as opposed to empty - currently I think nil is returned which may or may not be consistent? It could equally return the unmatched bits with nothing inserted between them?
btw- when I run the unit tests, I get 369 failures, 365 look like:
error: -[multithreading testSimpleMultiThreading] : "number1 != nil" should be true.
error: -[multithreading testSimpleMultiThreading] : "number2 != nil" should be true.
they are in various locations: line 1215, 1223, 1236 and 1244
the remaining four are:
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:371: error: -[stringConversion testStringParseNamedConvertToNSNumberAllFormats] : "number1 != nil" should be true.
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:379: error: -[stringConversion testStringParseNamedConvertToNSNumberAllFormats] : "number2 != nil" should be true.
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:351: error: -[stringConversion testStringParseNumericConvertToNSNumberAllFormats] : "number1 != nil" should be true.
/Volumes/Home/EXAMPLES/RegexKit/Sourcecode/Source/unitTests/stringConversion.m:359: error: -[stringConversion testStringParseNumericConvertToNSNumberAllFormats] : "number2 != nil" should be true.
Are these failures expected? I'm running on ppc with 1 CPU if that matters.
Doug Dickinson --dd
I've yet to dig in to this problem, but I just wanted to comment on the unit tests failures..
No, it's not normal. While working on the 0.6 release the other day, I had the unit tests kick back an awful lot of failures when I modified the compiler optimization setting. I haven't yet tracked down the root cause of it, but changing the prototype of one the internal functions (RKSRegexFromStringOrRegex) seems to make the problem come and go. Your report makes me wonder if there isn't a problem in the latest released version as well.
Since it comes and go with compiler optimization changes and prototype changes, it's hard to say where the bug is, or when it started happening. If it's a compiler bug, it's also a lot hard to track down since what you think it is you're debugging might not be what's actually executing, and you need to get down in to the assembly code, which is never fun and takes forever to piece together what's happening.
This has all the signs of spending an enormous amount of time to find a simple, single character type-o kind of bug. :(
Well, with a little bit of juggling, I used the in-development unit test code for 0.6 that shows failures with the in-development 0.6 framework (with the header prototypes and optimization flags changes that produce errors) against a copy of the released 0.4 framework I had handy and it spit out:
Test Suite 'build/Debug/Unit Tests.octest(Tests)' finished at 2007-12-20 15:20:00 -0500.
Executed 72 tests, with 1 failure (1 unexpected) in 0.706 (2.208) seconds
The one failure has to do with the start of a unit test for a feature in 0.6, but not present in 0.4, so it's expected. I also had to comment out the NSData tests because those extensions aren't present in 0.4 either, but again, all failures accounted for.
So, that's good news. In the process of doing all this, I think I tracked the error down. It has to do with adding the 'pure' attribute to RKRegexFromStringOrRegex, and there is a case that I had obviously not considered or realized that is not pure, namely when there is some kind of error in the regex, or anything that causes an NSException to be thrown. It's hard to predict exactly how this interacts with everything because it's a hint to the optimizer, which by definition is making changes to the way code executes that are not explicitly expressed in the source code.
I still haven't looked at the original replacement issue as I'm knee deep in 0.6, but I will definitely look in to it before 0.6 is released. 0.6 will bring some API changes that make use of the standard error: NSError reporting method, all strings used be the framework prepped for internationalization (though none provided initially), and accelerated multithreaded matching of regular expressions in a collection (i.e, [@"Match test string" isMatchedByAnyRegexInSet:setOfRegexes], there's also array methods) which is taking up all of my time right now. It's pretty neat, it keeps track of the regexes in a collection and the number of times that regex was a match. It then keeps that list in sorted order, so the regexes with the highest hit count are tried first, and in parallel (number of threads == number of cores). This was inspired by the way Safari AdBlock uses RegexKit, which has to do bulk comparisons of a URL against a collection of regular expressions to determine if the URL should be blocked.
FYI,
The incorrect search and replace behavior is fixed in the upcoming 0.6 release. The change is in NSString.m, function RKStringByMatchingAndExpanding(), near the very end of the function. This is the bug fix in case anyone needs it, and I think it would be obvious which chunk of code this replaces (very similar):
if(expandOrReplace == YES) {
NSRange copySearchStringRange = NSMakeRange(searchIndex, (searchStringBuffer.length - searchIndex));
if((copyInstructionsBuffer.length == 0) && (NSEqualRanges(NSMakeRange(0, copyInstructionsBuffer.length), copySearchStringRange) == YES)) { return(searchString); } // There were no changes, so the replaced string == search string.
if(RKAppendCopyInstruction(©InstructionsBuffer, searchStringBuffer.characters, copySearchStringRange) == NO) { goto errorExit; }
}
In short, 'search and replace' operations which required no copies from a replacement string were incorrectly determined to be unedited, and therefore the original string could be returned instead. This fix also checks the range of the final string against the original search string to make sure catch these 'no copy instructions created, but still edited' conditions are handled.