Menu

Bug in RegexKit 0.6?

Phil B
2008-02-03
2013-04-24
  • Phil B

    Phil B - 2008-02-03

    I downloaded RegexKit today and it's great!  However, I've run into some strange behavior using it on Leopard (10.5.1) in a GC'd application.  When I execute the following:

    newString = [curItem stringByMatching:@"(_(.))"
                                     replace:RKReplaceAll
                      withReferenceString:@"\\u$2"];

    Things appear to be working in the short term but at some point down the line (i.e. after a few hundred iterations of a loop using the resulting values obtained by calling this code a couple of dozen times and putting the results in an NSArray) I see some rather strange behavior unless I copy the resulting string:

    newString = [[curItem stringByMatching:@"(_(.))"
                                      replace:RKReplaceAll
                       withReferenceString:@"\\u$2"] copy];

    The problem I am having with the non-copy version is that at some point down the line, the value of the object appears either to get changed or incorrectly released/freed (the latter seems more likely as the value appears to be garbage and have no relation to the original, valid value.  But that's not supposed to happen in a GC'd app, right?)  I haven't had this issue with any of my other GC'd code and was wondering if this is a known issue/limitation of RegexKit. (i.e. is the result returned not really 'mine' but rather something I need to make a copy of?)

     
    • John Engelhart

      John Engelhart - 2008-02-03

      My first impression from the problem description is that this is a GC related bug, and probably in RegexKit.

      I'll see if I can't replicate the problem, but finding these bugs is extremely difficult.  I thought I had the GC problem nailed as it manages to pass the unit tests (with a specially modified /Developer/Tools/otest which is GC capable).  It tends to be one of two results: It works, or it crashes at some random place during the multithreading tests.  The multithreading tests push things so hard that it either works or it doesn't.

      But this leads me to a gripe: The Mac OS X 10.5 garbage collection system.  This isn't a swipe against the 10.5 GC implementers, as I'm sure they had a nearly impossible job, but I think the 10.5 GC system is fundamentally and fatally flawed.  On top of that, there is essentially no documentation on the GC system other than simple 'fluff' pieces, and certainly not the type of documentation you need to really understand what's going on behind the scenes so you can make intelligent decisions about what to do in something like a framework.

      The 10.5 GC system is based on "compiler assisted write barriers", and from what I can tell (from my own reverse engineering as there is no documentation on this), "write barrier" does not mean the type of barrier that comes to mind for most people when the term is used (i.e., an atomic operation such that all CPU's after the barrier point in time are all guaranteed to 'see' the same value in memory).  As near as I can tell, the 10.5 GC write barrier is translated in to a function call (one of objc_assign_strongCast, objc_assign_global, or objc_assign_ivar).  These functions, when GC is enabled, explicitly update the data structures used by the GC with the information and then perform the update in memory.  This is counter to GC systems such as the Boehm GC system which does not require any notifications of changes or updates to memory, it scans memory and pieces together the 'live' memory from the scan.  This allows programs to run unmodified and leaves the work of determining what objets are still live up to the GC system.  The 10.5 system requires notification of any changes to memory, which the compiler hides by intercepting all stores to __strong tagged pointers and wrapping them in objc_assign_* function calls.

      The problem with this method is.... if you miss tagging a pointer with __strong, things will most likely work because there's a good chance some function on the stack did manage to tag the update as __strong, and 'hides' the error.  That and the read to write of pointers in an average program is heavily skewed towards reading, it's the writing that can get you in trouble.  Add to this the HUGE amount of changes they made to the GCC compiler in an effort to 'automagically' detect pointers that should automatically be promoted to __strong (id, and id like pointers, such as NSArray *), and the fact that errors and bugs in this automatic promotion are easily hidden by the mismatch of read to write ratio, and that there's a window where 'things will continue to work until the collector collects' in which an uncaught write might be correctly caught in a different piece of code...  well, you can see how challenging this is to get right.

      So, in a nutshell, the 10.5 GC system trades in "forgetting to free()/release" at a specific, deterministic location with "at some point, possibly inside the compiler and not even visible to you, someone forget to reference a pointer as __strong, and at some random later point in time when the collector decides it needs some memory, it recycles the memory at that address and causes a crash."  Add to all of this the fact that the GC system is, for all practical purposes, untested.  Except that Xcode uses it... and it manages to litter my logs with confidence inspiring messages like:

      Feb  2 16:29:08 johnes-powerbook Xcode[61614]: Xcode(61614,0xf0103000) malloc: free_garbage: garbage ptr = 0x3a77980, has non-zero refcount = 1

      So, like I said, I'll try to replicate the problem, which is probably due to some kind of missing __strong tag, or possibly even some interaction between how Foundation and CoreFoundation treat garbage collection, or.... but so far I've found most of these bugs by luck, not through any kind of tried and true programming techniques.

       
      • Phil B

        Phil B - 2008-02-03

        Funny you should mention the lack of docs on GC... I just filed a bug about that last week as I'm having problems with memory leaks and the only thing I can find on memory leaks and GC is 'they don't happen' lol.  So I share your pain on the lack of information re: GC.  I'll cut Apple a little slack from the standpoint that it is a fine 1.0 effort... hopefully 1.1 will be here soon along with more docs.

        Your description of the suspected issue sounds like it's right on the mark.  I was banging my head trying to figure out if this problem was my code as it would never fail in exactly the same place.  If it will help, I'd be happy to see if I can put together a project which can replicate the issue.  The logic in my app is pretty simple: convert a bunch of field names from underscore-separated to camel case and then inserts several thousand records using the camel case names via core data.  It fails at a random point during the insert when one of the field names goes haywire.  When I realized this was the case, I added the copy and then everything worked perfectly.

         
    • John Engelhart

      John Engelhart - 2008-02-04

      Phil,

      I'm pretty sure I know that the problem is now.  And from your description, it sounds as though your program keeps running, but the contents of the string become corrupt.  This points to the buffer that holds the string getting reclaimed, but the object itself remains live.  Sound about right?

      I suspect the problem will be 'fixed' if you do the following.  I'd check something in, but my copy of NSString.m has already drifted pretty far off from the repository as I've added the various error: methods and functionality.

      In the file NSString.m, in the function RKStringFromCopyInstructions, which should be at line 729 if you've got the 0.6.0 release. The 'fix' is to change the allocation to not use GC, but to use standard malloc.

      Line 733, change RKMallocNotScanned to RKMallocNoGC
      Line 738, change 'RK_EXPECTED(RKRegexGarbageCollect == 1, 0) ? kCFAllocatorNull : kCFAllocatorMalloc' to just 'kCFAllocatorMalloc'
      Line 740, change 'freeWhenDone:RK_EXPECTED(RKRegexGarbageCollect == 1, 0) ? NO : YES' to just 'freeWhenDone: YES'

      Really only line 738 of 738/740 is used by default, line 740 is just to be consistent.

      This will change the allocation that contains the contents of the string from a GC based NSAllocateCollectable() to a malloc() based allocation, and let CFString know that it needs to manually free() the allocation when it's done.

      Although it seems perfect reasonable to obtain an allocation from the collector and create a string that points to it, and expect that the Garbage Collection system will scan the heap and see that the pointer is still in use and not reclaim it..... this is in fact nothing like how 10.5's GC actually works, I've come to suspect.  This is because (again, guessing, as core foundation for 10.5 has yet to be released on apples 10.5 open source server) the function CFStringCreateWithCStringNoCopy does not use objc_assign_* to store the pointer to where-ever it happens to keep that pointer.  Since CF never calls objc_assign_*, the GC system figures that it has gone out of scope and can be reclaimed...

      My suspicion is that the 10.5 GC system does no scanning of the heap to determine pointer liveness, but instead relies entirely on the compiler (correctly!) wrapping __strong qualified pointers with the appropriate GC notification function calls.

       
      • Phil B

        Phil B - 2008-02-05

        You understand my problem exactly and I will attempt the fix you describe and report back.  It will probably be later this week before I can as I need to finish up some Core Data work first (the copy workaround has worked fine as a short-term band-aid)

        One thing I'm still fuzzy on:  I can understand that the code in RegexKit didn't do things the way the GC system needed but why wouldn't my taking the returned object and adding it to a rooted array result in a strong reference to the object and prevent it from being GC'd?

        P.S. interesting discussion over on cocoa-dev re: your GC experiences.  Hopefully the folks at Apple will at least take away that there is a need for better documentation on what's going on re: GC.

         
        • John Engelhart

          John Engelhart - 2008-02-06

          The copy bandaid seems to fit the the cause of the problem: it's replicating the 'non-strong' references with strong copies, side stepping the issue.

          > One thing I'm still fuzzy on: I can understand that the code in RegexKit didn't do things the way the GC system needed

          Didn't it?  You're probably in a better position to answer it more objectively than I, but the reasoning seems valid: The backing store is allocated with NSAllocateCollectable, and that pointer is given to CFStringCreateWithCStringNoCopy.  Under any definition of a GC system I'm aware of, that backing store would be considered live because there's a pointer from the CFString object to it.

          > but why wouldn't my taking the returned object and adding it to a rooted array result in a strong reference to the object and prevent it from being GC'd?

          You got me.  There's actually a subtle distinction to made:  The string object is still live, because if it wasn't, chances are calling any of its methods would result in a crash because it's ->isa pointer would be random.... it's the pointer inside the strings object to the buffer that contains the strings text that is no longer live.

          > P.S. interesting discussion over on cocoa-dev re: your GC experiences. Hopefully the folks at Apple will at least take away that there is a need for better documentation on what's going on re: GC.

          As an outside person, if you look at the code, and the GC documentation, and think that what I have done is perfectly reasonable and should not be resulting in the GC system declaring the allocation that contains the text of the string to be considered dead and reclaimed.... then I think you have an example in the wild where "the GC pointer rules" are non-obvious and trivial to get wrong.

          This is why I think that 10.5's GC system is fundamentally and fatally flawed.  The design choices behind Leopards GC system do not dampen allocation errors, they have the paradoxical effect of AMPLIFYING allocation errors.  The consequences of these types of errors are trying to track down why, at some later, random point in time, what was working suddenly just seems to stop working for no reason.  Programmers make mistakes, no matter how diligent they are, but the real world end result of Leopards GC system is to essentially guarantee unrepeatable and completely random crashes.

          The 'bugs' are essentially race conditions.  They will not manifest themselves under light load, but as you push things harder and harder, you increase the chances of just the right set of circumstances occurring to trigger a crash.  This is insidious because the application will appear to work just fine in development, but once released in to the real world, it will be unstable.

          This particular bug is a perfect example of this.  Even though RegexKits unit tests pass without error or crashing, which includes heavy, multi-threaded concurrent stress testing, odd GC related bug pop up once it's out in the real world.

           
          • Phil B

            Phil B - 2008-02-06

            > Didn't it? You're probably in a better position to answer it more objectively than I...

            Notice that I said 'the way the GC system needed' not 'the way the GC system should require' or even 'the way the documentation specifies' ;-)  I don't have enough first-hand experience with GC algorithms to take a strong position one way or the other re: the current approach but it is clear that Apple is doing some hand waving in saying that 'it just works' when, at the very least, their tools and documentation need to better support and clarify the approach they are taking.  I've worked with other GC environments that 'just work' automatically and Obj-C 2.0 isn't one of them in several instances I've seen so far.  I think the problem is compounded in your work since you are dealing with allocations at the C and CF layer which requires a very clear understanding of the 'contract' with the GC.  To a lesser degree, I've got the same issue: Apple, spell out the rules of the road for GC. (which was why I filed the bug report with Apple)

            > The string object is still live...

            An excellent point.  The string object is very much alive and it is only the string contents that are being collected.

            >  ...and the GC documentation... then I think you have an example in the wild where "the GC pointer rules" are non-obvious and trivial to get wrong.

            I think that's a largest part of the problem (with tool support coming in a close second.)  I can relate as I have been slamming my head against Core Data for quite a while now.  (funnily enough: I raised a number of issues re: CD back in 2005 and got the same hand waving from Apple... out comes Leopard and the most significant of these issues have been largely addressed)  A large part of my problems are the result of Apple, yet again, doing some hand waving when what is really needed are better docs and tools.  A little less arrogance from Apple would be nice but I'm not going to hold my breath for that.

            > This is why I think that 10.5's GC system is fundamentally and fatally flawed...

            You make some great points here and the issue I ran into was a perfect example of how this problem is manifested (back to your point about the string object being live: had that not been the case, I might not have thought to copy the object and still be wondering what was going on.)  At the same time, I'm not sure I'd say it's a fundamental or fatal flaw *assuming* the trade-off here was to get minimize GC overhead.  In that case, some burden is being put on developers (esp. those who are not just dealing with things at the Cocoa level) to play by the rules and the fundamental problem I see is that they haven't clearly spelled out all of the rules and edge cases yet.

             

Log in to post a comment.