Menu

#572 Crash in TOcPart::Close function

7
open
1
2024-03-29
2024-03-20
No

Hi,

Recently my team migrated our project from OWL 6.20 (from 2006) to OWL 7.0.12. After migration we encountered a crash when TOcPart::Close was being called on debug version.

Please, take a look at TOcPart::Close implementation,

bool
TOcPart::Close()
 {
   if (!BPartI)
     return true;

   if (BLPartI)
     BLPartI->OnRename(0, 0);

   if (HRSucceeded(BPartI->Close())) {
     // release our BOle helper object
     //
     BPart->Release();
     BPart   = 0;
     BPartI  = 0;
     BLPartI = 0;

     // So that it won't get created again in Draw
     //
     Flags |= Closing;
     return true;
   }
   return false;
 }

We debugged the crash and we found out that after BPartI->Close() call BPart points to 0xdddddd. This causes the crash when BPart->Release() is called.

We compared current implementation with OWL 6.20 implementation of TOcPart::Close method. We found out that in OWL 6.20 a copy of BPart is made before BPartI->Close(), and then Release() is called on a copy.

After we modified TOcPart::Close method the issue seems to be resolved. Here is an implementation that works for us:

bool
TOcPart::Close()
{
  if (!BPartI)
    return true;

  if (BLPartI)
    BLPartI->OnRename(0, 0);

  auto partCopy = BPart;
  if (HRSucceeded(BPartI->Close())) {
    // release our BOle helper object
    //
    partCopy->Release();
    BPart   = 0;
    BPartI  = 0;
    BLPartI = 0;

    // So that it won't get created again in Draw
    //
    Flags |= Closing;
    return true;
  }
  return false;
}

Discussion

  • Vidar Hasfjord

    Vidar Hasfjord - 2024-03-20

    Thanks for the thorough and well-submitted ticket! I briefly talked to Bing Copilot about this. See below. I will have a closer look tomorrow. In the meantime, I'll just point out that this particular function has been in this state since at least its inclusion in OWLNext in [r18] (tags\V616).


    Bing Copilot: If BPart is set to 0xDDDDDDDD, it’s the pointer itself that has been corrupted, not necessarily the object it points to. This value is indeed often used to mark deallocated memory in debugging environments.

    If BPart is a member of the current class and it’s part of deallocated memory, it does suggest that the current instance has been deallocated. This could potentially happen if BPartI->Close() is somehow causing the TOcPart that’s referencing it to be deallocated. This could indeed be a form of circular cleanup, where the cleanup of one object is triggering the cleanup of another.

    If this instance is deleted by the Close call, any subsequent use of this would indeed be incorrect and could lead to undefined behavior, as it would be writing into deallocated (and potentially reallocated) memory.

    Your suggestion to register this instance for cleanup and then end with the Close call is a more robust solution. Writing the return value (indicating success or not) into a proper location elsewhere would ensure that you’re not attempting to access deallocated memory.

    This approach would also make the code more resilient to changes and easier to understand, as the lifecycle and ownership of the objects would be clearer.

    This is a complex issue that requires a deep understanding of the code and the interactions between these objects. It’s crucial to identify the exact sequence of events leading to this situation and to ensure that objects are being cleaned up in the correct order and at the right time.

     

    Related

    Commit: [r18]


    Last edit: Vidar Hasfjord 2024-03-20
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-03-21

    Hi Marceli, welcome to the project!

    @pralkopodobny wrote:

    Recently my team migrated our project from OWL 6.20 (from 2006) to OWL 7.0.12.

    That's good to hear! It is nice to see interest for version 7 and migration to modern C++. If you want to share more about your project, feel free to include an entry on our Application Showcase page. Or share some background info about its development and a screenshot, and I will make an entry for you.

    Regarding the bugs you have encountered in OCF, note that there is very little activity on maintaining this extension library. In particular, work on Unicode and 64-bit support is incomplete and ongoing. Search for bug and feature tickets with "labels:OCF". In particular, see "64-bit OCFNext" [feature-requests:#218] and "Support for BOCOLE" [feature-requests:#184] and the bugs related to Unicode.

    I have now registered you as a developer so that you can take control of fixing the issues in OCF that you have encountered. Please work on the trunk (version 8), then we will merge the fixes into the release branches, if/when we go for a new release. I will try to review changes you make.

    See Contributing to the OWLNext Project for more about our conventions.

    after BPartI->Close() call BPart points to 0xdddddd. This causes the crash when BPart->Release() is called. [...] Here is an implementation that works for us [patched code].

    As I noted yesterday, if my hunch is correct, your patch is likely incorrect and causes undefined behaviour. The fact that the member TOcPart::BPart is corrupted with 0xdddddddd likely means that this TOcPart instance has been deallocated as a side-effect of calling IBPart::Close, and your use of any part of this instance after the call to Close hence causes undefined behaviour. While local variable partCopy is held on the stack and should be unaffected, achieving the call to Release in this manner is highly suspect.

    If you use this brittle workaround, you should remove any use of this instance after the call to Close, and add ample explanation in the code.

     

    Related

    Feature Requests: #184
    Feature Requests: #218
    Wiki: Contributing
    Wiki: OWLNext_Application_Showcase

  • Vidar Hasfjord

    Vidar Hasfjord - 2024-03-21
    • assigned_to: Marceli Sokólski
    • Group: unspecified --> 7
     
    • Marceli Sokólski

      Hi Vidar,
      I'm more then happy to help :)
      I'll try to set up my environment and commit changes.

       
      👍
      1
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-03-21

    @pralkopodobny wrote:

    I'm more then happy to help. I'll try to set up my environment and commit changes.

    Great! Let me know if you need any help getting started.

    I have had a deeper look at the OCF code, and circular cleanup and confused ownership seem to be the issue. Here is how it goes according to my findings:

    1. TOcPart calls virtual IBPart::Close, which is likely the TOcLinkView::Close override.
    2. TOcLinkView calls TOcServerHost::EvOcViewBreakLink which forwards the event OC_VIEWBREAKLINK to the TOcLinkedView instance itself.
    3. The event is likely handled by TOleView::EvOcViewBreakLink, which deletes the view passed (itself), which triggers the base destructor ~TOcPart (presumably the instance instantiating this call sequence in step 1).
    4. ~TOcPart calls TOcPart::Detach, which in turn detaches itself from the document by calling TOcPartCollection::Detach (passing default 0 for the 'del' parameter).
    5. TOcPartCollection::Detach removes the part from the container, but since 'del' is 0, it does not call Release.
    6. At this point the TOcPart instance has been deleted, but the interface it represents has not been released, nor has its member BPart been released.

    Note that, as I understand it, the interfaces pointed to by members BPartI and BLPartI are immediately released after initialisation, so need no further releasing, but they rely on the objects they are referring to being held alive by other pointers, so this technique is brittle.

    A quick fix may be to handle release properly in ~TOcPart. Detach can be given an optional parameter whether to release, and pass this along to TOcPartCollection::Detach. Then TOcPart can call Detach with 'del' set to true to release itself (assuming the parameter is changed to 'bool'). In addition, the destructor should explicitly release the interface referred to by BPart as well.

    This "quick" fix still needs careful review of the code to ensure there is no double deletion or release.

    For a better robust long-term solution, consider replacing pointers by smart-pointers across OCF, and do a full review and any necessary refactoring of the ownership handling.

    Also consider the role and implementation of TOcPart::Close. In this case (perhaps always?) it is effectively acting like a destructor.

     
  • Marceli Sokólski

    Hi,

    I managed to successfully download trunk branch using TortoiseSVN and build source code using OWLMaker.
    I tried to integrate it with our product, but unfortunately there are already too many changes between trunk and 7.0.12 (I've got more then 33000 error messages).
    Can you suggest what can I do in order to make changes to OWL codebase and then to test them?
    Should I use 7.0.12 branch?

     
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-03-21

    Yes, you can use 7.0.12 to develop your fix. But don't commit it to the release branch. When you have a fix that works for your use case in 7, then create a minimal test app to test your changes in 8 (trunk). Preferably, commit your test case as well under "examples".

    And please document any changes that are needed between trunk and 7, so that it is clear how the fixed code should be merged back into 7.

    If you need to work on this for a prolonged period, and you need to work with 7, consider creating a temporary feature-branch for it, if you prefer.

     
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-03-29

    Also see open ticket "Null-pointer access in EvOcViewClose" [bugs:#50], which also seems to be related to issues with clean-up and ownership management.

     

    Related

    Bugs: #50


Log in to post a comment.