Migrate from GitHub to SourceForge with this tool. Check out all of SourceForge's recent improvements.
Close

#1637 Potential lockup in DirScan_CompareItems()

closed-fixed
nobody
None
5
2008-02-11
2008-01-06
No

[CODE]
// Some compare methdods can be faster than collecting,
// so we can reach the end of list while collect is running.
// In this case we must wait for a while for new items to be
// added to the list.
if (pos == NULL && pCtxt->m_bCollectReady == FALSE)
{
do
{
pos = prevPos;
Sleep(200);
EnterCriticalSection(&pCtxt->m_criticalSect);
list->GetNextDiffPosition(pos);
LeaveCriticalSection(&pCtxt->m_criticalSect);
} while (pos == NULL);
}
[/CODE]

Compare thread might enter the loop after processing last item, but before collect thread sets pCtxt->m_bCollectReady to TRUE.

While the fix is obvious, I'd actually prefer to get rid of the busy-waiting thread by synchronizing threads through a counting semaphore.

Discussion

  • Kimmo Varis

    Kimmo Varis - 2008-01-06

    Logged In: YES
    user_id=631874
    Originator: NO

    I added that busy waiting to fix the lockups happening with filesize/date compare methods. Cleaning it up is a good idea.

    While this may not be issue for WinMerge, semaphores are more expensive than simple interlocked increments/decrements, and which can be used for same effect inside one process. And I don't expect WinMerge to become multiprocess in near future...

    See e.g.
    http://msdn2.microsoft.com/en-us/library/ms684122\(VS.85).aspx

    It is interesting that MSDN says InterLockedIncrement/InterLockedDecrement are only in W2K and later. But indeed they were in W95 and NT 3.51 already. Propably because MS doesn't support those old OS versions it also rewrites documentation to ignore them.. :( Funny.

    See this blog item mentioning support in those old OS versions:
    http://blogs.msdn.com/oldnewthing/archive/2004/05/06/127141.aspx

     
  • Takashi Sawanaka

    Logged In: YES
    user_id=954028
    Originator: NO

    Today I encountered this problem on winmerge debug build.

    It is easy to reproduce this bug if inserting following Sleep function.

    diff--- C:/Users/taka/AppData/Local/Temp/DiffThread.cpp-revBASE.svn000.tmp.cpp Mon Jan 14 19:27:58 2008
    +++ D:/dev/WinMerge/winmerge.org/trunk/Src/DiffThread.cpp Mon Jan 14 19:23:56 2008
    @@ -253,6 +253,7 @@

        // Build results list \(except delaying file comparisons until below\)
        DirScan\_GetItems\(paths, subdir, subdir, myStruct->pItemList, casesensitive, depth,  myStruct->context\);
    

    + Sleep(4000);

    #ifdef _DEBUG
    _CrtMemCheckpoint(&memStateAfter);

     
  • Kimmo Varis

    Kimmo Varis - 2008-02-07

    Logged In: YES
    user_id=631874
    Originator: NO

    Takashi, do you possibly have a multicore processor?

    I don't deny there is a problem in this code's synchronization. I was going to revisit it later, but I simply don't have a time to start digging into it. In multicore processor these two threads really run in the same time. So what this code needs is careful synchronization, from point of view things *DO* happen in the same time in both threads. And that comparing IS sometimes a lot faster than reading filesystem items. Date/size compares are basically comparing numbers, while another thread might be reading folder info from slow media.

    So basically it is a producer - consumer problem, explained e.g. in Wikipedia:
    http://en.wikipedia.org/wiki/Producer-consumer_problem

    And of course there is the third thread that is updating the compare progress GUI.

     
  • Jochen Tucht

    Jochen Tucht - 2008-02-08

    Logged In: YES
    user_id=766060
    Originator: YES

    > So basically it is a producer - consumer problem ...

    The producer - consumer problem (i.e. the deadlock issue described in the wiki article) applies to a producer and consumer sharing a fixed-size buffer. Our design differs in that the producer grows the buffer as needed.

     
  • Kimmo Varis

    Kimmo Varis - 2008-02-08

    Logged In: YES
    user_id=631874
    Originator: NO

    I really should write this into wiki for later reference. But I'll start scratching this description here first:

    1) We have two threads. First (collector) thread's task is to add items it founds in given folder (and optionally in all subfolders) to the shared list of items. This thread exits naturally when all the items are read. Or exceptionally when user selects to stop the compare from GUI.

    Second (compare) thread's task is to poll and read items from shared list of items, and do a compare for those items. Items are read from the list one by one. Determining end-condition for this thread may be tricky as there is no natural way for the thread itself to know when another (collector) thread has added all the items it has found. Exceptionally this thread ends when user selects to stop the compare from GUI.

    Problem of ending compare thread: as collector thread and compare thread run simultaneously, we know amount of total items found when collector thread ends its task, not any sooner. Unlike in previous implementation we don't know amount of items to compare when comparing. New items are added to the shared list all the way the compare progresses. There are subtle speed differences between the threads and either of the thread can be faster (by adding items to the list, or reading items from the list), and this can wary even during the compare of the items. Collector thread must be able to signal its ready status for the compare thread.

    The compare thread needs extra logic for its exit condidion. First the compare thread must get a ready signal from the collector thread. This signal means that collector thread is not adding new items to the shared list anymore. After that compare thread must compare all remaining items in the shared list. Adding items to the list and reading them must be atomic and synchronized operations. Otherwise we may end up reading partially filled items.

    During the compare compare thread polls for new items from the shared list. Compare thread must assume there might not be new items for a long time, indeed there might not be new items at all (in case of stopped compare, and we may later add features like pausing).

    Another very important aspect is (already shortly mentioned) the speed difference and varying of the threads. We never can assume anything about their relative speed. We can only assume they run in the speed the other constraints allow them to. We must never assume there is always items for compare thread to compare. Indeed, with date/size based compare methods there usually aren't ready items to compare in the list. Compare thread must be able to wait for more results, and in case of stopped compare, break this waiting. Compare thread cannot do infinite waits in any phase, as it may make it unstoppable if new items aren't added anymore.

    Can we agree about this basic way to do the compare?

     
  • Kimmo Varis

    Kimmo Varis - 2008-02-08

    Logged In: YES
    user_id=631874
    Originator: NO

    And please please, remember we can be reading files from the remote server over remote desktop with very slow speed, and connection can break. Or user can remove the cd-rom while the compare is running. This code must be able to handle all kinds of very weird situations. Assumptions about speed, ordering or completeness are dangerous.

     
  • Takashi Sawanaka

    Logged In: YES
    user_id=954028
    Originator: NO

    > Takashi, do you possibly have a multicore processor?

    Yes, I have core 2 duo CPU.

    I posted a patch for this bug. Please see http://winmerge.org/patch/1889907

     
  • Takashi Sawanaka

    Logged In: YES
    user_id=954028
    Originator: NO

    >I posted a patch for this bug. Please see
    >http://winmerge.org/patch/1889907

    Ouch! I didn't notice that Jochen has already submitted a patch for the bug. (PATCH: [ 1885119 ] DirScan related cleanups and fixes)

    I think that Jochen's patch is more elegant and faster than my patch.

     
  • Jochen Tucht

    Jochen Tucht - 2008-02-09

    Logged In: YES
    user_id=766060
    Originator: YES

    > Ouch!

    There is continued discussion on that patch of mine. Moving from busy-waiting approach to semaphore-based approach is a significant change, and as always, there is risk of regressions. Given that we are going to branch for 2.8 release soon, and semaphore-based approach may need some more discussion, I think your patch is very welcome to get that bug fixed for 2.8.

    Keep up the good work!

     
  • Kimmo Varis

    Kimmo Varis - 2008-02-09

    Logged In: YES
    user_id=631874
    Originator: NO

    Yes, I think what we want for 2.8.x is Takashi's patch as a bugfix. It is a safe fix without clear disadvantages.

    Jochen's patch is a improvement of the whole threading model. These kind of changes just can't go into stable release, they need few months of real life testing in experimental releases. We've been burned by these kind of changes way too many times in the past. And this time all we really want to fix for 2.8.x is the lockup, not threading.

     
  • Takashi Sawanaka

    Logged In: YES
    user_id=954028
    Originator: NO

    Oh I see. I will commit my patch to SVN.

     
  • Jochen Tucht

    Jochen Tucht - 2008-02-11
    • status: open --> closed-fixed
     

Log in to post a comment.