#1950 Parallel make hangs/freezes

MSYS
assigned
Cesar Strauss
Bug
none
Unknown
False
2014-08-20
2013-03-25
Dan Gudmundsson
No

make -jX hangs on 64-bit Windows 7 when X > 1,
as reported several times on the mingw-users mailing list.

A bug report (with backtraces) can be found here:
http://article.gmane.org/gmane.comp.gnu.mingw.user/41040

It happens for me as well, regardless of which project I'm building.
In my case these are mostly Erlang projects. make -j1 works, but it's so slow.

Discussion

  • Earnie Boyd
    2013-03-25

    WJFFM! How many CPUs do you have?

     
  • Earnie Boyd
    2013-03-25

    • labels: msys make --> make
     
  • Intel i5, so four CPUs, but I don't think that matters.

    BTW, the email linked above does not contain my backtraces; it was a detailed, unanswered bug report on the mailing list.

    The same thing happens here: one of the make processes gets stuck at 100% CPU utilization on one core, i.e. 25% total utilization of my quad core.

     
  • Earnie Boyd
    2013-03-25

    According to the archived post you point to, the hang appeared after updating MSYS to 1.0.18. If you downgrade to 1.0.17, does it work for you?

    Exit all MSYS processes.
    Change directory to your MinGW prefix/bin directory.
    mingw-get upgrade msys-core-bin=1.0.17-1

     
  • Here are the backtraces from my system,
    running make -j6.

    It freezes/hangs on the first try.

    There is another make process, but I can't attach to it;
    one of the processes uses 25% CPU and the other is idle.

    I do not know which one I managed to attach to.

    (gdb) attach 5956
    Attaching to process 5956
    [New Thread 5956.0x464]
    [New Thread 5956.0x1854]
    [New Thread 5956.0x74]
    [New Thread 5956.0x1bc0]
    Reading symbols from c:\MinGW\msys\1.0\bin\make.exe...(no debugging symbols found)...done.
    (gdb) info threads
      Id   Target Id         Frame 
    * 4    Thread 5956.0x1bc0 0x76f5000d in ntdll!LdrFindResource_U ()
       from C:\Windows\SysWOW64\ntdll.dll
      3    Thread 5956.0x74  0x76f6013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation () from C:\Windows\SysWOW64\ntdll.dll
      2    Thread 5956.0x1854 0x60859130 in pause ()
       from c:\MinGW\msys\1.0\bin\msys-1.0.dll
      1    Thread 5956.0x464 0x76f5f8e5 in ntdll!RtlUpdateClonedSRWLock ()
       from C:\Windows\SysWOW64\ntdll.dll
    (gdb) bt
    #0  0x76f5000d in ntdll!LdrFindResource_U ()
       from C:\Windows\SysWOW64\ntdll.dll
    #1  0x76fdf896 in ntdll!RtlQueryTimeZoneInformation ()
       from C:\Windows\SysWOW64\ntdll.dll
    #2  0x6765274a in ?? ()
    #3  0x00000000 in ?? ()
    (gdb) thread 3
    [Switching to thread 3 (Thread 5956.0x74)]
    #0  0x76f6013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
       from C:\Windows\SysWOW64\ntdll.dll
    (gdb) bt
    #0  0x76f6013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
       from C:\Windows\SysWOW64\ntdll.dll
    #1  0x76f6013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
       from C:\Windows\SysWOW64\ntdll.dll
    #2  0x746915e9 in WaitForMultipleObjectsEx ()
       from C:\Windows\syswow64\KernelBase.dll
    #3  0x00000001 in ?? ()
    #4  0x00aefe78 in ?? ()
    #5  0x75801a2c in KERNEL32!GetVolumePathNamesForVolumeNameA ()
       from C:\Windows\syswow64\kernel32.dll
    #6  0x00aefe78 in ?? ()
    #7  0x75804220 in KERNEL32!CheckForReadOnlyResource ()
       from C:\Windows\syswow64\kernel32.dll
    #8  0x00000001 in ?? ()
    #9  0x7efde000 in ?? ()
    #10 0x608594f7 in pause () from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    #11 0x60805465 in msys-1!_exit () from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    #12 0x758033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
       from C:\Windows\syswow64\kernel32.dll
    #13 0x00aeffd4 in ?? ()
    #14 0x76f79ef2 in ntdll!RtlpNtSetValueKey ()
       from C:\Windows\SysWOW64\ntdll.dll
    #15 0x6089c8e0 in msys-1!__ctype_ptr ()
       from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    #16 0x67052716 in ?? ()
    #17 0x00000000 in ?? ()
    (gdb) thread 2
    [Switching to thread 2 (Thread 5956.0x1854)]
    #0  0x60859130 in pause () from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    (gdb) bt
    #0  0x60859130 in pause () from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    #1  0x60805465 in msys-1!_exit () from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    #2  0x758033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
       from C:\Windows\syswow64\kernel32.dll
    #3  0x008effd4 in ?? ()
    #4  0x76f79ef2 in ntdll!RtlpNtSetValueKey ()
       from C:\Windows\SysWOW64\ntdll.dll
    #5  0x6089c8e0 in msys-1!__ctype_ptr ()
       from c:\MinGW\msys\1.0\bin\msys-1.0.dll
    #6  0x67252716 in ?? ()
    #7  0x00000000 in ?? ()
    (gdb) thread 1
    [Switching to thread 1 (Thread 5956.0x464)]
    #0  0x76f5f8e5 in ntdll!RtlUpdateClonedSRWLock ()
       from C:\Windows\SysWOW64\ntdll.dll
    (gdb) bt
    #0  0x76f5f8e5 in ntdll!RtlUpdateClonedSRWLock ()
       from C:\Windows\SysWOW64\ntdll.dll
    #1  0x76f5f8e5 in ntdll!RtlUpdateClonedSRWLock ()
       from C:\Windows\SysWOW64\ntdll.dll
    #2  0x7468dd54 in ReadFile () from C:\Windows\syswow64\KernelBase.dll
    #3  0x000001e4 in ?? ()
    #4  0x00000000 in ?? ()
    
     
    Last edit: Dan Gudmundsson 2013-03-26
  • Oops, that didn't look great, sorry.

    Will try the downgrade tomorrow.

     
  • Earnie Boyd
    2013-03-25

    I surrounded your output with the markdown of ~~~~~~ preceded by a blank line.

     
  • Downgrading with: mingw-get upgrade msys-core-bin=1.0.17-1
    works for me as well, or so it seems.

    for (( i=0; i < 20; i=i+1)) ; do make clean; make -j6; done

    This builds everything and does not cause make to hang.

    I will use this version until the next update.

    If you need testing, or other info, let me know.
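
One caveat with a repeat-build loop like the one above: if make hangs, the loop stalls with it. The following is a minimal sketch, not part of the original report, of a pure-bash watchdog that gives each run a time limit. The function name run_with_limit, the 600-second limit in the usage comment, and the exit status 124 for a timed-out run are all assumptions chosen for illustration. Killing only the top-level make may also leave child processes behind.

```shell
# Hypothetical helper: run a command in the background and kill it if it
# is still running after LIMIT seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_limit() {
    local limit=$1
    shift
    "$@" &
    local pid=$!
    local waited=0
    # kill -0 only checks whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -9 "$pid" 2>/dev/null || true
            wait "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The process has finished; collect its exit status.
    wait "$pid"
}

# Possible usage with the loop from this comment (not executed here):
#   for (( i = 0; i < 20; i = i + 1 )); do
#       make clean
#       run_with_limit 600 make -j6 || { echo "run $i failed or hung"; break; }
#   done
```

With this, a hung run shows up as a reported failure after the time limit instead of blocking the remaining iterations.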

     
    • Keith Marshall
      2013-03-26

      So, it appears that MSYS-1.0.18 may have introduced a regression; assigning to Cesar, for follow up.

       
  • Keith Marshall
    2013-03-26

    • status: unread --> assigned
    • assigned_to: Cesar Strauss
     
  • I've been having problems with 'make' ever since I first started using it, long ago.

    1. msys-1.0.8
      Both -j and -j4 work well, but sometimes fail at random places with something like '{command from recipe}.exe: command not found'; then I just try again. It's not very annoying. Also, with '-j' it runs many more processes than the processor has cores.

    2. msys-1.0.17
      With -j it launches a lot of processes (about three hundred; I guess one for each file in the makefile) and fails with:
      make: wait: No children. Stop.
      make:
      Waiting for unfinished jobs....
      make: *** wait: No children. Stop.
      Some files do compile OK, though.
      With -j4 it seems to work OK.

    3. msys-1.0.18
      With -j same as msys-1.0.17
      With -j4 it hangs on the 4th command with a steady CPU load of 25%

    My system: Windows 7 Pro 64-bit, Intel Core i5-2500 (4 cores), antivirus disabled and shut down, Windows Defender disabled.
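
A note on the '-j' observations above: GNU make treats a bare -j as "no limit on the number of jobs", which would be consistent with the hundreds of processes seen. Below is a hedged sketch of choosing an explicit job count instead. It assumes the Windows NUMBER_OF_PROCESSORS environment variable is visible inside the MSYS shell, and the helper name pick_jobs is hypothetical.

```shell
# Hypothetical helper: print a job count for make -jN.
# Uses NUMBER_OF_PROCESSORS when it is a positive integer, otherwise 1.
pick_jobs() {
    case "${NUMBER_OF_PROCESSORS:-}" in
        ''|0|*[!0-9]*) echo 1 ;;                      # unset, zero, or not a number
        *)             echo "$NUMBER_OF_PROCESSORS" ;;
    esac
}

# Possible usage (not executed here):
#   make -j"$(pick_jobs)"
```

This only bounds the number of jobs; it does not address the hang with -j4 reported for msys-1.0.18.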

     
    Last edit: Artem Pisarenko 2013-05-30
  • Thorsten Otto
    2014-02-28

    Looks like there was no solution to this yet?

    I had the same problem, and it was definitely caused by upgrading from msys-1.0.17 to msys-1.0.18. Since the difference between the two was not very big, I dug through the sources and, for my system (Windows 7 64-bit), was able to fix it by changing the setting of has_unreliable_pipes in source/winsup/cygwin/wincap.cc to true.

    The actual problem seems to come from _read() in source/winsup/cygwin/syscalls.cc. This function checks the file descriptor for available data only when is_slow() returns true. In 1.0.17, is_slow() returned true for pipes on all WinNT-based systems, but with the change to the wincap system it now returns true only for older OSes.

    I guess this change was taken from cygwin, and there it works because the signal handling is done differently.

     
    • Cesar Strauss
      2014-03-05

      > Looks like there was no solution to this yet?
      >
      > I had the same problem, and it was definitely caused by upgrading from msys-1.0.17 to msys-1.0.18. Since the difference between the two was not very big, I dug through the sources and, for my system (Windows 7 64-bit), was able to fix it by changing the setting of has_unreliable_pipes in source/winsup/cygwin/wincap.cc to true.

      Thank you very much for tracking this down!

      > The actual problem seems to come from _read() in source/winsup/cygwin/syscalls.cc. This function checks the file descriptor for available data only when is_slow() returns true. In 1.0.17, is_slow() returned true for pipes on all WinNT-based systems, but with the change to the wincap system it now returns true only for older OSes.
      >
      > I guess this change was taken from cygwin,

      Indeed, the change was taken from cygwin.

      > and there it works because the signal handling is done differently.

      No, it was only a typo by the Cygwin folks. The following patch was applied by them soon afterwards:

      Fri Sep 14 00:18:52 2001  Christopher Faylor <cgf at cygnus.com>
      
              * fhandler.h (fhandler_pipe::is_slow): Return true only if pipes are
              reliable (i.e., not Win9x).
      
      ===================================================================
      RCS file: /cvs/src/src/winsup/cygwin/fhandler.h,v
      retrieving revision 1.77
      retrieving revision 1.78
      diff -u -r1.77 -r1.78
      --- src/winsup/cygwin/fhandler.h    2001/09/12 17:46:36 1.77
      +++ src/winsup/cygwin/fhandler.h    2001/09/14 04:22:05 1.78
      @@ -442,7 +442,7 @@
         /* This strange test is due to the fact that we can't rely on
            Windows shells to "do the right thing" with pipes.  Apparently
            the can keep one end of the pipe open when it shouldn't be. */
      -  BOOL is_slow () {return wincap.has_unreliable_pipes ();}
      +  BOOL is_slow () {return !wincap.has_unreliable_pipes ();}
         select_record *select_read (select_record *s);
         select_record *select_write (select_record *s);
         select_record *select_except (select_record *s);
      

      I'll apply it to MSYS as well and release 1.0.19 as soon as possible.

      Thanks again!

      Regards,
      Cesar

       
  • Cesar Strauss
    2014-03-16

    • labels: make --> make, new-in-msys-1.19