From: David D. <dav...@rs...> - 2007-08-15 17:59:25
|
I'm sorry this post is so long. I'm looking for someone who may have experienced this, or who knows enough about what I am rambling about to help me. I haven't been able to run our app built with MINGW for a week now, and it's starting to become a problem.

This is a follow-up to a previous post entitled "Wrong DLL init routine being called?". I am starting a new post because it appears that the stack traces I was getting before were wrong. I now have more information, and feel I am closer to solving this.

When I try to load a test application I receive the message:

"The application failed to initialize properly (0xc0000005). Click on OK to terminate the application."

I am not using auto-imports, so solutions referring to them have not helped me. The crashes I've been experiencing in calls to DLL initialization routines have always wound up in malloc. I've been able to get a stack trace that ends up inside dllcrt1.c (the STATIC MINGW runtime) and shows me which call to malloc is segfaulting. The stack trace is as follows:

#0 0x003e14a0 in malloc ()
#1 0x003e111c in DllMainCRTStartup@12 (hDll=0x3e0000, dwReason=1, lpReserved=0x22fd30) at dllcrt1.c:56
#2 0x7c9011a7 in ntdll!LdrSetAppCompatDllRedirectionCallback () from C:\WINDOWS\system32\ntdll.dll
#3 0x003e0000 in ?? ()
#4 0x00000001 in ?? ()
#5 0x003e10c0 in __dll_exit () at dllcrt1.c:149
#6 0x00000000 in ?? ()

with frame 1 (inside dllcrt1.c) as:

(gdb) f 1
#1 0x003e111c in DllMainCRTStartup@12 (hDll=0x3e0000, dwReason=1, lpReserved=0x22fd30) at dllcrt1.c:56
56 first_atexit = (p_atexit_fn*) malloc (32 * sizeof (p_atexit_fn));
Current language: auto; currently c
(gdb) list
51 #endif
52
53 /* Initialize private atexit table for this dll.
54 32 is min size required by ANSI */
55
56 first_atexit = (p_atexit_fn*) malloc (32 * sizeof (p_atexit_fn));
57 if (first_atexit == NULL ) /* can't allocate memory */
58 {
59 errno=ENOMEM;
60 return FALSE;

Notice line 56.
It seems innocent enough; how could a call to malloc be crashing? Well, I called malloc from inside gdb and it worked fine, so I dug further and noticed that the malloc symbol that gdb is seeing is:

(gdb) print malloc
$1 = {<text variable, no debug info>} 0x40c780 <malloc>

0x40c780, okay, let's see a disassembly:

Dump of assembler code for function malloc:
0x0040c780 <malloc+0>: jmp *0x41b1dc

So there is a jump to 0x41b1dc, and sure enough that lands us inside of:

(gdb) disas 0x41b1dc
Dump of assembler code for function _imp__malloc:
0x0041b1dc <_imp__malloc+0>: pop %es
0x0041b1dd <_imp__malloc+1>: les (bad),%eax
0x0041b1de <_imp__malloc+2>: ret $0x77

Ahh, an import stub that leads us to:

(gdb) disas _imp__malloc
Dump of assembler code for function msvcrt!malloc:
0x77c2c407 <msvcrt!malloc+0>: mov %edi,%edi
0x77c2c409 <msvcrt!malloc+2>: push %ebp
0x77c2c40a <msvcrt!malloc+3>: mov %esp,%ebp
0x77c2c40c <msvcrt!malloc+5>: cmpl $0x0,0x77c62418
0x77c2c413 <msvcrt!malloc+12>: jne 0x77c2c420 <msvcrt!malloc+25>
0x77c2c415 <msvcrt!malloc+14>: call 0x77c1ef38 <msvcrt!__initenv+77>
0x77c2c41a <msvcrt!malloc+19>: test %eax,%eax
0x77c2c41c <msvcrt!malloc+21>: jne 0x77c2c420 <msvcrt!malloc+25>
0x77c2c41e <msvcrt!malloc+23>: pop %ebp
0x77c2c41f <msvcrt!malloc+24>: ret
0x77c2c420 <msvcrt!malloc+25>: pushl 0x77c61808
0x77c2c426 <msvcrt!malloc+31>: pushl 0x8(%ebp)
0x77c2c429 <msvcrt!malloc+34>: call 0x77c2c3d4 <msvcrt!free+441>
0x77c2c42e <msvcrt!malloc+39>: pop %ecx
0x77c2c42f <msvcrt!malloc+40>: pop %ecx
0x77c2c430 <msvcrt!malloc+41>: pop %ebp
0x77c2c431 <msvcrt!malloc+42>: ret
0x77c2c432 <msvcrt!malloc+43>: int3
0x77c2c433 <msvcrt!malloc+44>: int3
0x77c2c434 <msvcrt!malloc+45>: int3
0x77c2c435 <msvcrt!malloc+46>: int3
0x77c2c436 <msvcrt!malloc+47>: int3
End of assembler dump.

Good, this is where I would expect to be, right at home inside of msvcrt. But now take a look at the first frame of our original stack trace.

#0 0x003e14a0 in malloc ()

0x3e14a0.. Umm, hmmm, weird.
Let's see where that leads us.

(gdb) disas 0x3e14a0
Dump of assembler code for function malloc:
0x003e14a0 <malloc+0>: jmp *0x90c0609c

I'd hate to say it, but there is no 0x90c0609c. That explains the segfault. Let's see what else is around 0x3e14a0...

(gdb) disas 0x3e1490
Dump of assembler code for function _pei386_runtime_relocator:
0x003e1470 <_pei386_runtime_relocator+0>: mov $0x90c03034,%ecx
0x003e1475 <_pei386_runtime_relocator+5>: push %ebp
0x003e1476 <_pei386_runtime_relocator+6>: cmp $0x90c03034,%ecx
0x003e147c <_pei386_runtime_relocator+12>: mov %esp,%ebp
0x003e147e <_pei386_runtime_relocator+14>: jae 0x3e1498 <_pei386_runtime_relocator+40>
0x003e1480 <_pei386_runtime_relocator+16>: mov 0x4(%ecx),%edx
0x003e1483 <_pei386_runtime_relocator+19>: mov (%ecx),%eax
0x003e1485 <_pei386_runtime_relocator+21>: add $0x8,%ecx
0x003e1488 <_pei386_runtime_relocator+24>: add $0x90c00000,%edx
0x003e148e <_pei386_runtime_relocator+30>: add %eax,(%edx)
0x003e1490 <_pei386_runtime_relocator+32>: cmp $0x90c03034,%ecx
0x003e1496 <_pei386_runtime_relocator+38>: jb 0x3e1480 <_pei386_runtime_relocator+16>
0x003e1498 <_pei386_runtime_relocator+40>: pop %ebp
0x003e1499 <_pei386_runtime_relocator+41>: ret
End of assembler dump.

That looks like static libgcc to me. So is libgcc calling the wrong malloc for some reason? Why? Is this a relocation problem? Is there any way I can fix this? Is there anyone else I should be asking?

I am using the dwarf2 GCC 4.2.1, but I had the same results with gcc 3.4.5 (though with less details on the crash).

I was thinking that maybe shared libgcc would fix my problem, but I can't seem to link with it. When I specify -shared-libgcc it says I'm missing libgcc_s.a, which I can't find anywhere.

Details on my setup:

$ gcc -v
Using built-in specs.
Target: mingw32
Configured with: ../gcc-4.2.1-2-src/configure --with-gcc --enable-libgomp --host=mingw32 --build=mingw32 --target=mingw32 --program-suffix=-dw2 --with-arch=i486 --with-tune=generic --disable-werror --prefix=/mingw --with-local-prefix=/mingw --enable-threads --disable-nls --enable-languages=c,c++,fortran,objc,obj-c++,ada --disable-win32-registry --disable-sjlj-exceptions --enable-libstdcxx-debug --enable-cxx-flags=-fno-function-sections -fno-data-sections --enable-version-specific-runtime-libs --disable-bootstrap
Thread model: win32
gcc version 4.2.1-dw2 (mingw32-2)

$ ld -v
GNU ld version 2.17.50 20060824

MINGW runtime is a debug compiled version of mingw-runtime-3.13 |
From: Brian D. <br...@de...> - 2007-08-15 22:32:06
|
David Daeschler wrote:

> Dump of assembler code for function malloc:
> 0x0040c780 <malloc+0>: jmp *0x41b1dc
>
> So there is a jump to 0x41b1dc, sure enough that lands us inside of:

Er no, that's not a jump to 0x41b1dc, that's a jump to what is pointed to by 0x41b1dc (the star means indirection). If you try to disassemble 0x41b1dc you'll get garbage because that's data, not code. This is a standard thunk/stub for when code calls a dllimported function without __declspec(dllimport) on the declaration. It should lead to the actual dllimported function, which it does:

> (gdb) disas _imp__malloc
> Dump of assembler code for function msvcrt!malloc:
> 0x77c2c407 <msvcrt!malloc+0>: mov %edi,%edi
> 0x77c2c409 <msvcrt!malloc+2>: push %ebp
> 0x77c2c40a <msvcrt!malloc+3>: mov %esp,%ebp

> Good, this is where I would expect to be, right at home inside of
> msvcrt. But now take a look at the first frame of our original stack
> trace.
>
> #0 0x003e14a0 in malloc ()
>
> 0x3e14a0.. Umm, hmmm, weird. Let's see where that leads us?
>
> (gdb) disas 0x3e14a0
> Dump of assembler code for function malloc:
> 0x003e14a0 <malloc+0>: jmp *0x90c0609c
>
> I'd hate to say it, but there is no 0x90c0609c. That explains the
> segfault.

Again, this is not a jump to 0x90c0609c, it's an indirect jump. What does "p *0x90c0609c" say?

> Let's see what else is around 0x3e14a0...
> (gdb) disas 0x3e1490
> Dump of assembler code for function _pei386_runtime_relocator:
> ...
> That looks like static libgcc to me. So is libgcc calling the wrong

No, this isn't libgcc. pei386_runtime_relocator performs the task of fixing up pseudo-relocs at program startup, and it is part of the MinGW startup code in crt2.o/dllcrt2.o, which is also where the DllMainCRTStartup/WinMainCRTStartup code lives. But you said you weren't using any auto-imports, so I can't see how this is relevant other than that they are adjacent in memory.

> malloc for some reason? Why? Is this a relocation problem? Is there
> any way I can fix this?
> Is there anyone else I should be asking?

Find out how you get from msvcrt!malloc() in frame 1 to the thunk at 0x003e14a0 in frame 0. Also find out where the thunk at 0x003e14a0 came from.

> I am using the dwarf2 GCC 4.2.1, but I had the same results with gcc
> 3.4.5 (though with less details on the crash)

The difference is probably due to gcc 4.2 using the dwarf-2 debug format by default. But you can get that with gcc 3.4.5 by using -gdwarf-2. (Note that the debug format has nothing to do with the method used for exception handling.)

> I was thinking that maybe shared libgcc would fix my problem, but I

That sounds like a red herring.

Brian |
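A small sketch of the indirection point made above, not part of the original exchange. It assumes the three bogus "instructions" gdb printed at 0x41b1dc correspond to the standard IA-32 opcode bytes 07 (pop %es), c4 (les) and c2 77 00 (ret $0x77). Under that assumption, the first four bytes of the slot, read as a little-endian pointer, are exactly the address of msvcrt!malloc shown earlier in the thread. The function names and the dictionary standing in for memory are illustrative only.

```python
import struct

# Bytes behind gdb's "disassembly" of the import slot at 0x41b1dc.
# Assumed encodings: 07 = pop %es, c4 = les, c2 77 (00) = ret $0x77.
SLOT_BYTES = bytes([0x07, 0xC4, 0xC2, 0x77])


def read_pointer(raw: bytes) -> int:
    """Interpret four bytes as a little-endian 32-bit pointer."""
    (value,) = struct.unpack("<I", raw)
    return value


def indirect_jump_target(memory: dict, slot: int) -> int:
    """Model `jmp *slot`: the destination is the pointer stored at `slot`."""
    return read_pointer(memory[slot])


# Toy "memory" holding only the import slot.
memory = {0x41B1DC: SLOT_BYTES}
print(hex(indirect_jump_target(memory, 0x41B1DC)))  # 0x77c2c407
```

So the slot holds data (a pointer into msvcrt.dll), which is why disassembling it yields nonsense while following it yields malloc.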
From: David D. <dav...@rs...> - 2007-08-16 12:19:21
|
Hi Brian, I want you to know that I really appreciate you helping me. I know that all the maintainers have a lot to do and it's probably hard to find a couple of minutes here and there for problems like mine. > Er no, that's not a jump to 0x41b1dc, that's a jump to what is pointed > to by 0x41b1dc (the star means indirection.) Thank you for pointing out what jmp *address means, I guess it should have been obvious. So let me try again, with that information in mind. I've started GDB and gotten the segfault. Here is the stack trace [I had to restart gdb, addresses may have changed] (gdb) bt #0 0x003e14a0 in malloc () #1 0x003e111c in DllMainCRTStartup@12 (hDll=0x3e0000, dwReason=1, lpReserved=0x22fd30) at dllcrt1.c:56 #2 0x7c9011a7 in ntdll!LdrSetAppCompatDllRedirectionCallback () from C:\WINDOWS\system32\ntdll.dll #3 0x003e0000 in ?? () #4 0x00000001 in ?? () #5 0x003e10c0 in __dll_exit () at dllcrt1.c:149 #6 0x00000000 in ?? () Disassembly of 0x003e14a0 (frame 0): (gdb) disas 0x3e14a0 Dump of assembler code for function malloc: 0x003e14a0 <malloc+0>: jmp *0x90c0609c There's that indirect jump.. > Again, this is not a jump to 0x90c0609c, it's an indirect jump. What > does "p *0x90c0609c" say? (gdb) print *0x90c0609c Cannot access memory at address 0x90c0609c That's what I thought would happen. 
> Find out how you get from msvcrt!malloc() in frame 1 to the thunk at > 0x003e14a0 in frame 0 Here is the disassembly, it looks like a direct call, it doesn't look like it even tries to call msvcrt!malloc in frame 1, it just calls that thunk at 0x3e14a0: (gdb) disas 0x003e111c Dump of assembler code for function DllMainCRTStartup@12: 0x003e10c0 <DllMainCRTStartup@12+0>: push %ebp 0x003e10c1 <DllMainCRTStartup@12+1>: mov %esp,%ebp 0x003e10c3 <DllMainCRTStartup@12+3>: sub $0x18,%esp 0x003e10c6 <DllMainCRTStartup@12+6>: mov %ebx,0xfffffff8(%ebp) 0x003e10c9 <DllMainCRTStartup@12+9>: mov 0xc(%ebp),%ebx 0x003e10cc <DllMainCRTStartup@12+12>: mov %esi,0xfffffffc(%ebp) 0x003e10cf <DllMainCRTStartup@12+15>: cmp $0x1,%ebx 0x003e10d2 <DllMainCRTStartup@12+18>: je 0x3e1110 <DllMainCRTStartup@12+80> 0x003e10d4 <DllMainCRTStartup@12+20>: mov 0x10(%ebp),%eax 0x003e10d7 <DllMainCRTStartup@12+23>: mov %ebx,0x4(%esp) 0x003e10db <DllMainCRTStartup@12+27>: mov %eax,0x8(%esp) 0x003e10df <DllMainCRTStartup@12+31>: mov 0x8(%ebp),%eax 0x003e10e2 <DllMainCRTStartup@12+34>: mov %eax,(%esp) 0x003e10e5 <DllMainCRTStartup@12+37>: call 0x3e1410 <DllMain@12> 0x003e10ea <DllMainCRTStartup@12+42>: sub $0xc,%esp 0x003e10ed <DllMainCRTStartup@12+45>: test %ebx,%ebx 0x003e10ef <DllMainCRTStartup@12+47>: mov %eax,%esi 0x003e10f1 <DllMainCRTStartup@12+49>: jne 0x3e1101 <DllMainCRTStartup@12+65> 0x003e10f3 <DllMainCRTStartup@12+51>: mov 0x90c04000,%eax 0x003e10f8 <DllMainCRTStartup@12+56>: test %eax,%eax 0x003e10fa <DllMainCRTStartup@12+58>: je 0x3e1160 <DllMainCRTStartup@12+160> 0x003e10fc <DllMainCRTStartup@12+60>: call 0x3e1060 <__dll_exit> 0x003e1101 <DllMainCRTStartup@12+65>: mov %esi,%eax 0x003e1103 <DllMainCRTStartup@12+67>: mov 0xfffffff8(%ebp),%ebx 0x003e1106 <DllMainCRTStartup@12+70>: mov 0xfffffffc(%ebp),%esi 0x003e1109 <DllMainCRTStartup@12+73>: mov %ebp,%esp 0x003e110b <DllMainCRTStartup@12+75>: pop %ebp 0x003e110c <DllMainCRTStartup@12+76>: ret $0xc 0x003e110f <DllMainCRTStartup@12+79>: 
nop 0x003e1110 <DllMainCRTStartup@12+80>: movl $0x80,(%esp) 0x003e1117 <DllMainCRTStartup@12+87>: call 0x3e14a0 <malloc> > Also find out where the thunk at 0x003e14a0 came > from. It looks like it's data in the exe somewhere. It doesn't appear to be inside of a DLL, because: [some paths snipped for brevity] (gdb) info sharedlibrary >From To Syms Read Shared Object Library 0x7c901000 0x7c97b6fe Yes C:\WINDOWS\system32\ntdll.dll 0x7c801000 0x7c882fb5 Yes C:\WINDOWS\system32\kernel32.dll 0x616c1000 0x6173320c Yes librsairpc.dll 0x6f701000 0x6f78a2e8 Yes libitsdk.dll 0x64601000 0x646108b4 Yes libwxextctrls.dll 0x6fbc1000 0x6fbc1590 Yes mingwm10.dll 0x77c11000 0x77c5cd16 Yes C:\WINDOWS\system32\msvcrt.dll 0x64e01000 0x64f2c7e4 Yes wxbase28_gcc_custom.dll 0x77dd1000 0x77e452d9 Yes C:\WINDOWS\system32\advapi32.dll 0x77e71000 0x77ef2beb Yes C:\WINDOWS\system32\rpcrt4.dll 0x774e1000 0x775ff4e2 Yes C:\WINDOWS\system32\ole32.dll 0x77f11000 0x77f5228e Yes C:\WINDOWS\system32\gdi32.dll 0x77d41000 0x77d9fda7 Yes C:\WINDOWS\system32\user32.dll 0x7c9c1000 0x7cbbcd68 Yes C:\WINDOWS\system32\shell32.dll 0x77f61000 0x77fccb28 Yes C:\WINDOWS\system32\shlwapi.dll 0x6e501000 0x6e781b68 Yes wxmsw28_core_gcc_custom.dll 0x773d1000 0x77461a3a Yes comctl32.dll 0x763b1000 0x763e0d9d Yes C:\WINDOWS\system32\comdlg32.dll 0x77121000 0x7719f0bd Yes C:\WINDOWS\system32\oleaut32.dll 0x66e01000 0x66e10ea4 Yes libitex.dll 0x638c1000 0x63935330 Yes librsaipd.dll 0x6cd01000 0x6cd1cb60 Yes libnetch.dll 0x71ab1000 0x71ac3133 Yes C:\WINDOWS\system32\ws2_32.dll 0x71aa1000 0x71aa4d2c Yes C:\WINDOWS\system32\ws2help.dll 0x61dc1000 0x61e2a760 Yes librsaxml2.dll 0x6e581000 0x6e733870 Yes libxerces-c2_7_0.dll 0x61881000 0x6197c3d4 Yes libitshared.dll 0x6e401000 0x6e421038 Yes wxbase28_odbc_gcc_custom.dll 0x74321000 0x743576db Yes C:\WINDOWS\system32\odbc32.dll >> Dump of assembler code for function _pei386_runtime_relocator: >> That looks like static libgcc to me. 
So is libgcc calling the wrong > I can't see how this is relevant other than they are adjacent in > memory. I thought that maybe whatever was around the "bad malloc" thunk may have been related to it somehow. Sorry if I am mistaken. > That sounds like a red herring. It probably is. Just for reference, here's where GDB sees malloc: (gdb) print malloc $5 = {<text variable, no debug info>} 0x44d290 <malloc> (gdb) disas 0x44d290 Dump of assembler code for function malloc: 0x0044d290 <malloc+0>: jmp *0x4fafd0 (gdb) disas *0x4fafd0 Dump of assembler code for function msvcrt!malloc: 0x77c2c407 <msvcrt!malloc+0>: mov %edi,%edi 0x77c2c409 <msvcrt!malloc+2>: push %ebp 0x77c2c40a <msvcrt!malloc+3>: mov %esp,%ebp 0x77c2c40c <msvcrt!malloc+5>: cmpl $0x0,0x77c62418 0x77c2c413 <msvcrt!malloc+12>: jne 0x77c2c420 <msvcrt!malloc+25> 0x77c2c415 <msvcrt!malloc+14>: call 0x77c1ef38 <msvcrt!__initenv+77> 0x77c2c41a <msvcrt!malloc+19>: test %eax,%eax 0x77c2c41c <msvcrt!malloc+21>: jne 0x77c2c420 <msvcrt!malloc+25> [... snip ...] Thank you for your time. Again, I appreciate it. - Dave |
From: Brian D. <br...@de...> - 2007-08-16 16:09:05
|
David Daeschler wrote:

> (gdb) bt
> #0 0x003e14a0 in malloc ()
> #1 0x003e111c in DllMainCRTStartup@12 (hDll=0x3e0000, dwReason=1,
> lpReserved=0x22fd30) at dllcrt1.c:56
> #2 0x7c9011a7 in ntdll!LdrSetAppCompatDllRedirectionCallback ()
> from C:\WINDOWS\system32\ntdll.dll
> #3 0x003e0000 in ?? ()
> #4 0x00000001 in ?? ()
> #5 0x003e10c0 in __dll_exit () at dllcrt1.c:149
> #6 0x00000000 in ?? ()

There are a couple of things that seem worrisome about this backtrace. One, why is dllcrt1.o (as opposed to dllcrt2.o) ever being used? That sounds wrong. How and why is that being pulled into the link? And what module's DllMain is this, i.e. what DLL is this?

Two, why is a DLL trying to load at base address 0x3e0000? That seems wrong too, because the default base address for the main module is 0x400000, and the default base for DLLs is 0x10000000, and I would never expect anything to load *below* 0x400000, let alone a mere 0x20000 bytes (128 KiB) below it. It almost seems like a DLL startup module got linked in as the entry point of the main module (exe).

> Disassembly of 0x003e14a0 (frame 0):
>
> (gdb) disas 0x3e14a0
> Dump of assembler code for function malloc:
> 0x003e14a0 <malloc+0>: jmp *0x90c0609c
>
> There's that indirect jump..
>
> > Again, this is not a jump to 0x90c0609c, it's an indirect jump. What
> > does "p *0x90c0609c" say?
>
> (gdb) print *0x90c0609c
> Cannot access memory at address 0x90c0609c

Right, so this 0x3e14a0 is probably uninitialized.

> Here is the disassembly, it looks like a direct call, it doesn't look
> like it even tries to call msvcrt!malloc in frame 1, it just calls that
> thunk at 0x3e14a0:

You're right, I misinterpreted what you said in the previous email. The DllMainCRTStartup is directly calling this malloc thunk at 0x3e14a0, which appears to be junk. I'm not really sure where to go from here other than trying to figure out why this code is calling the stub at 0x3e14a0 instead of the stub at 0x40c780.
Some debugging tools that might help are -Wl,-verbose and -Wl,--enable-extra-pe-debug and -Wl,-Map,filename.

Brian |
From: David D. <dav...@rs...> - 2007-08-16 18:17:58
|
Hi Brian, I tried outputting the maps for each one of my DLLs but after going through all of them (there was a lot) it looks like all the mallocs are leading into msvcrt.dll. However, I began debugging again, and turned on showsnaps using gflags to have the windows loader output debug strings.. The output was rather interesting... [full directories clipped for brevity] (gdb) r Loaded symbols for system32\ntdll.dll Loaded symbols for system32\kernel32.dll Loaded symbols for librsairpc.dll Loaded symbols for libitsdk.dll Loaded symbols for libwxextctrls.dll Loaded symbols for mingwm10.dll Loaded symbols for system32\msvcrt.dll Loaded symbols for wxbase28_gcc_custom.dll Loaded symbols for system32\advapi32.dll Loaded symbols for system32\rpcrt4.dll Loaded symbols for system32\ole32.dll Loaded symbols for system32\gdi32.dll Loaded symbols for system32\user32.dll Loaded symbols for system32\shell32.dll Loaded symbols for system32\shlwapi.dll Loaded symbols for wxmsw28_core_gcc_custom.dll Loaded symbols for ...ows.Common-Controls_6595b\comctl32.dll Loaded symbols for system32\comdlg32.dll Loaded symbols for system32\oleaut32.dll Loaded symbols for libitex.dll Loaded symbols for librsaipd.dll Loaded symbols for libnetch.dll Loaded symbols for system32\ws2_32.dll Loaded symbols for system32\ws2help.dll Loaded symbols for librsaxml2.dll Loaded symbols for libxerces-c2_7_0.dll [d1c,988] LDR: Real INIT LIST for process pid 3356 0xd1c [d1c,988] system32\msvcrt.dll init routine 77C1F2A1 [d1c,988] mingwm10.dll init routine 003E10C0 <-------- 3e... 
[d1c,988] system32\RPCRT4.dll init routine 77E76284 [d1c,988] system32\ADVAPI32.DLL init routine 77DD70D4 [d1c,988] system32\USER32.dll init routine 77D4F538 [d1c,988] system32\GDI32.dll init routine 77F165BA [d1c,988] system32\OLE32.dll init routine 774FD0A1 [d1c,988] system32\SHLWAPI.dll init routine 77F651FB [d1c,988] system32\SHELL32.DLL init routine 7C9E7366 [d1c,988] wxbase28_gcc_custom.dll init routine 64E01060 [d1c,988] ...Common-Controls_65..\COMCTL32.DLL init routine 773D4246 [d1c,988] system32\COMDLG32.DLL init routine 763B1AB8 [d1c,988] system32\OLEAUT32.DLL init routine 77121558 [d1c,988] wxmsw28_core_gcc_custom.dll init routine 004A1060 [d1c,988] libwxextctrls.dll init routine 64601060 [d1c,988] libitex.dll init routine 66E01060 [d1c,988] libitsdk.dll init routine 6F701060 [d1c,988] system32\WS2HELP.dll init routine 71AA1642 [d1c,988] system32\WS2_32.DLL init routine 71AB1273 [d1c,988] libnetch.dll init routine 6CD01060 [d1c,988] libxerces-c2_7_0.dll init routine 6E581060 [d1c,988] librsaxml2.dll init routine 61DC1060 [d1c,988] librsaipd.dll init routine 638C1060 [d1c,988] librsairpc.dll init routine 616C1060 [d1c,988] LDR: msvcrt.dll loaded - Calling init routine at 77C1F2A1 LDR: LdrGetDllHandle, searching for kernel32.dll from LDR: LdrGetProcedureAddress by NAME - InitializeCriticalSectionAndSpinCount [d1c,988] LDR: mingwm10.dll loaded - Calling init routine at 003E10C0 <--- Program received signal SIGSEGV, Segmentation fault. 0x003e14a0 in malloc () (gdb) bt #0 0x003e14a0 in malloc () #1 0x003e111c in DllMainCRTStartup@12 (hDll=0x3e0000, dwReason=1, lpReserved=0x22fd30) at dllcrt1.c:56 #2 0x7c9011a7 in ntdll!LdrSetAppCompatDllRedirectionCallback () from C:\WINDOWS\system32\ntdll.dll #3 0x003e0000 in ?? () #4 0x00000001 in ?? () #5 0x003e10c0 in __dll_exit () at dllcrt1.c:149 #6 0x00000000 in ?? () Note that it appears as if we are inside of mingwm10.dll, right after the load of msvcrt. 
Also note that mingwm10.dll was loaded at:

0x6fbc1000 0x6fbc1590 Yes mingwm10.dll

Right where it's supposed to be according to:

gcc mingwthrd.exp -o mingwm10.dll -B./ -mdll -Wl,--image-base,0x6FBC0000 -Wl,--entry,_DllMainCRTStartup@12 mthr.o mthr_init.o -Lmingwex

But the loader is calling its init routine at:

warning: [d1c,988] LDR: mingwm10.dll loaded
warning: - Calling init routine at 003E10C0 <---

Strange indeed! I don't understand why the loader thinks the init routine is at 003E10C0. mingwm10.dll even got loaded at its preferred base. I am very confused.

- Dave |
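The mismatch described above can be checked mechanically. The following is only a sketch, not a tool used in the thread: it takes a handful of the address ranges that gdb's "info sharedlibrary" reported earlier and asks which module, if any, contains a given address. The helper name `find_module` and the choice of which rows to include are illustrative assumptions.

```python
# A few (start, end, name) rows copied from the "info sharedlibrary"
# output quoted earlier in the thread; the list is deliberately partial.
MODULES = [
    (0x7C901000, 0x7C97B6FE, "ntdll.dll"),
    (0x77C11000, 0x77C5CD16, "msvcrt.dll"),
    (0x6FBC1000, 0x6FBC1590, "mingwm10.dll"),
    (0x64E01000, 0x64F2C7E4, "wxbase28_gcc_custom.dll"),
]


def find_module(address, modules=MODULES):
    """Return the name of the module whose range contains address, or None."""
    for start, end, name in modules:
        if start <= address <= end:
            return name
    return None


# Init routine addresses taken from the loader's "Real INIT LIST" output.
print(find_module(0x77C1F2A1))  # msvcrt.dll
print(find_module(0x64E01060))  # wxbase28_gcc_custom.dll
print(find_module(0x003E10C0))  # None
```

The msvcrt and wxbase init routines fall inside the ranges gdb reports, while the address the loader uses for mingwm10.dll falls inside none of them, which is the inconsistency being puzzled over here.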
From: David D. <dav...@rs...> - 2007-08-22 14:55:09
|
Hi Brian, Since I've posted my last message last week I have been able to move the project over to MSVC 8 and I have been able to compile, link and run using visual studio express. As this is not an optimal solution, I was wondering if you have had a chance to read my last message that I sent on 8/16, and if you have any other ideas or suggestions. After I get this MSVC build out I will be available to do more debugging. Thank you for your time, - Dave |
From: Brian D. <br...@de...> - 2007-08-22 23:57:31
|
David Daeschler wrote:

> As this is not an optimal solution, I was wondering if you have had a
> chance to read my last message that I sent on 8/16, and if you have any
> other ideas or suggestions. After I get this MSVC build out I will be
> available to do more debugging.

I wish there was some way to be able to reproduce this, but if I understand correctly this is a large and complex application and the problem only manifests in that context.

From the last email it seems that you've determined that the crash is in the DllMainCRTStartup() of mingwm10.dll, which is trying to load at 0x003e0000 despite having an ImageBase of 0x6FBC0000 when built. Is that right?

To me that sounds like the DLL had to be relocated by the loader because its preferred base address was already occupied. This very low location 0x003e0000 seems like quite a strange place for the NT loader to choose to relocate the DLL, but since this is just a very small stub/helper DLL I suppose it's conceivable.

So the first thing I would do is see if its preferred base address is indeed mapped/reserved/populated by something else. You could also experiment with rebasing the DLL to a location you know doesn't conflict, to confirm whether this plays a part in the crash. (BTW Process Explorer is invaluable for this kind of thing.)

But even if the DLL has to be relocated at startup, that is no reason for a crash. This happens all the time, and there are supposed to be appropriate relocs in the PE file to let the OS do this without a crash.

Secondly, if the debug symbols are correct this mingwm10.dll is linked with dllcrt1.o instead of the normal -2 version. This seems strange -- maybe there's a good reason for it, I don't know. Check the mingw-runtime Makefile.in to see what the deal here is.

Brian |
From: David D. <dav...@rs...> - 2007-08-23 11:58:47
|
Hi Brian,

> I wish there was some way to be able to reproduce this, but if I
> understand correctly this is a large and complex application and the
> problem only manifests in that context.

So far, I have been able to create a test executable that shows the problem, but only when linked with a few of the libraries that I'm working with. I am working on shrinking the test case down to something more manageable.

> From the last email it seems that you've determined that the crash is
> in the DllMainCRTStartup() of mingwm10.dll, which is trying to load at
> 0x003e0000 despite having an ImageBase of 0x6FBC0000 when built. Is
> that right?

As far as I can tell, mingwm10.dll IS loading at the correct base address. It appears to be 0x1000 off of its preferred address, but so do all the other DLLs (they all look like 0xXXXX1000). GDB tells me mingwm10.dll is using the range 0x6fbc1000 - 0x6fbc1590:

0x6fbc1000 0x6fbc1590 Yes mingwm10.dll

According to gdb's "info sharedlibrary" command, there are no other DLLs loaded in mingwm10's base range, so it should have had no problem loading at its preferred base. The init routine that the windows loader is calling at 0x003e0000 is nowhere near the address range of mingwm10.dll. In fact, I don't know where it's getting that init routine address from; nothing appears to be loading in that range.

> You could also experiment with rebasing the DLL to a location you
> know doesn't conflict to confirm if this plays a part in the crash.

I will try this as soon as I can, and I will let you know the result.

> Secondly, if the debug symbols are correct this mingwm10.dll is linked
> with dllcrt1.o instead of the normal -2 version. This seems strange
> -- maybe there's a good reason for it, I don't know. Check the
> mingw-runtime Makefile.in to see what the deal here is.

Again, I will try this as soon as possible. Thank you for your help so far. Have a nice day.

- Dave |
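On the "0x1000 off" observation: a likely reading, offered here as an assumption and not something confirmed in the thread, is that gdb's "info sharedlibrary" lists the start of each module's code section, which for these MinGW-built DLLs sits 0x1000 bytes past the image base. The sketch below encodes that assumption; `ASSUMED_TEXT_RVA` and both function names are hypothetical.

```python
# Assumption: the first (code) section starts 0x1000 bytes after the image
# base, so gdb's "From" column equals image base + 0x1000.
ASSUMED_TEXT_RVA = 0x1000


def image_base_from_text_start(text_start, text_rva=ASSUMED_TEXT_RVA):
    """Estimate a module's load base from the address gdb reports."""
    return text_start - text_rva


def is_at_preferred_base(text_start, preferred_base):
    """True if the estimated load base matches the linked image base."""
    return image_base_from_text_start(text_start) == preferred_base


# mingwm10.dll: gdb reports 0x6fbc1000, linked with --image-base 0x6FBC0000.
print(hex(image_base_from_text_start(0x6FBC1000)))  # 0x6fbc0000
print(is_at_preferred_base(0x6FBC1000, 0x6FBC0000))  # True
```

Under that assumption gdb's view says the DLL sits at its preferred base, which is why the loader's init routine address looks so out of place.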
From: David D. <dav...@rs...> - 2007-08-23 12:11:47
|
Hello Again,

Since it only took a minute to change the preferred base in mingwm10.dll, I went ahead and edited the generated makefile to change the preferred base of mingwm10.dll to 0x8fbc0000. I still got the same crash, and here's the really strange part: the init routine that the windows loader is trying to call did not change even though I changed the base address of the DLL!

(gdb) info sharedlibrary
From To
0x8fbc1000 0x8fbc1590 Yes mingwm10.dll

Yet:

[c24,c28] LDR: mingwm10.dll loaded - Calling init routine at 003E10C0

The base address of the DLL doesn't appear to have an effect on the address where the loader thinks the init routine is. Does this mean that for some reason the loader thinks mingwm's init routine is somewhere inside the executable? Is that range even in the executable? The only thing I can say for sure is that it's not in the range of any loaded DLL, so it must be in the exe, right?

- Dave |
From: Brian D. <br...@de...> - 2007-08-24 00:43:34
|
David Daeschler wrote: > The base address of the DLL doesn't appear to have an effect on the > address where the loader thinks the init routine is. Does this mean > that for some reason the loader thinks mingwm's init routine is > somewhere inside the executable? Is that range even in the executable? > The only thing I can say for sure is that it's not the range of any > loaded DLLs, so it must be in the exe right? Well normally the exe loads at 0x400000 so I'm really confused what this 128k below that is. Like I said before, use Process Explorer and you can get a nice map of the entire memory layout and find out exactly what is mapped into that region. Brian |
From: David D. <dav...@rs...> - 2007-08-29 19:10:25
|
Hi Brian,

I've released the MSVC build and can now continue debugging this problem.

> From the last email it seems that you've determined that the crash is
> in the DllMainCRTStartup() of mingwm10.dll, which is trying to load
> at 0x003e0000 despite having an ImageBase of 0x6FBC0000 when built.
> Is that right?
> ...
> To me that sounds like the DLL had to relocated by the loader because
> its preferred base address was already occupied

I have some output from Process Explorer, and it is rather interesting..

NAME BASE SIZE IMG BASE
mingwm10.dll 0x3E0000 0x11000 0x8FBC0000
Test.exe 0x400000 0x49000 0x40000
wxmsw28_core_gcc_custom.dll 0x4A0000 0x8402000 0x6E500000

Note that mingwm10.dll's BASE is different from its IMG BASE. I don't know what that means, because I don't understand what Process Explorer means by "BASE". IMG BASE appears to be where the DLL should be loaded, and there is nothing else loaded anywhere near that range.

> This very low location
> 0x003e0000 seems like quite a strange place for the NT loader to
> choose to relocate the DLL,

I thought so too, until I started up Visual Studio Express and saw:

NAME BASE SIZE IMG BASE
custsat.dll 0x330000 0xB000 0x400000

But the IMG BASE for that DLL seems odd. VS Express seems to run fine that way, however.

> Secondly, if the debug symbols are correct this mingwm10.dll is linked
> with dllcrt1.o instead of the normal -2 version. This seems strange
> -- maybe there's a good reason for it, I don't know. Check the
> mingw-runtime Makefile.in to see what the deal here is.

It looks like dllcrt[1|2].o are both based on dllcrt1.c:

#
# Dependancies
#
[...]
dllcrt1.o: dllcrt1.c
dllcrt2.o: dllcrt1.c

Indeed, they are the same file:

$ md5sum.exe "dllcrt1.o" "dllcrt2.o"
6530958f254ca7009862ef1e11dbe50d *dllcrt1.o
6530958f254ca7009862ef1e11dbe50d *dllcrt2.o

The difference appears to be only in the crtX.o's:

# The special rules are necessary.
crt1.o dllcrt1.o:
        $(CC) -c -D__CRTDLL__ -U__MSVCRT__ $(ALL_CFLAGS) $< -o $@

crt2.o dllcrt2.o:
        $(CC) -c -D__MSVCRT__ -U__CRTDLL__ $(ALL_CFLAGS) $< -o $@

Where:

$ md5sum crt1.o crt2.o
14ac0fa0dc6e83b6850ced2805e3fe55 *crt1.o
59300d24544c6fd37f87659b3c406382 *crt2.o

- Dave |
From: Brian D. <br...@de...> - 2007-08-29 19:22:40
|
David Daeschler wrote: > NAME BASE SIZE IMG BASE > > mingwm10.dll 0x3E0000 0x11000 0x8FBC0000 > Test.exe 0x400000 0x49000 0x40000 > wxmsw28_core_gcc_custom.dll 0x4A0000 0x8402000 0x6E500000 > > Note that the mingwm10.dll's base is different than the IMG BASE? I > don't know what the means, because I can't understand what process > explorer means by "Base". IMG BASE appears to be where the DLL should > be loaded, and there is nothing else loaded anywhere near that range. BASE is the address where it is actually resident, IMG BASE is the address it was given when it was linked (i.e. the value in the image [executable]). So, this means it had to be relocated at process startup by the OS. Usually this is because the desired location is already mapped. And this is what I meant by try rebasing it (i.e. relinking, or using the MS tool rebase.exe) so that it doesn't need to be relocated like this. And what is with this bogus value 8FBC0000 for the image base? That is impossible, all userspace DLLs must reside below the 2GB line (0x7fffffff) as everything above that is kernel-mode only. So there is no way it could ever load at that address -- no wonder it is relocated. > $ md5sum.exe "dllcrt1.o" "dllcrt2.o" > 6530958f254ca7009862ef1e11dbe50d *dllcrt1.o > 6530958f254ca7009862ef1e11dbe50d *dllcrt2.o Hmm, didn't know that. But I still want to know why mingwm10.dll is linked with dllcrt1.o and not dllcrt2.o, not that it apparently matters. Brian |
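The two reasons for relocation given above (preferred range already mapped, or preferred range above the 2GB user-space line) can be written down as a rough check. This is a sketch under the thread's own assumptions about a default 32-bit Windows layout; the function names are hypothetical and the sizes come from the Process Explorer listing quoted above.

```python
# Top of user space on a default 32-bit Windows process, as stated above.
USER_SPACE_LIMIT = 0x7FFFFFFF


def overlaps(base_a, size_a, base_b, size_b):
    """True if the half-open ranges [base, base + size) intersect."""
    return base_a < base_b + size_b and base_b < base_a + size_a


def must_relocate(image_base, size, occupied):
    """Rough model: relocate if the preferred range is unusable or taken."""
    if image_base + size - 1 > USER_SPACE_LIMIT:
        return True  # the image would extend past the 2GB line
    return any(overlaps(image_base, size, b, s) for b, s in occupied)


exe = (0x400000, 0x49000)   # Test.exe BASE and SIZE from the listing
dll_size = 0x11000          # mingwm10.dll SIZE from the listing

print(must_relocate(0x8FBC0000, dll_size, [exe]))  # True
print(must_relocate(0x6FBC0000, dll_size, [exe]))  # False
```

With the edited image base 0x8FBC0000 the DLL can never load where it asks to, so a relocation is guaranteed; the original 0x6FBC0000 passes this simple check, so something else must have pushed it out of place in that case.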
From: David D. <dav...@rs...> - 2007-08-29 19:36:43
|
Brian,

First of all, EUREKA!

> And what is with this bogus value 8FBC0000 for the image base? That is
> impossible, all userspace DLLs must reside below the 2GB line
> (0x7fffffff) as everything above that is kernel-mode only. So there is
> no way it could ever load at that address -- no wonder it is relocated.

That was my fault, I didn't know that. It seems that mingwm10.dll doesn't like being relocated. Even with its initial base of 0x6FBC0000 I had the 0xc0000005 error, but when I rebase it to somewhere it won't be relocated:

NAME BASE SIZE IMG BASE
mingwm10.dll 0x5FBC0000 0x11000 0x5FBC0000

THE PROGRAM RUNS!

I'll try forcing a relocation of mingwm10.dll for a simple program, and if the program won't start, that will be my test case for you.

Thank you for all your help.

-Dave |
From: David D. <dav...@rs...> - 2007-08-29 19:53:51
|
I have a test case. (I know right now you're probably like "Oh great..")

test.cpp:

    #include <windows.h>

    int main(int argc, char* argv[])
    {
        Sleep(60000);
    }

Compilation:

    $ make
    g++ -mthreads -Wall -g3 -MD -MFMINGW32_NT-5.1_i686/test.d -MTMINGW32_NT-5.1_i686/test.o -c -o MINGW32_NT-5.1_i686/test.o test.cpp
    g++ -mthreads -e_WinMainCRTStartup MINGW32_NT-5.1_i686/test.o -o MINGW32_NT-5.1_i686/test

Results in:

    The application failed to initialize properly (0xc0000005). Click on OK to terminate the application.

Memory usage from process explorer:

    mingwm10.dll  0x3D0000    0x11000  0x400000    mingwm10.dll
    msvcrt.dll    0x77C10000  0x58000  0x77C10000  msvcrt.dll
    kernel32.dll  0x7C800000  0xF5000  0x7C800000  kernel32.dll
    ntdll.dll     0x7C900000  0xB0000  0x7C900000  ntdll.dll

Maybe it doesn't like being rebased below the exe?

Note: If the #include <windows.h> and the Sleep(60000) aren't there, the crash doesn't happen. Not sure why.

- Dave |
From: David D. <dav...@rs...> - 2007-08-29 19:57:06
|
Brian, Please note for the problem to occur, you have to rebase mingwm10.dll into the application's memory range 0x400000 Thank you, - Dave |
From: David D. <dav...@rs...> - 2007-08-31 20:38:16
|
Hi Again Brian,

Once again, I'm back with more information about the Application Failed to Initialize problem. It looks like in certain cases, 2 reloc entries are being inserted for the jmp_msvcrt.dll!malloc entry of some of my DLLs. xerces_c happens to be one of the DLLs that is affected; here is a disassembly (using PE Explorer Disassembler):

    6E763C60    jmp [msvcrt.dll!malloc]

That is the jmp *ADDRESS that we've been seeing. Before any fixups are applied, it matches the address of the msvcrt import table entry in .idata:

    6E9AB300    msvcrt.dll!malloc:
    6E9AB300    70B74200    dd ??

However, when I load the DLL I get an access violation:

    Dump of assembler code for function malloc:
    0x014d3c60 <malloc+0>: jmp *0x9448b300

Notice that the DLL has been relocated. The new base of the DLL is:

                          BASE       SIZE      IMAGE BASE
    libxerces-c2_7_0.dll  0x12F0000  0x452000  0x6E580000

That is a difference of:

    0x6E580000 - 0x12F0000 = 0x6D290000

If I take the address 0x6E9AB300 and perform a manual "fixup" on it, I get:

    0x6E9AB300 - (FIXUP) 0x6D290000 = 0x171B300

Hmmm, but the jmp is to *0x9448b300. Let's do another "fixup" (the subtraction wraps around at 32 bits):

    0x171B300 - 0x6D290000 = 0x9448B300

hmmmmmm look familiar?!

    0x014d3c60 <malloc+0>: jmp *0x9448b300

So then I looked at the Relocation entries in the DLL I built, and sure enough there are 2 entries for 6E763C62. Both in .text according to PE Explorer Disassembler.

Xerces appears to use dllwrap in its build process.
    $ ld -v
    GNU ld version 2.17.50 20060824

    $ gcc -v
    Reading specs from d:/MinGW/bin/../lib/gcc/mingw32/3.4.5/specs
    Configured with: ../gcc-3.4.5/configure --with-gcc --with-gnu-ld --with-gnu-as --host=mingw32 --target=mingw32 --prefix=/mingw --enable-threads --disable-nls --enable-languages=c,c++,f77,ada,objc,java --disable-win32-registry --disable-shared --enable-sjlj-exceptions --enable-libgcj --disable-java-awt --without-x --enable-java-gc=boehm --disable-libgcj-debug --enable-interpreter --enable-hash-synchronization --enable-libstdcxx-debug
    Thread model: win32
    gcc version 3.4.5 (mingw special)

Thanks again,

- Dave |
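The double fixup worked out by hand in this message can be re-checked with a few lines of C. This is a minimal sketch, not the loader's actual code: apply_highlow and check_thread_figures are hypothetical names, and the sketch assumes that a HIGHLOW relocation adds the 32-bit difference between the actual and preferred base to the stored value, wrapping modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Apply one HIGHLOW base relocation to a 32-bit value: add the
   difference between where the image was loaded and where it was
   linked. Unsigned arithmetic wraps modulo 2^32, as on a 32-bit
   machine. */
uint32_t apply_highlow(uint32_t value, uint32_t preferred_base,
                       uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;
    return value + delta;
}

/* Re-checks the figures quoted for libxerces-c2_7_0.dll. */
void check_thread_figures(void)
{
    const uint32_t preferred = 0x6E580000u; /* IMAGE BASE */
    const uint32_t actual    = 0x012F0000u; /* BASE       */

    /* One fixup gives the correct import table slot. */
    uint32_t once = apply_highlow(0x6E9AB300u, preferred, actual);
    /* A second fixup of the same location gives the bad jmp target. */
    uint32_t twice = apply_highlow(once, preferred, actual);

    assert(once  == 0x0171B300u);
    assert(twice == 0x9448B300u);
}
```

Calling check_thread_figures() passes silently, which agrees with the hand calculation: a single relocation yields 0x171B300, and applying the same relocation a second time yields the 0x9448b300 seen in the gdb disassembly.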
From: Brian D. <br...@de...> - 2007-08-31 20:51:12
|
David Daeschler wrote:
> So then I looked at the Relocation entries in the DLL I built, and sure
> enough there are 2 entries for 6E763C62. Both in .text according to PE
> Explorer Disassembler.

You mean there's two copies of an identical reloc to the same location? That definitely sounds like a linker bug. Try a CVS binutils just to rule out something that's already been fixed.

> Xerces appears to use dllwrap in its build process.

Yuck. Dllwrap is old and kind of crufty, the preferred way for creating a DLL is simply "gcc -shared" like on other platforms. If this double reloc thing can be traced to a bug in dllwrap I could certainly see why it would linger around without being noticed.

Brian |
From: David D. <dav...@rs...> - 2007-08-31 21:15:31
|
Hi Brian

> You mean there's two copies of an identical reloc to the same location?

Yes, according to pedump and PE Explorer:

    Virtual Address: 001E3000 size: 00000110
    [...snip...]
    001E3C62 HIGHLOW
    [...snip...]
    Virtual Address: 001E3000 size: 00000110
    [...snip...]
    001E3C62 HIGHLOW

Other addresses are repeated too; it looks like every base relocation entry appears twice in this DLL.

- Dave |
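One way to confirm this kind of duplication without reading the dump by eye is to sort the relocation RVAs and count neighbours that are equal. The following C sketch assumes the RVAs have already been extracted from a dump into an array; the extraction itself is not shown, and count_duplicate_relocs and cmp_u32 are hypothetical helper names rather than part of pedump or binutils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 32-bit unsigned values. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Returns how many entries repeat an RVA that is already present.
   Note: sorts the array in place. */
size_t count_duplicate_relocs(uint32_t *rvas, size_t n)
{
    size_t dups = 0;

    assert(rvas != NULL || n == 0);
    if (n < 2)
        return 0;

    qsort(rvas, n, sizeof rvas[0], cmp_u32);
    for (size_t i = 1; i < n; i++) {
        if (rvas[i] == rvas[i - 1])
            dups++;
    }
    return dups;
}
```

For a correctly linked DLL the function should return 0. For a relocation table in which every entry appears twice, as described above, it would return half the number of entries.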