From: Matthew F. <mat...@gm...> - 2015-11-18 15:49:02
On Wed, Nov 18, 2015 at 9:02 AM, Bernard Berthomieu <be...@la...> wrote:
> Dear Matthew,
>
> On 11/18/2015 04:21 AM, Matthew Fluet wrote:
>>
>> On Tue, Nov 17, 2015 at 9:58 AM, Bernard Berthomieu <be...@la...>
>> wrote:
>>>
>>> ....
>>> [GC: Shrinking stack of size 735 bytes to size 0 bytes, using
>>> 18,446,744,073,709,551,615 bytes.]
>>>
>>> The last line seems suspicious to me ... is it ?
>>
>> Yes, that last line is clearly bogus. Seems like the pointer to the
>> top of the ML stack got corrupted, which broke the computation of the
>> bytes of stack used.
>
> I guess so.
>
>> Do you have gc#10 messages available?
>>
>> Have you tried compiling your program with "-debug true"? That might
>> trip an assert.
>
> I just did it. The log files are attached, for the application compiled
> with -debug true and gc-messages, in 32-bit and in 64-bit, and under
> strace or not.
>
> In 64-bit, the segfault occurs a bit later.
> An assertion violation shows up (splitHeader).

Nothing jumps out to me in the logs. Obviously, the failed assertion is indicative of something going wrong: the copying collection followed a pointer to what it assumes is an ML object, but didn't find a well-formed header. Usually, at this point, I end up adding more prints and asserts to the runtime system in order to trace the error back.

>> Can you reproduce with source code you can share?
>
> The application is part of a toolbox written as ~50k lines of SML ...
> I could certainly strip the code to that needed for that app only, but
> it would still be several thousand lines. I will do that as a last resort
> :-) .

Well, I don't mind working with the larger application.

> As I mentionned in my previous post, the application makes use of ffi,
> and allocates some storage in C using mmap. At first, I was suspecting
> a collision between the storage allocated in C and that allocated by mlton.
> But how this could happen ?

The memory obtained by MLton for the ML heap should be disjoint from any other memory allocated by mmap. Unfortunately, there is a rare possibility that MLton might "steal" other mmap-allocated memory. On 32-bit systems (the code is also left in place on 64-bit platforms), MLton suggests addresses to mmap in order to alternate between placing semi-spaces at high and low memory addresses; this tends to avoid some kinds of fragmentation for large heaps, where an unconstrained mmap might drop a 1 GB heap in the middle of the address space, making it impossible to allocate a second 1 GB heap elsewhere. Unfortunately, the semantics of mmap (at least when the address is requested with MAP_FIXED) is that if there is already a mapping at the given address, then the previous mapping is removed and a new one is created. So, if MLton's sweep of high/low addresses happens to hit an address returned by a previous mmap, then it might take that memory as an ML heap. I've never seen this happen, but we could look more closely at the strace to check.

Another possibility is that the application passes an ML pointer to C code via the FFI, and the C code retains it and accesses it later (e.g., during a subsequent FFI call). If the ML heap is garbage collected in the meantime, then the ML pointer retained in the C code will be invalid: it isn't treated as a root and isn't updated when the pointed-to object is moved during garbage collection. Writing through that invalid ML pointer can corrupt arbitrary objects in the heap.