From: Robert E. <pa...@tu...> - 2003-09-18 17:03:07
>> 3: I did manage to crash blockbuster with the File-Open bug -- still,
>> but only when the system was in the thrashing mode --- default preload
>> and cache values on the big greeno data set. However, I couldn't get
>> it to fail when I set the preload and cache to reasonable sizes -- so
>> it is much tougher to crash than, say, 2 weeks ago.
>>
>> Here's a gdb stack trace -- notice getFrameBlock parameters are way
>> out of whack.
>>
>> #0 0x4207c45c in memcpy () from /lib/tls/libc.so.6
>> (gdb) where
>> #0 0x4207c45c in memcpy () from /lib/tls/libc.so.6
>> #1 0x08067afc in smBase::getFrameBlock(int, void*, int, int*, int*,
>> int*, int)
>> (this=0x41d005c8,
>> f=1410337192, data=0x80c44d0, destRowStride=1410337244, dim=0x7a,
>> pos=0x64534000, step=0x54100948,
>> res=256) at smBase.C:988
>> #2 0x0805d32f in smLoadImage(Image*, FrameInfo*, Canvas*, Rectangle
>> const*, int) ()
>> #3 0x08053fe9 in LoadAndConvertImage (frameInfo=0x541009a8,
>> frameNumber=41, canvas=0x80c44d0,
>> region=0x541009dc, levelOfDetail=0) at cache.c:75
>> #4 0x08054500 in DoReaderThreadWork (data=0x8146598) at cache.c:316
>> #5 0x40460332 in start_thread () from /lib/tls/libpthread.so.0
>
>
> Bob, any ideas here?

At this point, only a bizarre observation. The frameInfo->frameNumber field (which I'm guessing is the same as the movie's frame number, 41 in stack frame #3) appears to be corrupted by the time it reaches "f" in stack frame #1. (Why are no parameters printed for stack frame #2? Is sm.C not compiled debuggable, while everything else is?)

The bizarreness is that the bogus "f" given in stack frame #1 (1410337192 decimal, which is 0x541009a8 in hex) is the same as the frameInfo address seen in stack frame #3 (0x541009a8) (?!!?).

Since this works most of the time (it seems to fail only in low-memory conditions), it can't always be this mangled. Perhaps we're suffering a stack/heap collision? (I'm wondering if the threads' separate stack spaces are making things interesting for the collision detection...)
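For anyone who wants to double-check the coincidence by machine rather than by eye, here is a minimal sketch that compares the frame #1 arguments against the frame #3 arguments. Every value is copied from the gdb trace above; the enum, the function names, and the constants' names are made up for illustration and do not exist in blockbuster. It assumes the 32-bit pointers seen in the trace. It only confirms the bit patterns match; it says nothing about why they do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the frame #3 (LoadAndConvertImage) arguments.
enum class Frame3Arg { None, FrameInfo, Canvas, Region };

// Frame #3 values, exactly as gdb printed them (32-bit addresses).
constexpr std::uint32_t kFrameInfo = 0x541009a8u;
constexpr std::uint32_t kCanvas    = 0x080c44d0u;
constexpr std::uint32_t kRegion    = 0x541009dcu;

// Frame #1 (smBase::getFrameBlock) values; gdb printed the int
// parameters in decimal and the pointer parameter in hex.
constexpr std::uint32_t kF             = 1410337192u;
constexpr std::uint32_t kDestRowStride = 1410337244u;
constexpr std::uint32_t kData          = 0x080c44d0u;

// Report which frame #3 argument, if any, has the same bit pattern.
Frame3Arg matchingFrame3Arg(std::uint32_t value) {
    if (value == kFrameInfo) return Frame3Arg::FrameInfo;
    if (value == kCanvas)    return Frame3Arg::Canvas;
    if (value == kRegion)    return Frame3Arg::Region;
    return Frame3Arg::None;
}

// Check all three frame #1 values against frame #3.
void checkTrace() {
    assert(matchingFrame3Arg(kF) == Frame3Arg::FrameInfo);
    assert(matchingFrame3Arg(kDestRowStride) == Frame3Arg::Region);
    assert(matchingFrame3Arg(kData) == Frame3Arg::Canvas);
}
```

Going by the trace alone, "f" is not the only match: destRowStride (1410337244) has the same value as the region pointer (0x541009dc), and data is the same pointer as canvas. Whether that reflects real corruption or just gdb misreading the arguments of a frame it cannot fully decode is not something the trace settles.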
I've got three things to try:

- make clean, then make again; maybe there's a funny interaction that only occurs at funny times if you mix an optimized C++ .o file with debuggable C and C++ .o files... (sure, it's unlikely, but it's easy to test; and if it still fails, we should get more information on the critical stack frame #2)

- maybe there's something funny in the C/C++ linkage (why it would only show up under low-memory conditions is confounding, but I can contrive various things involving mixed languages and threads). Perhaps adding 'extern "C"' before the exported function declarations in sm.C might make a difference. (I doubt it, though; I'd think that if there were such a conflict, it would show up more frequently, even all the time.)

- interestingly, there's only one memcpy() call in getFrameBlock(), and it's used to copy tile information for overlapping tiles. I'm no expert on this code (it's a bit Spartan for my tastes ;-), but although the algorithms look fine, there's something off in it. Note that the "tinfo" variable only has a valid value if "version == 2" (line 924), but the reference (on line 971) is only protected by a check against the total number of tiles (line 939). I suspect it's possible to have multiple tiles in a non-version-2 file (or the check on line 924 is redundant); is it possible that "tileinfo" is random when assigned at line 971 and that this is producing interesting results? (Does the failing file happen to be a non-version-2 tiled file?) I actually suspect that this is unlikely (it seems this bug would be occurring all the time if this were the cause), but it's worth examining; perhaps an expert examining the code (Holger?) might be able to find something else in the area.

===================================

I've checked in code that directly specifies the C linkage requirements for the exported functions in sm.C (interestingly, although there's no reason for the functions to not be static, there is no 'static "C" ...' declaration for functions, while there is an 'extern "C" ...' declaration...).

Holger, can you try the "make clean; make" steps, and take a look at and around the memcpy() at line 988 in smBase.C, and see if anything about the "tileinfo" or anything else looks fishy to you?

I still "owe" y'all a feedback mechanism that allows the image cache to resize itself on memory failure; I'll come up with it as soon as I can (but I've currently prioritized it low, as you've got a workaround; if you need it more urgently, let me know...).

Bob Ellison
Tungsten Graphics, Inc.