From: Maynard J. <may...@us...> - 2008-01-07 15:13:21
Michael Ellerman wrote:
> Hi all,
>
> Running oprofile (0.9.3) on a cell machine (2.6.24-rc7 kernel) I see the
> oprofiled intermittently crashing. It only seems to happen when I run an
> SPU program.
>
> When it crashes I see this in the log:
>
> oprofiled started Mon Jan 7 18:23:21 2008
> kernel pointer size: 8
> Read buffer of 98307 entries.
> No anon map for pc 0, app anonymous.
>
Well, that's definitely badness, but this, in itself, would not cause
oprofiled to crash. Is this the last thing you see in the log? Does the
daemon fail both with and without the --verbose option?

> Compared to a working run:
>
> oprofiled started Mon Jan 7 18:21:12 2008
> kernel pointer size: 8
> Read buffer of 11 entries.
> Dangling ESCAPE_CODE.
> <snip>
>
A dangling ESCAPE code is badness, too. For Cell, a buffer with 11
entries could mean 3 entries for profiling start header info + 8 entries
for SPU context info. The 11th entry would be the offset of the SPU ELF
data, if embedded; otherwise 0. According to the above log snippet, the
11th entry is an ESCAPE_CODE. This implies to me that another event
record may be getting intermingled in the buffer. There were locks and
memory barriers in place to prevent this from happening. Has there been
a change in the Cell-oprofile kernel code recently that might be causing
this? Did you see this problem on earlier kernels? Are there any more
details you can provide to reproduce the problem?

-Maynard

> I've tried strace'ing oprofiled but that seems to hide the bug. Does
> anyone have any ideas?
>
> cheers