Re: [brlcad-users] Illegal Instruction problem in BRL-CAD 7.12.6
From: Christopher S. M. <br...@ma...> - 2008-08-25 05:58:13
Outstanding! Now that is some nice debugging. Really, way to go, and thanks! It sounds like you pinpointed the problem exactly, and it all makes complete sense now, including why I haven't run into the problem. (I don't have a single x86-based system that doesn't support SSE.)

It also explains the relatively recent appearance of the Illegal Instruction reports, since they coincide with the development of NURBS support, which is the only reason configure is adding those flags. The NURBS implementation can optionally use SSE for fast surface evaluation, but that is all experimental, active development that isn't exposed anywhere else. Since it is entirely isolated within the NURBS evaluation and isn't even critical there, it's completely safe (and recommended) to comment out the related portions of our configure.ac file; there is no downside. In fact, I've done this on the latest SVN trunk sources for now, until a proper run-time check for SSE support can be added.

As for needing to run make install before make test, that is rather dependent on your linker, system, and compilation options. Our tests usually do run prior to install, but that depends heavily on libtool doing its job and on the automatic auto_path searching in our bwish/btclsh. The bwish/btclsh tools scan the expected source directories for the scripts they need (specifically so that you don't have to set [I]TCL_LIBRARY and don't have to install), but a lot of factors affect whether that succeeds. Since the tests are predominantly for devs, it's been "good enough". The main user 'tests' are to run 'make benchmark' prior to install (or just 'benchmark' after install) and to run 'mged' after install.

Thanks again for tracking down that problem! It would have been very hard to reproduce, isolate, and fix anytime soon.

Cheers!
Sean

On Aug 24, 2008, at 8:13 PM, Simon Clubley wrote:

> On 24/08/2008, Christopher Sean Morrison <br...@ma...> wrote:
>> Simon,
>>
>> As you're obviously aware having read the list, you're not the first to run
>> into this Illegal instruction problem. At this point, there's not much left
>> to try other than someone capable of reproducing the problem and running it
>> in a debugger until the issue is identified.
>>
>> The only other idea that comes to mind is to test a different version of
>> the compiler (gcc 4.0 or 3.3) or a different compiler altogether (Intel).
>> Similarly, trying previous (source) releases of BRL-CAD to see if the
>> problem was introduced in a particular release would help.
>>
>> Cheers!
>> Sean
>
> Hello,
>
> I think I now know what the problem is, but I need someone who knows
> the BRL-CAD code base and the issues around the GCC compiler
> performance options to confirm this.
>
> Here's what I've learnt so far:
>
> The Illegal Instruction is not Linux (or even modern Linux) specific.
>
> I've now got the Illegal Instruction on a RedHat 7.3 system with gcc
> 3.4.3 installed and on a FreeBSD 6.2 system, as well as on the modern
> Linux system posted about earlier.
>
> I've built BRL-CAD 7.2.6 on the RedHat 7.3 system using the same
> versions of the tools used to build 7.12.6, and 7.2.6 completes the
> tests there (apart from an issue with a missing weight.sh during
> "make test").
>
> The failure of "make benchmark" that I reported in 7.12.6 is due to
> the same Illegal Instruction error.
>
> I believe that the Illegal Instruction is genuine and is _not_ caused
> by some kind of memory trashing or stack overflow. I have duplicated
> the problem in a stripped-down version of timer42.c, with a minimal
> main().
>
> The Illegal Instruction is occurring in the timer42 version of
> rt_get_timer() when calculating elapsed_secs.
> When looking at the assembly code for rt_get_timer() using Insight/GDB,
> I see an opcode for cvtsi2sd. According to my Intel handbooks, that
> instruction appears to be valid only on SSE2 architectures.
>
> It looks like the compiler options -msse and -msse2 are added by the
> current version of configure if it detects that the compiler supports
> the options.
>
> However, and this is where I need a second opinion, I think that gcc
> will accept them as valid options, and generate the related code, even
> if the platform that gcc is running on does not actually support SSE2
> instructions.
>
> If I manually alter configure, commenting out the SSE="$MSSE $MSSE2"
> line and adding SSE="" on the next line, the tests and benchmarks
> appear to run OK. However, if you are reading this, I would strongly
> recommend that you do _NOT_ make this alteration unless you understand
> the nature of the problem and agree that it's the correct solution
> for _you_.
>
> Assuming that this is indeed the problem, I wonder if the best
> solution would be to add --disable-sse-check and --disable-sse2-check
> options to configure, but leave the checks enabled by default so that
> you continue to see the same behaviour as currently exists?
>
> BTW, it appears that, contrary to the sequence listed in the INSTALL
> file for 7.12.6, you have to run "make install" before running the
> tests. Once an install is done, you no longer appear to need to define
> TCL_LIBRARY and ITCL_LIBRARY before running "make test".
>
> As a final note, there's of course no guarantee that other people's
> Illegal Instruction crashes are caused by the same issue that I've
> experienced.
>
> Simon.
>
> --
> Simon Clubley
> sim...@go...
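Sean's reply mentions adding a proper run-time check for SSE support before turning the flags back on. The following is a minimal sketch of what such a check might look like in C on x86 with GCC (or Clang), using the compiler-provided <cpuid.h> header; it is not BRL-CAD code. CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26, and SSE2 is the bit that matters for instructions such as cvtsi2sd. The names cpuid1_edx, cpu_has_sse, cpu_has_sse2, eval_scalar, eval_sse2, eval_fn and select_eval are all hypothetical, and both kernels are trivial stand-ins for real evaluation code.

```c
/*
 * Sketch: run-time SSE/SSE2 detection on x86 via GCC/Clang <cpuid.h>,
 * plus function-pointer dispatch. All names are hypothetical; this is
 * not BRL-CAD code.
 */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

#define EDX_SSE  (1u << 25)  /* CPUID.1:EDX bit 25 = SSE  */
#define EDX_SSE2 (1u << 26)  /* CPUID.1:EDX bit 26 = SSE2 */

/* Returns the CPUID leaf-1 EDX feature flags, or 0 if they are unavailable. */
static unsigned int cpuid1_edx(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 1 (or CPUID itself) is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx;
#endif
    return 0; /* non-x86 target or no CPUID: report no features */
}

/* Report what the processor running this program supports, not what the
 * compiler accepts as flags. */
int cpu_has_sse(void)  { return (cpuid1_edx() & EDX_SSE)  != 0; }
int cpu_has_sse2(void) { return (cpuid1_edx() & EDX_SSE2) != 0; }

/* Hypothetical kernels. In a real build, eval_sse2 would live in its own
 * source file compiled with -msse2; here both are plain C stand-ins. */
static double eval_scalar(double u) { return u * u + 1.0; }
static double eval_sse2(double u)   { return u * u + 1.0; }

typedef double (*eval_fn)(double);

/* Choose the SSE2 path only when the running CPU reports SSE2. */
eval_fn select_eval(void)
{
    return cpu_has_sse2() ? eval_sse2 : eval_scalar;
}
```

A caller would do `eval_fn eval = select_eval();` once at start-up and then call `eval(u)`. The design assumption is that only the file holding the SSE2 kernel is built with -msse2, so a processor without SSE2 never executes an SSE2 instruction such as cvtsi2sd; whether that layout suits the NURBS code is not something the thread states. The check only reads what the processor reports and assumes the operating system also supports SSE register state.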