
#159 mged crashes when inserting a torus

Milestone: crash or data loss
Status: closed-accepted
Labels: Modeling (29)
Priority: 7
Updated: 2008-08-12
Created: 2008-05-07
Creator: louipc
Private: No

This relates to SVN 31011, but I've had this problem in 7.12.2 and at some point before that. I can't remember exactly when, but it seemed to disappear for a while.

I can't really figure out what the problem is.
I can only guess that it has something to do with the compiling or linking.

If I copy the code from rt_num_circular_segments() into a separate program, compile it, and run it, everything works perfectly. Of course, I have to use `gcc -lm` to link the math library.

I've attached the little test program. I don't think it matters though.

Here's the gdb output:

Starting program: /opt/brlcad/bin/mged test.g in testo tor 0 0 0 1 0 0 20 10
[Thread debugging using libthread_db enabled]
[New Thread 0xb708d6e0 (LWP 10543)]
[New Thread 0xb6daab90 (LWP 10546)]

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0xb708d6e0 (LWP 10543)]
0xb7e0db15 in rt_num_circular_segments (maxerr=0.59999999999999998, radius=20)
at g_torus.c:984
984 n = bn_pi / half_theta + 0.99;
(gdb) bt
#0 0xb7e0db15 in rt_num_circular_segments (maxerr=0.59999999999999998,
radius=20) at g_torus.c:984
#1 0xb7e0e720 in rt_tor_plot (vhead=0xbfc592c0, ip=0xbfc59400, ttol=0x89a7e4c,
tol=0x89a7e68) at g_torus.c:1047
#2 0xb7f718b3 in dgo_wireframe_leaf (tsp=0x88f6d5c, pathp=0x88f6e50,
ip=0xbfc59400, client_data=0x89442a8) at dg_obj.c:2633
#3 0xb7d3f3b3 in db_recurse (tsp=0x88f6d5c, pathp=0x88f6e50,
region_start_statepp=0xbfc594b8, client_data=0x89442a8) at db_tree.c:1289
#4 0xb7d3fe8f in db_walk_subtree (tp=0x8944310,
region_start_statepp=0xbfc594b8, leaf_func=0xb7f71740 <dgo_wireframe_leaf>,
client_data=0x89442a8, resp=0x8105800) at db_tree.c:2053
#5 0xb7d40212 in db_walk_dispatcher (cpu=0, arg=0xbfc59618) at db_tree.c:2148
#6 0xb7d40a07 in db_walk_tree (dbip=0x8942e90, argc=1, argv=0xbfc59854,
ncpu=1, init_state=0x89a7d58, reg_start_func=0,
reg_end_func=0xb7f6ab90 <dgo_wireframe_region_end>,
leaf_func=0xb7f71740 <dgo_wireframe_leaf>, client_data=0x89442a8)
at db_tree.c:2416
#7 0xb7f71268 in dgo_drawtrees (dgop=0x8a03130, interp=0x88818d0, argc=1,
argv=0xbfc59854, kind=1, _dgcdp=0x0) at dg_obj.c:3133
#8 0xb7f713e6 in dgo_draw_cmd (dgop=0x8a03130, interp=0x88818d0, argc=1,
argv=0xbfc59854, kind=1) at dg_obj.c:542
#9 0x0807affe in edit_com (argc=2, argv=0xbfc59850, kind=1, catch_sigint=1)
at chgview.c:441
#10 0x080dcf75 in f_in (clientData=0x80fff70, interp=0x88818d0, argc=11,
argv=0x8888c40) at typein.c:859
#11 0xb71de026 in TclInvokeStringCommand () from /usr/lib/libtcl8.5.so
#12 0xb71dee31 in TclEvalObjvInternal () from /usr/lib/libtcl8.5.so
#13 0xb71df761 in TclEvalEx () from /usr/lib/libtcl8.5.so
#14 0xb71dfbfe in Tcl_EvalEx () from /usr/lib/libtcl8.5.so
#15 0xb71dfc3c in Tcl_Eval () from /usr/lib/libtcl8.5.so
#16 0x0808017e in cmdline (vp=0x88818d0, record=1) at cmd.c:1185
#17 0x080bca4e in main (argc=0, argv=0xbfc59ca8) at mged.c:683

Discussion

  • louipc

    louipc - 2008-05-07

    Little test program.

     
  • louipc

    louipc - 2008-07-25

    SVN 31914 bt full output

     
  • louipc

    louipc - 2008-07-25

    Logged In: YES
    user_id=1633208
    Originator: YES

    Here's some more detailed output which also reflects new filenames.
    File Added: btfull

     
  • Sean Morrison

    Sean Morrison - 2008-07-26
    • assigned_to: nobody --> brlcad
    • priority: 5 --> 7
    • status: open --> pending-fixed
     
  • Sean Morrison

    Sean Morrison - 2008-07-26

    Logged In: YES
    user_id=785737
    Originator: NO

    Loui, thanks for the detailed report. Very informative. I added a bunch of sanity checks to rt_num_circular_segments() because I strongly suspect floating-point fuzz was causing a divide by zero. It's still odd that it was making it past the SMALL test, though, so something else may still be going on. Please try the latest svn sources, anything after r31961, to confirm whether this is fixed or not. Thanks again! -- Sean

     
  • louipc

    louipc - 2008-07-26
    • status: pending-fixed --> open-fixed
     
  • louipc

    louipc - 2008-07-26

    Logged In: YES
    user_id=1633208
    Originator: YES

    Thanks for looking at this. Unfortunately mged still exits with Illegal Instruction in SVN 31962.

     
  • louipc

    louipc - 2008-07-26
    • status: open-fixed --> open-accepted
     
  • Sean Morrison

    Sean Morrison - 2008-07-26

    Logged In: YES
    user_id=785737
    Originator: NO

    Can you repost the stack trace with the updated build? I'm curious whether it's still crashing on that division line and, if so, what the values of half_theta and bn_pi are at the time.

    Also, with your test program -- does the problem occur there as well?

     
  • louipc

    louipc - 2008-07-28

    Logged In: YES
    user_id=1633208
    Originator: YES

    It seems like the conversion from fastf_t to int is what's tripping the Illegal Instruction.
    The test program works as expected without problems.

    I added some print statements and new variables to test intermediate steps. Interestingly, the printed value of cos_half_theta and the value reported by gdb are different. I'm not sure whether that hints at anything.

    Here's some output:

    Starting program: /opt/brlcad/bin/mged test.g in testo tor 0 0 0 1 0 0 20 10
    [Thread debugging using libthread_db enabled]
    [New Thread 0xb6eed6e0 (LWP 2928)]
    [New Thread 0xb6c06b90 (LWP 2931)]
    cos_half_theta = 9.700000e-01
    half_theta = 2.455655e-01
    SMALL_FASTF = 1.000000e-77
    bn_pi = 3.141593e+00
    l = bn_pi / half_theta = 1.279330e+01
    m = bn_pi / half_theta + 0.99 = 1.378330e+01

    Program received signal SIGILL, Illegal instruction.
    [Switching to Thread 0xb6eed6e0 (LWP 2928)]
    rt_num_circular_segments (maxerr=0.59999999999999998, radius=20)
    at primitives/tor/tor.c:1005
    1005 n = (double) m;
    (gdb) bt 3 full
    #0 rt_num_circular_segments (maxerr=0.59999999999999998, radius=20)
    at primitives/tor/tor.c:1005
    cos_half_theta = 0
    half_theta = 0.24556551751529213
    l = 12.793297224208835
    m = 13.783297224208836
    n = <value optimized out>
    #1 0xb7d5a1b0 in rt_tor_plot (vhead=0xbf909ac0, ip=0xbf909c00,
    ttol=0x9900830, tol=0x990084c) at primitives/tor/tor.c:1073
    dist_to_rim = <value optimized out>
    tip = (struct rt_tor_internal *) 0x9900d10
    w = <value optimized out>
    nw = 8
    len = <value optimized out>
    nlen = 16
    pts = <value optimized out>
    rel = 0
    #2 0xb7e09323 in dgo_wireframe_leaf (tsp=0x9900e1c, pathp=0x9900f10,
    ip=0xbf909c00, client_data=0x989d2b0) at dg_obj.c:2672
    curtree = <value optimized out>
    dashflag = 0
    vhead = {magic = 16868736, forw = 0xbf909ac0, back = 0xbf909ac0}

     
  • Sean Morrison

    Sean Morrison - 2008-07-28
    • status: open-accepted --> pending-accepted
     
  • Sean Morrison

    Sean Morrison - 2008-07-28

    Logged In: YES
    user_id=785737
    Originator: NO

    Loui, those are indeed some odd results. The fact that it has suspicious casting problems and is dying on what should otherwise be fully protected code makes me think there is some data management/optimization problem going on. More specifically, it reminds me of a talk I was having with kwizart several weeks back regarding compilation options.

    Presuming you compiled with gcc -- did you use the -fno-strict-aliasing flag? That flag is required as we do rely on structure type punning in such a way that -fstrict-aliasing (which is enabled by default with -O2 and -O3) will cause failures at run-time.
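As a rough illustration of the aliasing concern, the sketch below reads a leading magic-number field out of an arbitrary object. The struct and function names are hypothetical and are not BRL-CAD types. The commented-out variant shows the kind of cast-based access that a compiler may miscompile when -fstrict-aliasing is in effect; the memcpy() variant is well-defined regardless of that flag.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical object layout: every object starts with a magic number. */
struct example_shape {
    uint32_t magic;
    double radius;
};

/*
 * Cast-based punning (shown only as a comment): accessing an object
 * through an lvalue of an unrelated struct type violates the aliasing
 * rules, so an optimizer using -fstrict-aliasing may reorder or drop
 * the access.
 *
 *     struct other_shape *p = (struct other_shape *)obj;
 *     return p->magic;
 */

/* Well-defined alternative: copy the leading bytes out of the object. */
uint32_t read_magic(const void *obj)
{
    uint32_t magic;
    memcpy(&magic, obj, sizeof magic);
    return magic;
}
```

Code that cannot be restructured this way is typically built with -fno-strict-aliasing instead, as suggested above.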

    Another validation check before more head-banging on this stack trace is to run the benchmark suite -- if it fails to pass, then there are basic compilation sanity failures that need to be addressed. You can run the benchmark with the 'benchmark' command; it'll report results as right or wrong.

     
  • SourceForge Robot

    • status: pending-accepted --> closed-accepted
     
  • SourceForge Robot

    Logged In: YES
    user_id=1312539
    Originator: NO

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
  • louipc

    louipc - 2008-09-03

    Logged In: YES
    user_id=1633208
    Originator: YES

    I should mention that I was building and running BRL-CAD on a Pentium III.

    This problem is no longer present in SVN 32569.
    It was probably caused by improper SSE instructions in the binary, as discussed
    on the brlcad-users mailing list in August 2008 in the
    "Illegal Instruction problem in BRL-CAD 7.12.6" thread.

    Here's a reference link.
    http://sourceforge.net/mailarchive/message.php?msg_name=426443510808232000w30b450a1pda91c8d58a84ac45%40mail.gmail.com
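For anyone hitting a similar SIGILL, a quick way to check whether the running CPU supports a given instruction-set extension is sketched below. It assumes GCC or Clang on x86 and uses SSE2 as the example, which is an assumption: the report only says "improper SSE instructions" (the Pentium III implements SSE but not SSE2). The function name `cpu_has_sse2` is hypothetical.

```c
/*
 * Returns 1 if the running CPU reports SSE2 support, 0 if it does not,
 * and -1 if this sketch cannot tell (non-x86 target or another compiler).
 * Relies on the GCC/Clang builtins __builtin_cpu_init() and
 * __builtin_cpu_supports().
 */
int cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return -1;
#endif
}
```

If this returns 0 on the target machine while the binary was built with flags that enable that extension, an illegal-instruction fault on an otherwise harmless line (such as a floating-point to integer conversion) is a plausible outcome.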

     
