Question on KILLBYSIGSINFO1 errors (?)

Help
Bob Isch
2001-06-27
2001-07-06
  • Bob Isch

    Bob Isch - 2001-06-27

    Does anyone have any ideas about what might be causing the following errors?  They periodically zap some of our background jobs.  It could be related to TCP/IP, but I have not tried to isolate it yet.  There seem to be a handful of common addresses...

    Thanks for any suggestions or info,
    Bob

    %GTM-F-KILLBYSIGSINFO1, GT.M process  7899 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x6D6173AB)
    %GTM-F-KILLBYSIGSINFO1, GT.M process  2804 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x267265A0)
    %GTM-F-KILLBYSIGSINFO1, GT.M process  2809 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x267265A0)
    %GTM-F-KILLBYSIGSINFO1, GT.M process  3881 has been killed by a signal 11 at address 0x808540A (vaddr 0x4)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 11700 has been killed by a signal 11 at address 0x808540A (vaddr 0x4)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 19090 has been killed by a signal 11 at address 0x808540A (vaddr 0x4)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 14891 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x353030AE)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 24405 has been killed by a signal 11 at address 0x808540A (vaddr 0x4)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 28219 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x3638326C)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 31319 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x6E6174AF)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 31948 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x30653261)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 31950 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x30653261)
    %GTM-F-KILLBYSIGSINFO1, GT.M process  8331 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x316461AA)
    %GTM-F-KILLBYSIGSINFO1, GT.M process  9718 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x504F5291)
    %GTM-F-KILLBYSIGSINFO1, GT.M process   348 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x534D4590)
    %GTM-F-KILLBYSIGSINFO1, GT.M process  9219 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x53595367)
    %GTM-F-KILLBYSIGSINFO1, GT.M process 16770 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x44363261)
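
To make the "handful of common addresses" concrete, the messages can be grouped by the reported fault address. The following is only a sketch: the regular expression is inferred from the message layout shown above rather than from any GT.M specification, and the names `LINE_RE` and `tally_fault_addresses` are made up for illustration. In the 17 messages above, 13 report address 0x8082DB1 and 4 report 0x808540A (the latter always with vaddr 0x4).

```python
import re
from collections import Counter

# Pattern inferred from the KILLBYSIGSINFO1 lines quoted in this thread
# (an assumption, not an official message grammar).
LINE_RE = re.compile(
    r"%GTM-F-KILLBYSIGSINFO1, GT\.M process\s+(?P<pid>\d+) "
    r"has been killed by a signal (?P<sig>\d+) "
    r"at address (?P<addr>0x[0-9A-Fa-f]+) "
    r"\(vaddr (?P<vaddr>0x[0-9A-Fa-f]+)\)"
)


def tally_fault_addresses(lines):
    """Count how many messages report each faulting instruction address."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("addr")] += 1
    return counts


# Three of the messages from the post above, used as sample input.
sample = [
    "%GTM-F-KILLBYSIGSINFO1, GT.M process  7899 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x6D6173AB)",
    "%GTM-F-KILLBYSIGSINFO1, GT.M process  3881 has been killed by a signal 11 at address 0x808540A (vaddr 0x4)",
    "%GTM-F-KILLBYSIGSINFO1, GT.M process  2804 has been killed by a signal 11 at address 0x8082DB1 (vaddr 0x267265A0)",
]
sample_counts = tally_fault_addresses(sample)
```

Grouping by address rather than by process ID is the useful choice here, because the process IDs are all different while the code addresses repeat.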

     
    • Sam Weiner

      Sam Weiner - 2001-06-27

      First, it helps to know which version you are using.  If you built your own version,
      please also say on what platform, including the platform version.

      You can find out where the SEGV (signal 11) occurred by "gdb $gtm_dist/mumps",
      "disassemble 0x8082DB1" (or whatever the address reported is.)  The vaddr value
      is the virtual address whose access caused the SEGment Violation.

      There should have been a core file produced.  If there was already a core file in the
      directory, GT.M will rename the old core file to core1, core2, etc. so if you started
      with no core files and have more than one now, core1 is usually the most interesting.
      "gdb $gtm_dist/mumps core", "bt" will produce a stack traceback.  This isn't always
      very useful, especially with Linux and production images.  A dbg image is a lot more
      useful in tracking down problems.  It also has asserts which may catch memory
      corruption problems early enough to get a sensible core.
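
The gdb steps described above can also be run non-interactively, which is handy when several core files have piled up. This is a minimal sketch, assuming a gdb that supports the `--batch` and `-ex` options and that `$gtm_dist` points at the GT.M installation; the helper names `gdb_batch_command` and `run_gdb` are hypothetical.

```python
import os
import subprocess


def gdb_batch_command(gdb_commands, gtm_dist=None, core_path=None):
    """Build an argv list that runs the given gdb commands against mumps.

    gtm_dist defaults to the $gtm_dist environment variable; core_path is
    optional because "disassemble <address>" does not need a core file.
    """
    gtm_dist = gtm_dist or os.environ.get("gtm_dist", "")
    argv = ["gdb", "--batch"]
    for command in gdb_commands:
        argv += ["-ex", command]
    argv.append(os.path.join(gtm_dist, "mumps"))
    if core_path:
        argv.append(core_path)
    return argv


def run_gdb(gdb_commands, gtm_dist=None, core_path=None):
    """Run gdb with the given commands and return whatever it printed."""
    result = subprocess.run(
        gdb_batch_command(gdb_commands, gtm_dist, core_path),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, `run_gdb(["disassemble 0x8082DB1"])` corresponds to the first suggestion, and `run_gdb(["bt"], core_path="core1")` to the stack traceback.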

      You wouldn't be doing any external calls?

      Good luck, Sam

       
    • Bob Isch

      Bob Isch - 2001-07-05

      Sorry for the initial lack of detail, I was just trying to get a quick read on this before spending much more time on it.  I seem to be able to reproduce it now.

      GTM version is: GT.M V4.2-002 Linux x86
      Standard production install without rebuilding...

      Following is the traceback.  I believe this happens when the symbol table is stressed, so it may be a garbage collection issue?  Anyway, I will try this on a dbg image when I get a chance.  Also, recent core files are only about 200K if you would like one.

      Thanks,
      Bob

      (gdb) bt
      #0  0x40075a66 in ?? ()
      #1  0x4003f2b4 in ?? ()
      #2  0x805f696 in secshr_db_clnup ()
      #3  0x8060cba in secshr_db_clnup ()
      #4  0x8060df6 in secshr_db_clnup ()
      #5  0x806191c in secshr_db_clnup ()
      #6  0x806171e in secshr_db_clnup ()
      #7  0x80540f6 in parse_glvn ()
      #8  0x8052da8 in unw_prof_frame ()
      #9  0x806c0f8 in util_format ()
      #10 0x8064daa in stp_gcol ()
      #11 0x8064e3e in stp_gcol ()
      #12 0x4003cc68 in ?? ()
      #13 0x40070621 in ?? ()
      #14 0x4003e0a1 in ?? ()
      #15 0x80594af in op_srchindx ()
      #16 0x805b811 in parse_file ()
      #17 0x805f75b in secshr_db_clnup ()
      #18 0x8060cba in secshr_db_clnup ()
      #19 0x8060df6 in secshr_db_clnup ()
      #20 0x806191c in secshr_db_clnup ()
      #21 0x806171e in secshr_db_clnup ()
      #22 0x80540f6 in parse_glvn ()
      #23 0x8052da8 in unw_prof_frame ()
      #24 0x806c0f8 in util_format ()
      #25 0x805c5e6 in load_pattern_table ()
      #26 0x805c1ab in load_pattern_table ()
      #27 0x4003cc68 in ?? ()
      #28 0x805f6e9 in secshr_db_clnup ()
      #29 0x8060cba in secshr_db_clnup ()
      #30 0x8060df6 in secshr_db_clnup ()
      #31 0x806191c in secshr_db_clnup ()
      #32 0x806171e in secshr_db_clnup ()
      #33 0x80540f6 in parse_glvn ()
      #34 0x8052da8 in unw_prof_frame ()
      #35 0x806c0f8 in util_format ()
      #36 0x8064daa in stp_gcol ()
      #37 0x8064e3e in stp_gcol ()
      #38 0x4003cc68 in ?? ()
      #39 0x400769a4 in ?? ()
      #40 0x40076bf0 in ?? ()
      #41 0x40072fad in ?? ()
      #42 0x400703e4 in ?? ()
      #43 0x805f749 in secshr_db_clnup ()
      #44 0x8060cba in secshr_db_clnup ()
      #45 0x805e939 in secshr_db_clnup ()
      #46 0x805e9a4 in secshr_db_clnup ()
      #47 0x805e9ce in secshr_db_clnup ()
      #48 0x805e480 in s2n ()
      #49 0x805e64e in same_device_check ()
      #50 0x805e75e in secshr_db_clnup ()
      #51 0x80617ab in secshr_db_clnup ()
      #52 0x806171e in secshr_db_clnup ()
      #53 0x80540f6 in parse_glvn ()
      #54 0x8052da8 in unw_prof_frame ()
      #55 0x80532d5 in crt_gbl ()
      #56 0x805267d in pcurrpos ()
      #57 0x8053f2b in parse_glvn ()
      #58 0x8053e8e in parse_glvn ()
      #59 0x8052cd4 in new_prof_frame ()
      #60 0x806c0f8 in util_format ()
      #61 0x8070664 in wcs_verify ()
      #62 0x80547c5 in mprof_tree_find_node ()
      #63 0x8054bb7 in mprof_tree_find_node ()
      #64 0x80545cc in mprof_tree_walk ()
      #65 0x8052da8 in unw_prof_frame ()
      #66 0x806c0f8 in util_format ()
      #67 0x804afed in cli_is_hex ()
      #68 0x804ad20 in tok_string_extract ()
      #69 0x804a7c1 in sigemptyset ()
      #70 0x400369cb in ?? ()
      (gdb)

       
    • Bob Isch

      Bob Isch - 2001-07-05

      Sam:

      Well, with the debug version we get basically the same error:

      %GTM-F-KILLBYSIGSINFO1, GT.M process 13771 has been killed by a signal 11 at address 0x808C801 (vaddr 0x32255080)

      But the traceback is much more abbreviated (and perhaps useful?):

      $ gdb $gtm_dist/mumps core
      GNU gdb 5.0rh-5 Red Hat Linux 7.1
      Copyright 2001 Free Software Foundation, Inc.
      GDB is free software, covered by the GNU General Public License, and you are
      welcome to change it and/or distribute copies of it under certain conditions.
      Type "show copying" to see the conditions.
      There is absolutely no warranty for GDB.  Type "show warranty" for details.
      This GDB was configured as "i386-redhat-linux"...
      Core was generated by `/u/gtm/mumps -direct'.
      Program terminated with signal 3, Quit.
      Reading symbols from /usr/lib/libncurses.so.4...done.
      Loaded symbols for /usr/lib/libncurses.so.4
      Reading symbols from /lib/i686/libm.so.6...done.
      Loaded symbols for /lib/i686/libm.so.6
      Reading symbols from /lib/libdl.so.2...done.
      Loaded symbols for /lib/libdl.so.2
      Reading symbols from /lib/i686/libc.so.6...done.
      Loaded symbols for /lib/i686/libc.so.6
      Reading symbols from /lib/ld-linux.so.2...done.
      Loaded symbols for /lib/ld-linux.so.2
      Reading symbols from /lib/libnss_files.so.2...done.
      Loaded symbols for /lib/libnss_files.so.2
      Reading symbols from /lib/libnss_nisplus.so.2...done.
      Loaded symbols for /lib/libnss_nisplus.so.2
      Reading symbols from /lib/libnsl.so.1...done.
      Loaded symbols for /lib/libnsl.so.1
      #0  0x400ba801 in __kill () from /lib/i686/libc.so.6
      (gdb) bt
      #0  0x400ba801 in __kill () from /lib/i686/libc.so.6
      #1  0x08091b2f in gtm_dump_core () at /u/gtmsrc/gtm/sr_unix/gtm_dump_core.c:55
      #2  0x08092385 in gtm_fork_n_core () at /u/gtmsrc/gtm/sr_unix/gtm_fork_n_core.c:161
      #3  0x0808e620 in generic_signal_handler (sig=11, info=0xbffff560, context=0xbffff5e0)
          at /u/gtmsrc/gtm/sr_unix/generic_signal_handler.c:269
      #4  <signal handler called>
      #5  0x0808c801 in fetch (__builtin_va_alist=1) at /u/gtmsrc/gtm/sr_port/fetch.c:41
      #6  0x08058bad in op_linefetch () at /u/gtmsrc/gtm/sr_i386/op_linefetch.s:33
      #7  0x0804acf7 in main (argc=2, argv=0xbffff9dc, envp=0xbffff9e8) at /u/gtmsrc/gtm/sr_unix/gtm.c:154
      #8  0x400a9177 in __libc_start_main (main=0x804ab30 <main>, argc=2, ubp_av=0xbffff9dc,
          init=0x8049fe4 <_init>, fini=0x81574ac <_fini>, rtld_fini=0x4000e184 <_dl_fini>,
          stack_end=0xbffff9d4) at ../sysdeps/generic/libc-start.c:129
      (gdb)

      Still trying to do something with the symbol table it looks like.
      Core file is about 13MB now.
      Note: The debug image was compiled on a RH6.2 system and the test and
      core dump were done on a RH7.1 system.

      Any ideas?

      Thanks again,
      Bob

       
    • Bob Isch

      Bob Isch - 2001-07-06

      I have migrated this topic to the bugs section with sample code to reproduce it.  It appears to be caused by trying to read more than 90K from a socket.  Such a read should, of course, return the data in 32K chunks (since GT.M is limited to that in its local variable capacity), but it certainly should not dump core...
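
The expected behaviour described here, a large socket read delivered in pieces of at most 32K, can be illustrated outside GT.M. The sketch below is written in Python purely as an illustration and says nothing about how GT.M implements socket reads; `CHUNK_LIMIT` and `read_chunks` are hypothetical names, and the 32K figure is taken from the post above.

```python
import socket

# Per-read cap taken from the 32K limit mentioned in the post (assumption:
# 32K means 32 * 1024 bytes).
CHUNK_LIMIT = 32 * 1024


def read_chunks(sock, chunk_limit=CHUNK_LIMIT):
    """Yield data from a connected socket in pieces of at most chunk_limit bytes.

    Stops when the peer closes the connection (recv returns b"").
    """
    while True:
        piece = sock.recv(chunk_limit)
        if not piece:
            return
        yield piece
```

A caller that needs the whole message can join the pieces itself, while a receiver with a bounded variable size can process one piece at a time and never has to hold more than `chunk_limit` bytes in a single value.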

       
