#1303 SegFault in STAFTCPConnProvider.cpp

Unix::Linux
open
nobody
5
2009-10-28
2009-10-28
Michael
No

[medmonds@kc1b1-e0]# staf local misc version
Response
--------
3.3.5
[medmonds@kc1b1-e0]#
[medmonds@kc1b1-e0]# uname -a
Linux kc1b1-e0 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
[medmonds@kc1b1-e0]# cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
[medmonds@kc1b1-e0]# staf local misc list interfaces
Response
--------
[
  {
    Interface Name: local
    Library       : STAFLIPC
    Options       : {
      IPCName: STAF
    }
  }
  {
    Interface Name: tcp
    Library       : STAFTCP
    Options       : {
      ConnectTimeout: 5000
      Port          : 6500
      Protocol      : IPv4
      Secure        : No
    }
  }
]
[medmonds@kc1b1-e0]#

[medmonds@kc1b1-e0]# gdb /usr/software/test/staf/last/bin/STAFProc core.3317
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Reading symbols from /lib64/tls/libpthread.so.0...done.
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAF.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAF.so
Reading symbols from /usr/software/lib64/libstdc++.so.6...done.
Loaded symbols for /usr/software/lib64/libstdc++.so.6
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /usr/software/lib64/libgcc_s.so.1...done.
Loaded symbols for /usr/software/lib64/libgcc_s.so.1
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFLIPC.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFLIPC.so
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFTCP.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFTCP.so
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFDSLS.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFDSLS.so
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFLog.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFLog.so
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFMon.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFMon.so
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFPool.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFPool.so
Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5a/lib/libSTAFReaper.so...done.
Loaded symbols for /usr/software/test/staf/staf-3.3.5a/lib/libSTAFReaper.so
Core was generated by `/usr/software/test/staf/last/bin/STAFProc /usr/software/test/staf/last/bin/STAF'.
Program terminated with signal 6, Aborted.
#0 0x000000324c52e2ed in raise () from /lib64/tls/libc.so.6
(gdb) where
#0 0x000000324c52e2ed in raise () from /lib64/tls/libc.so.6
#1 0x000000324c52fb7c in abort () from /lib64/tls/libc.so.6
#2 0x00000000004418d4 in generic_signal_handler (signum=11) at /u/medmonds/p4/staf/src/staf/stafproc/unix/STAFProcOSUtil.cpp:84
#3 <signal handler called>
#4 0x0000002a989af1cb in STAFConnectionRead (baseConnection=Cannot access memory at address 0xffffffffffffff29
) at /u/medmonds/p4/staf/src/staf/connproviders/tcp/STAFTCPConnProvider.cpp:3044
Cannot access memory at address 0x9
(gdb) up
#1 0x000000324c52fb7c in abort () from /lib64/tls/libc.so.6
(gdb) up
#2 0x00000000004418d4 in generic_signal_handler (signum=11) at /u/medmonds/p4/staf/src/staf/stafproc/unix/STAFProcOSUtil.cpp:84
84 abort();
(gdb) up
#3 <signal handler called>
(gdb) up
#4 0x0000002a989af1cb in STAFConnectionRead (baseConnection=Cannot access memory at address 0xffffffffffffff29
) at /u/medmonds/p4/staf/src/staf/connproviders/tcp/STAFTCPConnProvider.cpp:3044
3044 connection->readWriteTimeout);
(gdb)

/u/medmonds/p4/staf/src/staf/connproviders/tcp/STAFTCPConnProvider.cpp line 3044 (this is the same code that is in cvs):

rc = STAFRead(connection->clientSocket,
              connection->buffer,
              recvSize,
              isSecure,
              doTimeout,
              connection->readWriteTimeout);
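For readers unfamiliar with the `doTimeout` and `readWriteTimeout` arguments: a common POSIX way to implement a socket read with an optional timeout is to `poll()` the descriptor for readability before calling `recv()`. The sketch below only illustrates that general pattern. It is not STAF's `STAFRead()` (which also handles the `isSecure` case), and the helper name `readWithTimeout` and its return codes are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in, NOT STAF's STAFRead(). Return values:
//   > 0 bytes read, 0 peer closed the connection, -1 error, -2 timed out.
ssize_t readWithTimeout(int fd, char* buf, std::size_t len,
                        bool doTimeout, int timeoutMs) {
    if (doTimeout) {
        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLIN;
        int ready;
        do {
            ready = poll(&pfd, 1, timeoutMs);  // wait until readable or timeout
        } while (ready < 0 && errno == EINTR);
        if (ready == 0) return -2;             // nothing arrived in time
        if (ready < 0) return -1;              // poll() itself failed
    }
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);             // data (or EOF) should be ready now
    } while (n < 0 && errno == EINTR);
    return n;
}
```

With `doTimeout` set and a 5000 ms timeout, a silent peer makes the call return -2 after about 5 seconds instead of blocking indefinitely.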

Discussion

  • Sharon Lucas
    2009-10-28

    Do you have a recreation scenario for this problem?

    How often does it happen?

     
  • Sharon Lucas
    2009-10-28

    What STAF installer file did you download and use to install STAF 3.3.5?

    What's the output from:

    STAF targetMachine MISC LIST PROPERTIES

     
  • Michael
    2009-10-28

    We build and install STAF in-house for consistency across all platforms.
    I don't have the details for reproducing it yet, and as far as I know, only our "Stress and Scalability" team has hit it. There is one test that hits it every time (so I have temporarily put them back on an older version, 3.3.0).

     
  • Sharon Lucas
    2009-10-28

    So, are you saying that the problem does not occur when you use STAF V3.3.0? It only occurs when using STAF V3.3.5? If so, can you narrow down which version of STAF after V3.3.0 it started occurring in? e.g. Did it first start occurring in V3.3.4.1 (after we added timeouts to the read/write connection apis)?

    How did you build the STAF V3.3.5 that you're running on this Linux AMD64 machine? For example, are you running a 64-bit version of STAF (built with OS_NAME set to linuxamd64) or a 32-bit version of STAF (built with OS_NAME set to linux)?
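One way to answer the 32-bit versus 64-bit question for an already-installed binary is to read byte 4 of its ELF header (`EI_CLASS`: 1 means 32-bit, 2 means 64-bit), which is the same information the `file` utility reports. A minimal sketch, assuming a Linux ELF executable; `elfBitness` is a hypothetical helper name, not part of STAF.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns 32 or 64 from an ELF file's identification bytes
// (e_ident[EI_CLASS] == 1 is ELFCLASS32, 2 is ELFCLASS64),
// or 0 if the file cannot be read or is not an ELF file.
int elfBitness(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char ident[5] = {};
    if (!in.read(ident, sizeof ident)) return 0;            // unreadable or too short
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L' || ident[3] != 'F') return 0;       // missing ELF magic
    if (ident[4] == 1) return 32;
    if (ident[4] == 2) return 64;
    return 0;                                               // unknown class
}
```

Calling it on the STAFProc path from the gdb session above (`/usr/software/test/staf/last/bin/STAFProc`) would show which kind of build is installed.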

    Since you have a test that can recreate this problem, can you verify if the problem occurs if you use STAF V3.3.5 for linuxamd64 that you download from http://staf.sourceforge.net instead of using a version of STAF that you built yourself?

     
  • Michael
    2009-10-28

    Yeah, I suspected the new connection timeout interface, which is why I told them to go back to 3.3.0 (the only other version we have without the new connection API), but I haven't had them try 3.3.4 yet, mainly because they are working on a deadline of their own.

    I build STAF as 32-bit. I'll build it as 64-bit and see if that makes a difference. I'll update this CR as soon as I have the information.

     
  • Michael
    2009-11-04

    Now that it is compiled as 64-bit, there's a different segfault:
    [medmonds@kc1b1-e0]# sudo gdb /usr/software/test/staf/last/bin/STAFProc /core.3437
    GNU gdb 6.5
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

    Reading symbols from /lib64/tls/libpthread.so.0...done.
    Loaded symbols for /lib64/tls/libpthread.so.0
    Reading symbols from /lib64/libcrypt.so.1...done.
    Loaded symbols for /lib64/libcrypt.so.1
    Reading symbols from /lib64/libdl.so.2...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAF.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAF.so
    Reading symbols from /usr/software/lib64/libstdc++.so.6...done.
    Loaded symbols for /usr/software/lib64/libstdc++.so.6
    Reading symbols from /lib64/tls/libm.so.6...done.
    Loaded symbols for /lib64/tls/libm.so.6
    Reading symbols from /usr/software/lib64/libgcc_s.so.1...done.
    Loaded symbols for /usr/software/lib64/libgcc_s.so.1
    Reading symbols from /lib64/tls/libc.so.6...done.
    Loaded symbols for /lib64/tls/libc.so.6
    Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFLIPC.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFLIPC.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFTCP.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFTCP.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFDSLS.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFDSLS.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFLog.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFLog.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFMon.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFMon.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFPool.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFPool.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFReaper.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFReaper.so
    Core was generated by `/usr/software/test/staf/last/bin/STAFProc /usr/software/test/staf/last/bin/STAF'.
    Program terminated with signal 6, Aborted.
    #0 0x000000324c52e2ed in raise () from /lib64/tls/libc.so.6
    (gdb) where
    #0 0x000000324c52e2ed in raise () from /lib64/tls/libc.so.6
    #1 0x000000324c52fb7c in abort () from /lib64/tls/libc.so.6
    #2 0x000000000043f603 in generic_signal_handler (signum=11) at /u/medmonds/p4/staf/src/staf/stafproc/unix/STAFProcOSUtil.cpp:84
    #3 <signal handler called>
    #4 0x0000000000431c88 in STAFRefPtr<STAFConnection>::operator-> (this=0x5954203520595449) at /u/medmonds/p4/staf/src/staf/stafif/STAFRefPtr.h:79
    #5 0x00000000004175a6 in handleLocalServiceRequestAPI (level=1819436368, provider=0x4e3a30323a204550, connection=@0x5954203520595449,
    doShutdown=@0x524f495250203834) at /u/medmonds/p4/staf/src/staf/stafproc/STAFProc.cpp:1381
    #6 0x796c706572277b3a in ?? ()
    #7 0x203e3d20276f745f in ?? ()
    #8 0x2c276c61636f6c27 in ?? ()
    #9 0x796c7065726f6e27 in ?? ()
    #10 0x272c30203e3d2027 in ?? ()
    #11 0x203e3d202779656b in ?? ()
    #12 0x312c303439303127 in ?? ()
    #13 0x646f63272c273332 in ?? ()
    #14 0x2427203e3d202765 in ?? ()
    #15 0x543a3a6e72616854 in ?? ()
    #16 0x3e2d42445f747365 in ?? ()
    #17 0x7265747369676572 in ?? ()
    #18 0x28746c757365725f in ?? ()
    #19 0x7473657462757322 in ?? ()
    #20 0x2c34312c2264695f in ?? ()
    #21 0x5f656c6262756222 in ?? ()
    #22 0x65646e752c226469 in ?? ()
    #23 0x6c75736572222c66 in ?? ()
    #24 0x7373656c622c2274 in ?? ()
    #25 0x63617473227b2028 in ?? ()
    #26 0x202265636172746b in ?? ()
    #27 0x72222c2222203e3d in ?? ()
    #28 0x3e3d202264696e75 in ?? ()
    #29 0x6574735f33302220 in ?? ()
    #30 0x69732f31702f7473 in ?? ()
    #31 0x5f656772616c5f6f in ?? ()
    #32 0x69746e6575716573 in ?? ()
    #33 0x22646165725f6c61 in ?? ()
    #34 0x67617373656d222c in ?? ()
    #35 0x5b203e3d20227365 in ?? ()
    #36 0x2064656c69616622 in ?? ()
    #37 0x7472617473206f74 in ?? ()
    #38 0x6976726570757320 in ?? ()
    #39 0x6b206e6f20726f73 in ?? ()
    #40 0x30652d3862333163 in ?? ()
    #41 0x722046415453203a in ?? ()
    #42 0x6620747365757165 in ?? ()
    #43 0x5c5c3a64656c6961 in ?? ()
    #44 0x6f70646e4520206e in ?? ()
    #45 0x636f6c203a746e69 in ?? ()
    #46 0x5220206e5c5c6c61 in ?? ()
    #47 0x203a747365757165 in ?? ()
    #48 0x4550204555455551 in ?? ()
    #49 0x2054494157204b45 in ?? ()
    #50 0x20206e5c5c363032 in ?? ()
    #51 0x6320746c75736552 in ?? ()
    #52 0x282032203a65646f in ?? ()
    #53 0x536e776f6e6b6e55 in ?? ()
    #54 0x5c29656369767265 in ?? ()
    #55 0x7573655220206e5c in ?? ()
    #56 0x617373654d20746c in ?? ()
    #57 0x65646e75203a6567 in ?? ()
    ---Type <return> to continue, or q <return> to quit---Quit
    (gdb) [medmonds@kc1b1-e0]#

    /u/medmonds/p4/staf/src/staf/stafif/STAFRefPtr.h line 79:
    TheType *operator->() const { return fPtr; }
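The frame values in this trace look like ASCII text rather than code addresses. On little-endian x86_64 a 64-bit value is stored least significant byte first, so printing each value's bytes in that order shows what was actually in memory. A minimal decoding sketch; `bytesAsText` is a hypothetical helper, not part of STAF or gdb.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Show the 8 bytes of a 64-bit value in little-endian memory order
// (least significant byte first, as on x86_64).
// Non-printable bytes are rendered as '.'.
std::string bytesAsText(std::uint64_t value) {
    std::string text;
    for (int i = 0; i < 8; ++i) {
        unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        text += std::isprint(byte) ? static_cast<char>(byte) : '.';
    }
    return text;
}
```

Applied to frames #6 through #8 it yields `:{'reply`, `_to' => ` and `'local',`, and the `this=0x5954203520595449` in frame #4 decodes to `ITY 5 TY`. That is consistent with string data having overwritten this thread's stack, so the saved return addresses and the `connection` reference in frame #5 were already garbage; it does not by itself show which code wrote the data.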

     
  • Sharon Lucas
    2009-11-04

    The problem is originating from line 1381 in stafproc/STAFProc.cpp, which is:

    connection->read(buffer, buffSize, doTimeout);

    This again perhaps indicates that the problem has to do with timeouts that were added to the read/write connection apis. But, I still don't see what the problem is.

    Rather than using a version of STAF that you built, can you download the STAF binaries from http://staf.sourceforge.net, install and run them instead, and then see if the problem still occurs?

     
  • I've been using STAFLoop to simulate a stress environment on our Linux AMD64 system with STAF 3.3.5, but I have not seen this problem. Could you try running a similar test with STAFLoop on a system where the problem is occurring and see if this recreates the problem? On the test system, run the following command multiple times in parallel (for my tests I ran the command 8 times in parallel):

    STAFLoop 10000 tcp://staf4g ping ping

    In this example, "staf4g" is the hostname of the local machine. Running 8 instances of this command in parallel, with the number of loops set to 10000, took about 15 minutes on my system.
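Starting a command several times in parallel and waiting for every copy to finish can be done with POSIX `fork()`, `execvp()` and `waitpid()`. A minimal sketch; `runInParallel` is a hypothetical helper, and it only counts how many copies exited with status 0.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Start `copies` instances of the command in `args` at once, wait for all
// of them, and return how many exited normally with status 0.
int runInParallel(const std::vector<std::string>& args, int copies) {
    assert(!args.empty());
    // Build a NULL-terminated argv once, before forking.
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    std::vector<pid_t> children;
    for (int i = 0; i < copies; ++i) {
        pid_t pid = fork();
        if (pid == 0) {                    // child: become the command
            execvp(argv[0], argv.data());
            _exit(127);                    // only reached if exec failed
        }
        if (pid > 0) children.push_back(pid);  // parent: remember the child
    }

    int succeeded = 0;
    for (pid_t pid : children) {           // wait for every copy to finish
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++succeeded;
    }
    return succeeded;
}
```

For example, `runInParallel({"STAFLoop", "10000", "tcp://staf4g", "ping", "ping"}, 8);` starts the 8-way run described above. Whether STAFLoop's exit status reflects request failures is an assumption here, so check its output as well.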

     
  • Michael
    2009-11-04

    I was able to reproduce it; however, I had to modify STAFLoop to make requests similar to how we make them.

    In our testing, all STAF requests (except requests to the Queue service) are asynchronous, so I changed the submit to use kSTAFReqQueue and removed the check that stops the loop after more than 3 errors. When a request fails with a STAF communication error, we sleep for 500 milliseconds and resubmit it, up to 6 times per request.

    Here is the diff of my STAFLoop.cpp against the original:
    # diff /u/medmonds/p4/staf/src/staf/test/STAFLoop.cpp /u/medmonds/p4/staf_3.3.5/src/staf/test/STAFLoop.cpp
    84c84
    < STAFResultPtr result = handlePtr->submit(where, service, request);
    ---
    > STAFResultPtr result = handlePtr->submit(where, service, request,kSTAFReqQueue);
    90c90
    < if (++numError > 3) break;
    ---
    > //if (++numError > 3) break;
    #
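The resubmit policy described above (sleep 500 milliseconds after a communication error, then resubmit, up to 6 times) does not appear in this diff. A minimal sketch of that policy, assuming the submit call is wrapped in a function that returns a STAF return code and assuming return code 22 is STAF's "Communication error"; `submitWithRetry` and `kCommunicationErrorRC` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Assumption: 22 is STAF's "Communication error" return code.
constexpr int kCommunicationErrorRC = 22;

// Submit once; while the result is a communication error, sleep and
// resubmit, up to maxResubmits more times. Returns the last return code.
int submitWithRetry(const std::function<int()>& submit,
                    int maxResubmits = 6,
                    std::chrono::milliseconds delay = std::chrono::milliseconds(500)) {
    int rc = submit();
    for (int i = 0; i < maxResubmits && rc == kCommunicationErrorRC; ++i) {
        std::this_thread::sleep_for(delay);  // back off before resubmitting
        rc = submit();
    }
    return rc;  // any other error (or success) is returned immediately
}
```

Only communication errors are retried; any other return code is passed straight back, so a request that fails with an unrelated error is submitted exactly once.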

    [medmonds@staf-rhel4x64]# sudo gdb /usr/software/test/staf/last/bin/STAFProc /core.5192
    GNU gdb 6.5
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

    Reading symbols from /lib64/tls/libpthread.so.0...done.
    Loaded symbols for /lib64/tls/libpthread.so.0
    Reading symbols from /lib64/libcrypt.so.1...done.
    Loaded symbols for /lib64/libcrypt.so.1
    Reading symbols from /lib64/libdl.so.2...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAF.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAF.so
    Reading symbols from /usr/software/lib64/libstdc++.so.6...done.
    Loaded symbols for /usr/software/lib64/libstdc++.so.6
    Reading symbols from /lib64/tls/libm.so.6...done.
    Loaded symbols for /lib64/tls/libm.so.6
    Reading symbols from /usr/software/lib64/libgcc_s.so.1...done.
    Loaded symbols for /usr/software/lib64/libgcc_s.so.1
    Reading symbols from /lib64/tls/libc.so.6...done.
    Loaded symbols for /lib64/tls/libc.so.6
    Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFLIPC.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFLIPC.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFTCP.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFTCP.so
    Reading symbols from /lib64/libnss_files.so.2...done.
    Loaded symbols for /lib64/libnss_files.so.2
    Reading symbols from /lib64/libnss_dns.so.2...done.
    Loaded symbols for /lib64/libnss_dns.so.2
    Reading symbols from /lib64/libresolv.so.2...done.
    Loaded symbols for /lib64/libresolv.so.2
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFDSLS.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFDSLS.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFLog.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFLog.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFMon.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFMon.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFPool.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFPool.so
    Reading symbols from /x/eng/localtest/arch/x86_64-fedora-linux2/staf/staf-3.3.5b/lib/libSTAFReaper.so...done.
    Loaded symbols for /usr/software/test/staf/staf-3.3.5b/lib/libSTAFReaper.so
    Core was generated by `/usr/software/test/staf/staf-3.3.5b/bin/STAFProc /usr/software/test/staf/staf-3'.
    Program terminated with signal 6, Aborted.
    #0 0x0000003a8692e26d in raise () from /lib64/tls/libc.so.6
    (gdb) where
    #0 0x0000003a8692e26d in raise () from /lib64/tls/libc.so.6
    #1 0x0000003a8692fbac in abort () from /lib64/tls/libc.so.6
    #2 0x000000000043f603 in generic_signal_handler (signum=11)
    at /u/medmonds/p4/staf/src/staf/stafproc/unix/STAFProcOSUtil.cpp:84
    #3 <signal handler called>
    #4 0x0000000000431c88 in STAFRefPtr<STAFConnection>::operator-> (
    this=0x757165522f464154)
    at /u/medmonds/p4/staf/src/staf/stafif/STAFRefPtr.h:79
    #5 0x00000000004175a6 in handleLocalServiceRequestAPI (level=1397050656,
    provider=0x6c706d6f43747365, connection=@0x757165522f464154,
    doShutdown=@0x5320455059542031)
    at /u/medmonds/p4/staf/src/staf/stafproc/STAFProc.cpp:1381
    #6 0x242f544453406372 in ?? ()
    #7 0x313a32323a323a53 in ?? ()
    #8 0x7365757165723a33 in ?? ()
    #9 0x407265626d754e74 in ?? ()
    #10 0x353a53242f544453 in ?? ()
    #11 0x363a38393135383a in ?? ()
    #12 0x40746c757365723a in ?? ()
    #13 0x363a53242f544453 in ?? ()
    #14 0x6f43464154533a38 in ?? ()
    #15 0x6e6f697463656e6e in ?? ()
    #16 0x746e495564616552 in ?? ()
    ---Type <return> to continue, or q <return> to quit---q
    Quit
    (gdb) [medmonds@staf-rhel4x64]#

     