Menu

stack smashing detected / Boost 1.33.1

Help
2009-05-12
2013-04-11
  • Silas Sewell

    Silas Sewell - 2009-05-12

    When I apply the patch listed below Scribe will compile fine and works for a while, but eventually I get a "stack smashing detected" error.

    Does anyone one if there is something in later (> 1.33.1) versions of Boost (or possibly libevent 1.1a) which would fix this error?

    And has anyone successfully statically compiled Scribe on CentOS 5?

    I'd really like to get Scribe working as an RPM so other CentOS 5 shops can use it.

    Patch: http://code.google.com/p/silassewell/source/browse/trunk/projects/packages/rpms/scribe/scribe.2.01.patch

    Spec/Init/etc..: http://silassewell.googlecode.com/svn/trunk/2009/05/05/scribe-for-el5/scribe/

    SRPMs for Thrift/fb303/Scribe: http://www.silassewell.com/blog/2009/05/07/scribe-scalable-real-time-log-aggregation-for-centos-5-rhel-5/

    Logs:

    [root@scribe ~]# scribed -c /etc/scribed/default.conf
    [Mon May 11 18:44:09 2009] "STATUS: STARTING"
    [Mon May 11 18:44:09 2009] "STATUS: configuring"
    [Mon May 11 18:44:09 2009] "got configuration data from file </etc/scribed/default.conf>"
    [Mon May 11 18:44:09 2009] "CATEGORY : default"
    [Mon May 11 18:44:09 2009] "Creating default store"
    [Mon May 11 18:44:09 2009] "configured <1> stores"
    [Mon May 11 18:44:09 2009] "STATUS: "
    [Mon May 11 18:44:09 2009] "STATUS: ALIVE"
    [Mon May 11 18:44:09 2009] "Starting scribe server on port 9103"
    Thrift: Mon May 11 18:44:09 2009 libevent 1.1a method epoll
    ....
    [Mon May 11 18:44:12 2009] "store thread starting"
    [Mon May 11 18:44:12 2009] "[apache.access] Opened file </var/log/scribe/apache.access/apache.access_00000> for writing"
    [Mon May 11 18:44:12 2009] "[apache.access] Opened file </tmp/apache.access/apache.access_00000> for writing"
    [Mon May 11 18:44:12 2009] "[apache.access] Changing state from <DISCONNECTED> to <SENDING_BUFFER>"
    [Mon May 11 18:44:12 2009] "[apache.access] successfully read <0> entries from file </tmp/apache.access/apache.access_00000>"
    [Mon May 11 18:44:12 2009] "[apache.access] No more buffer files to send, switching to streaming mode"
    [Mon May 11 18:44:12 2009] "[apache.access] Changing state from <SENDING_BUFFER> to <STREAMING>"
    *** stack smashing detected ***: scribed terminated
    Aborted
    [root@scribe ~]#

    /etc/scribed/default.conf:

    port=9103
    max_msg_per_second=10000
    check_interval=3
    use_conn_pool=yes

    <store>
    category=default
    type=buffer

    target_write_size=20480
    max_write_interval=1
    buffer_send_rate=2
    retry_interval=30
    retry_interval_range=10

    <primary>
    type=file
    fs_type=std
    file_path=/var/log/scribe
    base_filename=thisisoverwritten
    max_size=1073741824
    </primary>

    <secondary>
    type=file
    fs_type=std
    file_path=/tmp
    base_filename=thisisoverwritten
    max_size=1073741824
    </secondary>
    </store>

    I'd be happy to do some debugging for a developer if you point me in the right directory.

    Thanks,

     
    • Anthony Giardullo

      I haven't yet run into any stack corruptions when running Scribe, so I'm not sure what could be causing this.  Can you tell me a little more about your setup?  How many categories do you have and how many messages per second are you pushing?  And is cpu/memory usage reasonable when this is running?

      Also, can you try to compile with symbols (configure --disable-opt) and then run scribe from a debugger to get the line number where this error is detected?  Of course, the line number where a stack corruption is detected isn't necessarily the same line where the corruption occurred, but it should provide a pretty good hint. 

      -Anthony

       
      • Silas Sewell

        Silas Sewell - 2009-05-12

        Scribe is still in our development environment so the load is trivial.

        We have one Scribe server (configuration in previous post) and two clients (configuration below). Each client is running Scribe and relays messages to the server (I haven't seen the clients crash yet). We have ~15 categories and do about ~60 messages/second.

        CPU and memory look fine on the clients and the server.

        gdb info: http://gist.github.com/110302

        client config: http://gist.github.com/110293

        Let me know if there is something I can do to get more/better information.

        Thanks,

         
        • Anthony Giardullo

          I'm not sure what could be causing this.  It looks like everything you are doing is correct.  I would try upgrading to a more recent version of libevent.  I am currently using libevent-1.4.3.

          Let us know how it goes.

          -Anthony

           
          • Silas Sewell

            Silas Sewell - 2009-05-15

            I updated to libevent 1.4.4-stable and it seems like the issue was resolved.

            Thanks for your help!

             
          • Silas Sewell

            Silas Sewell - 2009-05-15

            Nevermind, it just didn't happen the first couple of times.

            Here is another gdb dump: http://gist.github.com/112436

            It seems to only occur during "store thread starting".  If the server can get through starting all the store threads it seems to run fine. It seems random on when it will fail and when it works.

            I'm using thrift revision 770888, fb303 revision 770888 (from thrift apache project) and scribe 2.01. If possible can you tell me what versions of thrift/fb303 you're using.

            Thanks again,

             
    • Silas Sewell

      Silas Sewell - 2009-05-12

      Again, gdb info with thrift symbols: http://gist.github.com/110309

       
    • Anthony Giardullo

      It sounds like libevent 1.4.4-stable fixed a stack corruption bug.  Can you double check to make sure that you definitely linked with this version of libevent when rebuilding scribe?

      While I have tested recent revisions of thrift trunk, most of our stress testing(and production traffic) uses an older revision of thrift.  The version of fb303 should not be an issue.

      -Anthony

       
    • Silas Sewell

      Silas Sewell - 2009-07-24

      Working on this project again, sorry for the very late response.

      In http://gist.github.com/112436 you can see that the libevent 1.4.4-stable is displayed on start: "Thrift: Fri May 15 14:38:37 2009 libevent 1.4.4-stable method epoll"

      I'm having the same issue on Fedora 11 (boost 1.37 / libevent 1.4.5).

      I created a bug with the Fedora 11 gdb logs:

      https://sourceforge.net/tracker/?func=detail&aid=2826310&group_id=240708&atid=1113651

      Thanks for all your help!

       
    • Anthony Giardullo

      I just commented on the bug you filed and suggested contacting the developers of libevent since this is a known issue in earlier versions of libevent.

      But as a sanity check, you should probably double check that you are definitely not building or linking with the earlier version of libevent that was once installed on your system.  Even if Thrift is printing out the correct version, it is still possible you are compiling with the old version and linking with the newer one.   I would wipe the old version from your system and make sure that you have the correct libevent*.so and event.h.  Then you should make clean and remake Thrift, and then do the same for Scribe.

       
      • Silas Sewell

        Silas Sewell - 2009-07-24

        I'm positive its the never version of libevent. The Fedora 11 build was built completely separate from the CentOS version and doesn't come with the older version of libevent. Regardless:

        # ldd /usr/sbin/scribed
            linux-vdso.so.1 =>  (0x00007fff2edff000)
            libboost_system.so.4 => /usr/lib64/libboost_system.so.4 (0x0000003434000000)
            libboost_filesystem.so.4 => /usr/lib64/libboost_filesystem.so.4 (0x0000003432800000)
            libfb303.so => /usr/lib64/libfb303.so (0x0000003433800000)
            libthrift.so.0 => /usr/lib64/libthrift.so.0 (0x0000003432c00000)
            libthriftnb.so.0 => /usr/lib64/libthriftnb.so.0 (0x0000003433c00000)
            libevent-1.4.so.2 => /usr/lib64/libevent-1.4.so.2 (0x0000003433000000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00000037c6400000)
            libscribe.so => /usr/lib64/libscribe.so (0x00007f95208bd000)
            libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000037ca800000)
            libm.so.6 => /lib64/libm.so.6 (0x00000037c6800000)
            libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000037c9400000)
            libc.so.6 => /lib64/libc.so.6 (0x00000037c5c00000)
            librt.so.1 => /lib64/librt.so.1 (0x00000037c8400000)
            libnsl.so.1 => /lib64/libnsl.so.1 (0x00000037c8c00000)
            libresolv.so.2 => /lib64/libresolv.so.2 (0x00000037c9000000)
            /lib64/ld-linux-x86-64.so.2 (0x00000037c5800000)

        # yum whatprovides \*libevent-1.4.so.2
        updates/filelists_db
        libevent-1.4.5-2.fc11.x86_64 : Abstract asynchronous event notification library
        Repo        : fedora
        Matched from:
        Filename    : /usr/lib64/libevent-1.4.so.2

        Right now I'm blaming TNonblockingServer (maybe something like http://issues.apache.org/jira/browse/THRIFT-536\).

        Regardless I'm going to try to get the oldest version of thrift working with scribe as possible.

        Any chance you can give me the closest revision of the public thrift repository you're using in your environment?

        Thanks for your help.

         
        • Anthony Giardullo

          I have used Scribe in production with a version of Thrift that was probably pretty close to the original version that was open sourced.

          Keep in mind that to use an older version of thrift, you will need to make some small changes to Scribe.   I believe you would just need to replace all occurrences of 'apache::thrift' in Scribe's src with 'facebook::thrift'.

           
          • Silas Sewell

            Silas Sewell - 2009-08-05

            I built Scribe with a 18 month old version of thrift and I ran into the same issue. I'm guessing this is something I'm doing wrong, but I've had to give up for now (rsyslog works fine for us).

            Anyway, I wanted to thank you for your help and I'll try again in the next release or two.

            Thanks,

             
            • Anthony Giardullo

              Sorry we couldn't figure this out.  You should also contact the maintainers of libevent and see if they are aware of this issue.

              I will let you know if I hear of anyone else running into the same problem.

               
  • Anthony Giardullo

    Silas,  Sorry I wasn't able to figure out your stack smashing issues earlier.  These issues are difficult to debug as they could be coming from any layer of code.

    But I just ran across an old thread on the Thrift Jira page that may fix the problem you're running into:

    http://issues.apache.org/jira/browse/THRIFT-510

    And please post any future replies or discussion to Scribe's new home and discussion group:
    http://github.com/facebook/scribe
    http://groups.google.com/group/scribe-server

    -Anthony

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.