#300 a crash on libevent-core

For_2.0
open
nobody
5
2013-02-21
2013-02-21
Yong Gui
No

We use libevent in a server program to handle client connections and read events. After more than 20 days of running, the program crashed. The crash can be reproduced, but only after a very long run. Here is some of the code that involves libevent operations:

/* create a new base */
base = event_base_new();
......
/* add an event to process socket connections */
listenevent = event_new(base, connect_fd, EV_READ|EV_PERSIST, accept_cb, (void*)this);
event_add(listenevent, NULL);

/*add read event for accepted socket*/
clifd = accept(connect_fd, (struct sockaddr *)&addr, &len);
......
recvevent = event_new(base, clifd, EV_READ|EV_PERSIST, UserConnEv::socket_read_cb, (void*)connection);
connection->recv_id = event_add(recvevent, NULL);
.......
/*set timeout for a client when it is accepted*/
tmoutevent = event_new(base, -1, 0, UserConnEv::timeout_cb, (void*)connection);
evutil_timerclear(&timeOutTime);
timeOutTime.tv_sec = 30;
connection->timer_id = event_add(tmoutevent, &timeOutTime);
......
/* update the timeout event for a client when it sends data; is this the right way to update a timeout event? */
connection->timer_id = event_add(conn->tmoutevent, &timeOutTime);
.......
/*add a signal event to do something*/
struct event signal_usr;
event_assign(&signal_usr, base, SIGUSR1 , EV_SIGNAL|EV_PERSIST, blacklist_refresh, blacklist);
event_add(&signal_usr, NULL);

Our libevent version is 2.0.19, and the stack trace looks like this:
Program terminated with signal 11, Segmentation fault.
#0 evmap_io_active (base=0xe60b60, fd=91, events=34) at evmap.c:402
402 TAILQ_FOREACH(ev, &ctx->events, ev_io_next) {
Missing separate debuginfos, use: debuginfo-install apr-1.3.9-5.el6_2.x86_64 glibc-2.12-1.80.el6_3.3.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.9-33.el6.x86_64 libcom_err-1.41.12-12.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 libuuid-2.17.2-12.7.el6.x86_64 mysql-libs-5.1.61-4.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 sqlite-3.6.20-1.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 evmap_io_active (base=0xe60b60, fd=91, events=34) at evmap.c:402
#1 0x00007f8c26c5d305 in epoll_dispatch (base=0xe60b60, tv=<value optimized out>) at epoll.c:439
#2 0x00007f8c26c4ba56 in event_base_loop (base=0xe60b60, flags=0) at event.c:1603
#3 0x0000000000409d1c in main (argc=3, argv=0x7fff184e5888) at src/main.cpp:575

Discussion

  • Nick Mathewson

    Nick Mathewson - 2013-02-21

    This looks like an internal corruption bug in some data structure.

    Try using the event_enable_debug_mode() function (explained at http://www.wangafu.net/~nickm/libevent-book/Ref1_libsetup.html ) to have Libevent tell you if you're making mistakes such as re-initializing events that have already been added.

    Using a debugging tool like valgrind might also find some problems.

    Do those methods turn up anything for you?

    The code you posted there looks okay, but the parts that do "connection->timer_id = event_add(conn->tmoutevent, &timeOutTime);" seem a bit wrong. The return value from event_add() is 0 on success, -1 on failure. But that is probably not the cause of the crash you're reporting.

     
  • Yong Gui

    Yong Gui - 2013-02-22

    Thanks for your reply. We will use event_enable_debug_mode() to collect more information when the crash is reproduced again. As for the parts that do "connection->timer_id = event_add(conn->tmoutevent, &timeOutTime);", we use connection->timer_id to verify whether event_add() returned successfully, so it may not be a mistake.

     
