#281 Segfault in _mysql_ConnectionObject_dealloc at _mysql.c:698

Milestone: MySQLdb-1.2
Status: open
Owner: Andy Dustman
Labels: MySQLdb (285)
Priority: 5
Updated: 2012-09-19
Created: 2009-05-07
Creator: Eli Stevens
Private: No

Running MySQLdb 1.2.3c1 with Python 2.6.2 under Apache 1.3, mod_wsgi 2.3, Pylons 0.9.7, and SQLAlchemy 0.5.3.

Apache is custom-built; the OS is RHEL 4.

Linux version 2.6.9-42.ELsmp (bhcompile@hs20-bc1-1.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:27:17 EDT 2006

We are getting occasional segfaults under Apache. The parent process restarts the dead child, which runs for a while and then dies again. The segfaults seem to be correlated with higher load on the server, but that could simply reflect the larger number of requests per second during those periods (the crash rate does seem to grow faster than linearly with load, however).

(gdb) bt
#0 0x0233afa0 in ?? ()
#1 0x00533c93 in _mysql_ConnectionObject_dealloc (self=0x9b77924) at _mysql.c:698
#2 0x0797f33b in subtype_dealloc (self=0x9b77924) at Objects/typeobject.c:1018
#3 0x07966ca0 in dict_dealloc (mp=0xb6fb2a44) at Objects/dictobject.c:911
#4 0x0797f459 in subtype_dealloc (self=0xb77c8ecc) at Objects/typeobject.c:1006
#5 0x003dbfa8 in deque_dealloc (deque=0xb77db844) at /mnt/crawlspace3/home/geordan/src/svn/et/misc/geordan/alexandria/build/Python-2.6.2/Modules/_collectionsmodule.c:456
#6 0x07966ca0 in dict_dealloc (mp=0xb6f0546c) at Objects/dictobject.c:911
#7 0x07947b61 in instance_dealloc (inst=0xb77d6dec) at Objects/classobject.c:668
#8 0x07966a8d in PyDict_Clear (op=0xb6f0524c) at Objects/dictobject.c:817
#9 0x07968d36 in dict_tp_clear (op=0xb6f0524c) at Objects/dictobject.c:2011
#10 0x079d7e6d in collect (generation=2) at Modules/gcmodule.c:714
#11 0x079d8865 in PyGC_Collect () at Modules/gcmodule.c:1292
#12 0x079cbfce in Py_Finalize () at Python/pythonrun.c:424
#13 0x0793c141 in wsgi_python_child_cleanup (data=0x0) at mod_wsgi.c:4440
#14 0x0807e4e0 in ap_clear_pool (a=0x94d0dfc) at alloc.c:1937
#15 0x0807e702 in ap_destroy_pool (a=0x94d0dfc) at alloc.c:681
#16 0x080883c9 in clean_child_exit (code=0) at http_main.c:603
#17 0x0808a344 in child_main (child_num_arg=Variable "child_num_arg" is not available.) at http_main.c:5468
#18 0x0808a8a6 in make_child (s=Variable "s" is not available.) at http_main.c:5673
#19 0x0808a949 in startup_children (number_to_start=9) at http_main.c:5710
#20 0x0808b961 in standalone_main (argc=Variable "argc" is not available.) at http_main.c:6113
#21 0x0808c7ce in main (argc=3, argv=0xbfe11fb4) at http_main.c:6494

I cannot make the core dump generally available, but would be happy to run any additional commands against the image. Unfortunately, we have been unable to reproduce the problem on our staging and testing servers, so we have only seen it after putting 1.2.3c1 into production. We have since rolled production back to 1.2.2 and are living with the UTF-8 related memory leaks, so it is unlikely we will have much besides this core dump to work with.
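
A possible diagnostic, sketched under assumptions: the backtrace above shows the connection being deallocated from the garbage-collection pass inside Py_Finalize, reached through a dict and a deque (the trace does not say which object owns them). Closing connections explicitly from an atexit hook, which Python runs before that collection pass, might show whether the crash follows the close itself or only occurs during finalization. The names track and close_tracked are hypothetical helpers, not part of MySQLdb, SQLAlchemy, or mod_wsgi, and how the application obtains its raw connections is left out.

```python
import atexit
import weakref

# Hypothetical registry (an assumption for illustration, not a MySQLdb API).
# Weak references are used so that tracking does not extend a connection's lifetime.
_live_connections = weakref.WeakValueDictionary()


def track(conn):
    """Remember a DB-API connection so it can be closed before finalization."""
    _live_connections[id(conn)] = conn
    return conn


def close_tracked():
    """Close every tracked connection; return how many close() calls succeeded."""
    closed = 0
    for conn in list(_live_connections.values()):
        try:
            conn.close()
            closed += 1
        except Exception:
            # Already closed or broken; keep going so the rest still get closed.
            pass
    _live_connections.clear()
    return closed


# atexit handlers run before the interpreter's final garbage collection.
atexit.register(close_tracked)
```

If the segfault then moves into close_tracked, the connection was already damaged before shutdown; if it disappears, the teardown order during finalization becomes the more likely suspect.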

Discussion

  • Eli Stevens
    2009-05-08

    After rolling back to 1.2.2, we're still seeing segfaults with the same stack trace. Very puzzled now.

    We're connecting to a MySQL 5.1 server.

     
  • Eli Stevens
    2009-05-11

    From one of the cores:

    (gdb) p self->connection
    $5 = {net = {vio = 0x92fb528, buff = 0x965e108 "", buff_end = 0x9660108 "", write_pos = 0x965e108 "", read_pos = 0x965e108 "", fd = 11,
    remain_in_buf = 0, length = 0, buf_length = 0, where_b = 0, max_packet = 8192, max_packet_size = 1073741824, pkt_nr = 2, compress_pkt_nr = 2,
    write_timeout = 31536000, read_timeout = 31536000, retry_count = 1, fcntl = 0, return_status = 0x0, reading_or_writing = 0 '\0',
    save_char = 0 '\0', unused0 = 0 '\0', unused = 0 '\0', compress = 0 '\0', unused1 = 0 '\0', query_cache_query = 0x0, last_errno = 0,
    error = 0 '\0', unused2 = 0 '\0', return_errno = 0 '\0', last_error = '\0' <repeats 511 times>, sqlstate = "00000", extension = 0x0},
    connector_fd = 0x0, host = 0x92d3a20 "127.0.0.1", user = 0x94a4758 "arcturus", passwd = 0x92afe68 "(PASSWORD REDACTED)",
    unix_socket = 0x0, server_version = 0x92d3a38 "5.1.24-rc-Yahoo-SMP-log", host_info = 0x92d3a08 "127.0.0.1 via TCP/IP", info = 0x0,
    db = 0x96424e0 "arcturus_user", charset = 0x5f9fc40, fields = 0x0, field_alloc = {free = 0x0, used = 0x0, pre_alloc = 0x0, min_malloc = 32,
    block_size = 8164, block_num = 4, first_block_usage = 0, error_handler = 0}, affected_rows = 0, insert_id = 0, extra_info = 0,
    thread_id = 1010346, packet_length = 0, port = 4306, client_flag = 238223, server_capabilities = 63487, protocol_version = 10, field_count = 0,
    server_status = 0, server_language = 33, warning_count = 0, options = {connect_timeout = 0, read_timeout = 0, write_timeout = 0, port = 0,
    protocol = 0, client_flag = 128, host = 0x0, user = 0x0, password = 0x0, unix_socket = 0x0, db = 0x0, init_commands = 0x0, my_cnf_file = 0x0,
    my_cnf_group = 0x0, charset_dir = 0x0, charset_name = 0x95d5050 "latin1", ssl_key = 0x0, ssl_cert = 0x0, ssl_ca = 0x0, ssl_capath = 0x0,
    ssl_cipher = 0x0, shared_memory_base_name = 0x0, max_allowed_packet = 0, use_ssl = 0 '\0', compress = 0 '\0', named_pipe = 0 '\0',
    rpl_probe = 0 '\0', rpl_parse = 0 '\0', no_master_reads = 0 '\0', separate_thread = 0 '\0', methods_to_use = MYSQL_OPT_GUESS_CONNECTION,
    client_ip = 0x0, secure_auth = 0 '\0', report_data_truncation = 1 '\001', local_infile_init = 0, local_infile_read = 0, local_infile_end = 0,
    local_infile_error = 0, local_infile_userdata = 0x0, extension = 0x0}, status = MYSQL_STATUS_READY, free_me = 0 '\0', reconnect = 0 '\0',
    scramble = "^Q\\7<gt28bdQIYa_8:Y", rpl_pivot = 1 '\001', master = 0x9658c94, next_slave = 0x9658c94, last_used_slave = 0x0,
    last_used_con = 0x9658c94, stmts = 0x0, methods = 0x60a0c20, thd = 0x0, unbuffered_fetch_owner = 0x0, info_buffer = 0x0, extension = 0x0}

     
  • Eli Stevens
    2009-05-20

    After further investigation, it seems that the "methods" pointer is incorrect: depending on the core, it points either outside the memory image or at seemingly random data inside it. We suspect a memory overwrite, but again, we haven't been able to reproduce it under controlled conditions.