#5 Dtach Hangs Session

0.7
open
dtach (8)
5
2006-12-26
2006-12-26
Roger
No

I'm currently running mrxvt/dtach in gdb, trying to track down when dtach hangs or loses the terminal session.

What I'm seeing is that, during a compile or other active terminal session, the cursor will simply hang and the screen stops updating. The dtach session seems to go defunct.

Sometimes, I'll see >20-30 defunct dtach processes using ps -ax. (Because I restart mrxvt and fail to completely kill each dtach process.)

I'll post more when I get more details. But right now, I've yet to get dtach to fault. It's possible that optimization flags of -O2 or higher are causing the problem, since my current build uses only debug flags (i.e. -g -Wall -Wshadow).

Advise me if there are any better debug methods as I'm currently running something like "gdb mrxvt set args blah blah" and am not sure if I'll see any debug messages using this method!

Discussion

  • Roger

    Roger - 2007-01-05

    Logged In: YES
    user_id=348797
    Originator: YES

    Have been running dtach compiled with debug options and running mrxvt from gdb (as I said, I don't think this is going to catch dtach bugs).

    Dtach seems to be more stable, but I do see two duplicate dtach sessions (via ps -ax | grep dtach). One carries irssi, and I think the dtach session is detaching on its own and spawning a new dtach session.

    *Note: I only have 5 mrxvt tabs open using dtach for session management! Here, I'm showing 5+ dtach sessions!

    roger 31301 0.0 0.0 1528 212 pts/0 Ss+ 2006 0:00 dtach -A /tmp/dtach.roger-irsssi /usr/bin/irssi
    roger 31302 0.0 0.0 1660 240 ? Ss 2006 0:01 dtach -A /tmp/dtach.roger-irsssi /usr/bin/irssi
    roger 31304 0.0 0.0 1532 216 pts/3 Ss+ 2006 0:05 dtach -A /tmp/dtach.roger-admin /bin/su
    roger 31305 0.0 0.0 1664 244 ? Ss 2006 0:07 dtach -A /tmp/dtach.roger-admin /bin/su
    roger 31307 0.0 0.0 1532 216 pts/5 Ss+ 2006 0:00 dtach -A /tmp/dtach.roger-term3 /bin/bash
    roger 31308 0.0 0.0 1664 244 ? Ss 2006 0:00 dtach -A /tmp/dtach.roger-term3 /bin/bash
    roger 31314 0.0 0.0 1532 216 pts/8 Ss+ 2006 0:00 dtach -A /tmp/dtach.roger-term4 /bin/bash
    roger 31316 0.0 0.0 1664 244 ? Ss 2006 0:00 dtach -A /tmp/dtach.roger-term4 /bin/bash
    roger 14542 0.0 0.0 1724 500 pts/10 R+ 17:20 0:00 grep --colour=auto dtach

     
  • Roger

    Roger - 2007-01-05

    Logged In: YES
    user_id=348797
    Originator: YES

    Well, looks like I've only posted a normal ps -ax output.

    I've also spent the whole day recompiling dtach with -gdb and other options and running it within a gdb session, and I've yet to get dtach to lose cursor input from the keyboard.

    I'm now looking at mrxvt (materm) for the culprit.

    What I'm using for debugging is "cat /dev/urandom" within the shell (this is usually effective within mrxvt at showing the bug when I'm working with 5 tabs using dtach along with other eye candy enabled).

    I'm off to debug mrxvt tomorrow. Any ideas, let me know. :-/

     
  • Ned T. Crigler

    Ned T. Crigler - 2007-01-05

    Logged In: YES
    user_id=327158
    Originator: NO

    One thing you can try is to get it to hang first (without running it under
    gdb) and then using gdb dtach <PID of dtach> to get a backtrace and find out
    exactly what it is doing at the moment (and the source line too if you
    compiled it with debugging symbols). If you have strace installed, you can
    also use it to see what system calls a process is doing.

    The ps output you are seeing does look normal; dtach creates two processes
    when using -A, one of them starts the program in the background and the other
    is the attacher.
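
A minimal sketch of the attach-after-hang approach described above, written in Python so it can be scripted. The helper names (`gdb_backtrace_cmd`, `strace_attach_cmd`, `backtrace`) are hypothetical; the only tool options assumed are gdb's `-p`, `-batch` and `-ex`, and strace's `-p`. Attaching normally requires that the target process belongs to the same user (or root privileges).

```python
import shutil
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command that attaches to pid, prints a backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def strace_attach_cmd(pid):
    """Build an strace command that attaches to an already running process."""
    return ["strace", "-p", str(pid)]


def backtrace(pid):
    """Run gdb against a hung process and return its output, or None if gdb is missing."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
    return result.stdout
```

For example, `backtrace(31383)` would attach to the process with that PID once it has hung. The backtrace only shows function names and source lines if the binary was built with debugging symbols and not stripped.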

     
  • Roger

    Roger - 2007-01-06

    Logged In: YES
    user_id=348797
    Originator: YES

    Ok. Crashed. I found 3 dtach sessions when logging back into a session (using dtach -A /tmp/session /bin/bash). I started using gdb with the most recent process and worked backwards to the original one. I *forgot* to use bt on two of the processes.

    Attaching to program: /usr/bin/dtach, process 30069
    (no debugging symbols found)
    `system-supplied DSO at 0xb7fc6000' has disappeared; keeping its symbols.
    Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...
    (no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7fc6410 in __kernel_vsyscall ()

    Attaching to program: /usr/bin/dtach, process 32021
    Failed to read a valid object file image from memory.
    Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...
    (no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7fe6410 in ?? ()

    Attaching to program: /usr/bin/dtach, process 32021
    Failed to read a valid object file image from memory.
    Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...
    (no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7fe6410 in ?? ()
    (gdb) bt
    #0 0xb7fe6410 in ?? ()
    #1 0x00000003 in ?? ()
    #2 0x00000004 in ?? ()
    #3 0xbffc5190 in ?? ()
    #4 0x4dd72623 in write () from /lib/libc.so.6
    #5 0x0804983d in ?? ()
    #6 0x00000000 in ?? ()

    Ok. The above traces I did don't look right, but I think I still have the original dtach session active at process 32020. So I opened another session attaching to the /tmp/dtach file linked to #32020 and see the cursor hanging. Running gdb on process 31383 (linked to dtach process 32020):

    Attaching to program: /usr/bin/dtach, process 31383
    Failed to read a valid object file image from memory.
    Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...
    (no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7f97410 in ?? ()
    (gdb) bt
    #0 0xb7f97410 in ?? ()
    #1 0x00000004 in ?? ()
    #2 0x00000000 in ?? ()

    Process 32021 is also still hanging around, still using the /tmp/dtach session file:
    Attaching to program: /usr/bin/dtach, process 32021
    (no debugging symbols found)
    `system-supplied DSO at 0xb7fe6000' has disappeared; keeping its symbols.
    Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...
    (no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7fe6410 in __kernel_vsyscall ()
    (gdb) bt
    #0 0xb7fe6410 in __kernel_vsyscall ()
    #1 0x4dd72623 in write () from /lib/libc.so.6
    #2 0x0804983d in ?? ()
    #3 0x00000000 in ?? ()

    This is the original process using the /tmp/dtach session file:
    Attaching to program: /usr/bin/dtach, process 32020
    Failed to read a valid object file image from memory.
    Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...
    (no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    0xb7fe6410 in ?? ()
    (gdb) bt
    #0 0xb7fe6410 in ?? ()
    #1 0x00000003 in ?? ()
    #2 0x00000083 in ?? ()
    #3 0xbffc5194 in ?? ()
    #4 0x4dd72623 in write () from /lib/libc.so.6
    #5 0x08048fcc in ?? ()
    #6 0x00000001 in ?? ()
    #7 0xbffc5194 in ?? ()
    #8 0x00000083 in ?? ()
    #9 0x00000000 in ?? ()

    (I've recompiled mrxvt & dtach so many times, I thought I had it compiled with gdb & nostrip, but may have recompiled it since in order to get it to reproduce this bug. I will recompile tonight and await a crash again -- relogging in and having mrxvt auto execute and reattach seemed to provide this crash along with using menuconfig for linux source config.)

    Strace seems to be the best helper for me ;-) Here it is...

    $ strace dtach -a /tmp/dtach.roger-admin
    execve("/usr/bin/dtach", ["dtach", "-a", "/tmp/dtach.roger-admin"], [/* 66 vars */]) = 0
    brk(0) = 0x804c000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = 4
    fstat64(4, {st_mode=S_IFREG|0644, st_size=174516, ...}) = 0
    mmap2(NULL, 174516, PROT_READ, MAP_PRIVATE, 4, 0) = 0xb7f12000
    close(4) = 0
    open("/lib/libutil.so.1", O_RDONLY) = 4
    read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\313"..., 512) = 512
    fstat64(4, {st_mode=S_IFREG|0755, st_size=11596, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f11000
    mmap2(0x4e2ec000, 12420, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x4e2ec000
    mmap2(0x4e2ee000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1) = 0x4e2ee000
    close(4) = 0
    open("/lib/libc.so.6", O_RDONLY) = 4
    read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20:\314"..., 512) = 512
    fstat64(4, {st_mode=S_IFREG|0755, st_size=1334812, ...}) = 0
    mmap2(0x4dcae000, 1303964, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x4dcae000
    mmap2(0x4dde6000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x137) = 0x4dde6000
    mmap2(0x4ddea000, 9628, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4ddea000
    close(4) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f10000
    set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f106b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
    mprotect(0x4dde6000, 8192, PROT_READ) = 0
    mprotect(0x4d433000, 4096, PROT_READ) = 0
    munmap(0xb7f12000, 174516) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
    socket(PF_FILE, SOCK_STREAM, 0) = 4
    connect(4, {sa_family=AF_FILE, path="/tmp/dtach.roger-admin"}, 110) = 0
    rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGXFSZ, {SIG_IGN}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGHUP, {0x8048bd0, [HUP], SA_RESTART}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGTERM, {0x8048bd0, [TERM], SA_RESTART}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGINT, {0x8048bd0, [INT], SA_RESTART}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGQUIT, {0x8048bd0, [QUIT], SA_RESTART}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGWINCH, {0x8048ba0, [WINCH], SA_RESTART},

     
  • Ned T. Crigler

    Ned T. Crigler - 2007-01-09

    Logged In: YES
    user_id=327158
    Originator: NO

    It looks like it might be stuck at a write call, though it
    is hard to be sure since the binaries have been stripped
    of debugging symbols. There are only a few write calls
    which might make it easier to debug, though.

     
  • Roger

    Roger - 2007-01-22

    Logged In: YES
    user_id=348797
    Originator: YES

    Seems I created a stir on the mrxvt mailing list with this bug.

    Most agree, mrxvt terminal is not the source of this bug as they've repeated this bug with my suggested "cat /dev/urandom" (or /dev/random). Once the terminal stopped responding, they "CTRL \" and were able to escape the dtach session and kill the drone.

    I too agree as I'm able to do this too.

    (Might have time for the next two days for debugging.)

     
  • Joshua Rodman

    Joshua Rodman - 2007-03-07

    Logged In: YES
    user_id=844378
    Originator: NO

    My dtach sessions hang regularly, forcing a kill -9 of the attached (terminal-facing) process. This seems especially common if two user-facing processes are talking to one software-facing process, but it can also be triggered by middle-click mouse pastes. It's to the point where robustness is actually lowered by use of dtach as opposed to plain terminals.

    I'm on amd64, on Debian. I did not seem to have this much trouble on x86, which I no longer use.

     
  • Joshua Rodman

    Joshua Rodman - 2007-04-29

    Logged In: YES
    user_id=844378
    Originator: NO

    It seems to me that one of the 'clients' is wedging, and that is somehow wedging all clients. This is because a kill -9 of the unused client always restores functionality to the dtach which actually has a working terminal to communicate with.

    While it seems worthwhile to find the source of the wedge, would it not be wise to design dtach so that the behavior of the clients is independent? Clients have no guarantee of receiving all output in any event, so beyond some reasonable buffer, perhaps they should be cut off, which a properly functioning client might interpret as a shutdown event.
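
A rough sketch of the "cut off a stalled client" idea, in Python rather than dtach's C and not taken from dtach's source. The function name `broadcast` and the policy of treating a short write as a stall are assumptions made for illustration. Each client socket is expected to be in non-blocking mode, so a full send queue raises an error instead of blocking the sender.

```python
import socket


def broadcast(clients, data):
    """Send data to every client; drop any client that cannot accept all of it.

    clients: list of non-blocking stream sockets.
    Returns the list of clients that are still attached.
    """
    alive = []
    for client in clients:
        try:
            sent = client.send(data)
        except BlockingIOError:
            sent = -1  # send queue is full: this client is not reading
        except OSError:
            sent = -1  # peer has gone away or the socket is broken
        if sent == len(data):
            alive.append(client)
        else:
            # A well-behaved client sees the close as a shutdown event.
            client.close()
    return alive
```

With this policy one wedged client costs only itself: the loop never waits on it, so the remaining clients keep receiving output.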

     
  • Roger

    Roger - 2007-04-29

    Logged In: YES
    user_id=348797
    Originator: YES

    Just to confirm. This is exactly the type of activity I'm seeing. During a compile, the term will hang and then, one by one, other tabbed terminals will also hang.

    And I do think, when tested, kill -9 returned the session to dtach.

     
  • Ned T. Crigler

    Ned T. Crigler - 2007-04-29

    Logged In: YES
    user_id=327158
    Originator: NO

    I finally found some time to install mrxvt today using the provided dtach
    mrxvtrc file. Doing a cat /dev/urandom test hangs the first terminal (apparently
    because the send queue was filled by mrxvt responding to certain escape codes),
    but doesn't appear to hang the other mrxvt sessions.

    I also tried to do a compile, and after a short while it completed successfully.

    Can someone provide details on how to reproduce this starting with a fresh
    install of mrxvt? I tried testing this on Debian i386 unstable; by using apt-get
    install mrxvt and then modifying
    http://www.eskimo.com/~roger/programming/mrxvtrc for the .mrxvtrc file.
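
To illustrate the "send queue was filled" condition mentioned above, here is a small Python sketch using a local socket pair. It is an illustration only, not dtach's code; `fill_send_queue` and `drain` are hypothetical helper names. One end keeps writing while the other end never reads, until the kernel refuses more data. A blocking write() at that point would simply hang, which matches the backtraces stuck in write().

```python
import socket


def fill_send_queue(sock, chunk=b"x" * 4096, limit=64 * 1024 * 1024):
    """Write to sock until the kernel would block; return the bytes accepted.

    limit is only a safety cap so the loop always terminates.
    """
    sock.setblocking(False)
    sent = 0
    while sent < limit:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            break  # queue is full; a blocking write would hang here
    return sent


def drain(sock):
    """Read everything currently queued on sock; return the byte count."""
    sock.setblocking(False)
    received = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return received
        if not data:
            return received
        received += len(data)
```

Usage: create a pair with `writer, reader = socket.socketpair()`, call `fill_send_queue(writer)` to reach the stalled state, then `drain(reader)` to show that writes succeed again once the reading side catches up.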

     
  • Roger

    Roger - 2007-04-29

    Logged In: YES
    user_id=348797
    Originator: YES

    It can take several hours, or even a day or two for one terminal to hang. Once one hangs, the rest usually go pretty quickly. :-)

    cat /dev/urandom has been a pretty good way of reproducing. Any chance this could also be utf8 related?

     
