#97 Network broke down

Status: closed-fixed
Owner: Henry N.
Labels: None
Priority: 5
Updated: 2007-03-09
Created: 2006-07-06
Creator: Anonymous
Private: No

Hello,

I'm using the stable version of coLinux with Debian
testing. Everything works fine, but sometimes the VNC
connection breaks, and when I then ping the machine
the pings time out. I tried restarting the network
with /etc/init.d/networks restart, but that fails with
an error saying it can't bring up eth0.

To get the network working again I have to reboot coLinux.

I'm using the newest version of the WinPcap driver...

Discussion

  • andy
    andy
    2006-07-13

    Logged In: YES
    user_id=802927

    Same issue here: as soon as the network comes under heavy load,
    coLinux loses its connection. It's impossible to forward an X
    display that uses plain bitmaps, for example.
    I use a native bridged network card and I've tried several
    versions of the WinPcap driver.
    Switching to coLinux 0.6.3 solved the issue.

    The changelog says:
    -Fix for dropped UDP/TCP packets between linux and host daemons.

    -pcap/Bridged:
    * Add promisc="false" in config.xml, or 'nopromisc' as
    last command-line argument
    (i.e., eth1=pcap,"Local Area Connection","<FAKE MAC>",nopromisc).
    Default is promiscuous mode on.

    Next, I will try 0.6.4 with promiscuous mode turned off.

     
  • Mitch Bradley
    Mitch Bradley
    2006-08-10

    Logged In: YES
    user_id=1131764

    I'm seeing a similar thing, and I can reproduce it pretty
    much at will. I'm using colinux 0.6.4-2.6.11 with
    FedoraCore5-2006.8-ext3-2gb . I have added several packages
    with yum, including gqview (an image viewer). I run
    Cygwin/X on the WinXP Pro host machine, communicating with
    it via colinux-bridged-net-daemon .

    The way to reproduce the problem is as follows:
    a) run "gqview", with DISPLAY set so that the X window comes
    up on the host machine.
    b) Use it to view a large hi-res JPEG image. It initially
    shows a portion of the image in a small window.
    c) Resize the X window by dragging the corner, attempting to
    expose a much larger region of the image.

    While the larger window is repainting,
    colinux-bridged-net-daemon.exe crashes.

    I have also seen this problem happen when using Xpdf.

    The exception occurs at address 0x004014FE, with the
    colinux-bridged-net-daemon module loaded at 0x00400000.
    The code at that address is trying to load from [ebx+10h],
    after having just loaded ebx from [ebp-10h], i.e. from the
    stack frame. The value in ebx is 0xE727C14E, which is
    indeed the value at [ebp-10h]. No memory is mapped at the
    address 0xE727C14E, so it appears to me that the stack frame
    is being clobbered.

    I wonder if a buffer is being allocated on the stack, which
    is overflowing in the presence of heavy traffic.

    Okay, looking at the disassembly of that section and doing
    an eyeball decompilation makes me believe that the crash is
    happening in
    conet-bridged-daemon:main.c:co_win32_daemon_read_received()

    I believe that the "message->size" dereference is failing (2
    lines after the "do {") as a result of the local variable
    "buffer" having been overwritten, probably as a result of a
    previous call to pcap_sendpacket().

    I ran colinux-debug-daemon and found the following line in
    the log file, very near the time of the crash. Note that
    the size is bogus/insanely_large/negative.

    <log module="colinux-bridged-net-daemon"
    file="colinux/os/current/user/conet-bridged-daemon/main.c"
    timestamp="00596509.3865912216" local_index="9939"
    facility="1" function="co_win32_daemon_read_received"
    line="132" level="12" driver_index="13451">
    <string>sending to pcap (0x40c168 size 0xff8ab3d3)
    </string>
    </log>
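
To make "bogus/insanely_large/negative" concrete, here is a small sketch that reinterprets the logged size both ways and compares it against the 0x10000-byte daemon buffer. The helper names and the 16-byte header size are assumptions for illustration only; they are not taken from the coLinux source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Matches "char buffer[0x10000]" in the daemon's overlapped struct. */
#define DAEMON_BUFFER_SIZE 0x10000u

/* Portable two's-complement reinterpretation of a 32-bit value. */
static int32_t as_signed32(uint32_t v)
{
    if (v & 0x80000000u)
        return -(int32_t)(uint32_t)~v - 1;
    return (int32_t)v;
}

/* A size is plausible only if header plus payload fit in one buffer.
 * header_bytes is a hypothetical parameter, not the real sizeof(co_message_t). */
static int size_is_plausible(uint32_t size, uint32_t header_bytes)
{
    return size <= DAEMON_BUFFER_SIZE - header_bytes;
}

/* Prints the logged value as unsigned and as signed 32-bit. */
void report_logged_size(void)
{
    uint32_t logged = 0xff8ab3d3u;           /* size from the log entry */
    assert(!size_is_plausible(logged, 16));  /* 16 is an assumed header size */
    printf("unsigned: %lu, signed: %ld\n",
           (unsigned long)logged, (long)as_signed32(logged));
}
```

Read as unsigned, the value is about 4 GB; read as a signed 32-bit number it is negative. Either way it cannot describe a packet that came out of a 64K buffer.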

     
  • Mitch Bradley
    Mitch Bradley
    2006-08-11

    Logged In: YES
    user_id=1131764

    Okay, I know what's causing the problem.

    In 0.6.4, the network daemons were changed to read multiple
    messages from the daemon pipe, instead of just reading one
    at a time.

    The problem occurs when the pipe has more than 64K of data
    available to be read. The call to ReadFile() only asks for
    64K at a time. If more than 64K is available, there will
    usually be a message fragment at the end of the buffer.

    The message list processing code in
    co_win32_daemon_read_received() discards such fragments,
    emitting a message "Error: Message incomplete" (which you
    can only see if you have the colinux-debug-daemon turned on
    and listening to "misc" messages at level 10 or higher).

    Then the next call to ReadFile() fills the buffer with the
    rest of the message, minus the fragment that was read by the
    previous call. The header that describes the message was in
    that discarded fragment. The message processing code tries
    to interpret bogus data as if it were a message header, and
    is very likely to crash.

    One way to fix it would be to copy the tail fragment down to
    the beginning of the buffer and adjust the address and size
    for the next call to ReadFile(). I would do it myself, but
    I don't have a build environment set up, and I'd rather not
    go down that rathole. If anyone already has an environment
    and wants to work with me, I'll supply the code.

    Mitch Bradley - wmb at firmworks dot com

     
  • Henry N.
    Henry N.
    2006-09-11

    • assigned_to: nobody --> henryn
     
  • Henry N.
    Henry N.
    2006-09-11

    Logged In: YES
    user_id=579204

    Hello Mitch,

    thanks for your idea.
    Could you please make your changes in the file you found
    and attach the diff here?
    For example:
    "diff -au old.c new.c > fix.diff"
    Then I'll rebuild the code and you can test it.

     
  • Mitch Bradley
    Mitch Bradley
    2006-09-11

    Logged In: YES
    user_id=1131764

    Here is the proposed patch. Note that this has not been
    tested, nor even compiled to check for syntax errors.

    c:\cygwin\bin\diff -c
    "c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c~"
    "c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c"
    *** c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c~ Sat May 6 15:31:45 2006
    --- c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c Thu Aug 10 17:47:15 2006
    ***************
    *** 41,46 ****
    --- 41,47 ----
    OVERLAPPED write_overlapped;
    char buffer[0x10000];
    unsigned long size;
    + unsigned long offset;
    } co_win32_overlapped_t;

    typedef struct co_win32_pcap {
    ***************
    *** 114,142 ****
    /* Received packet from daemon. */
    co_message_t *message;
    char * buffer = overlapped->buffer;
    ! long size_left = overlapped->size;
    ! unsigned long message_size;

    ! do {
    message = (co_message_t *)buffer;
    ! message_size = message->size + sizeof (co_message_t);
    ! buffer += message_size;
    ! size_left -= message_size;
    !
    ! /* Check buffer overrun */
    ! if (size_left < 0) {
    ! co_debug("Error: Message incomplete (%ld)\n", size_left);
    ! return CO_RC(ERROR);
    }

    ! co_debug_lvl(network, 12, "sending to pcap (0x%x size 0x%x)\n", message->data, message->size);
    /* Send packet using pcap. */
    pcap_rc = pcap_sendpacket(pcap_packet.adhandle,
    ! message->data, message->size);
    co_debug_lvl(network, 13, "sent (%x)\n", pcap_rc);

    ! } while (size_left > 0);

    return CO_RC(OK);
    }

    --- 115,151 ----
    /* Received packet from daemon. */
    co_message_t *message;
    char * buffer = overlapped->buffer;
    ! long size_left = overlapped->size + overlapped->offset;

    ! while (size_left > 0) {
    message = (co_message_t *)buffer;
    !
    ! // Do not dereference message->size unless we have a complete header
    ! if ( (size_left < sizeof (co_message_t)) ||
    ! (size_left < (message->size + sizeof (co_message_t))) ) {
    ! // Copy partial message down to bottom of buffer and
    ! // adjust offset so the next read splices the new data
    ! // after the old data
    ! memcpy(overlapped->buffer, buffer, size_left);
    ! overlapped->offset = size_left;
    ! co_debug_lvl(network, 14, "Preserving 0x%x trailing bytes\n", size_left);
    ! return CO_RC(OK);
    }

    ! buffer += sizeof (co_message_t);
    ! size_left -= sizeof (co_message_t);
    !
    ! co_debug_lvl(network, 12, "sending to pcap (0x%x size 0x%x)\n", buffer, message->size);
    /* Send packet using pcap. */
    pcap_rc = pcap_sendpacket(pcap_packet.adhandle,
    ! buffer, message->size);
    co_debug_lvl(network, 13, "sent (%x)\n", pcap_rc);

    ! buffer += message->size;
    ! size_left -= message->size;
    ! }

    + overlapped->offset = 0;
    return CO_RC(OK);
    }

    ***************
    *** 179,186 ****

    while (TRUE) {
    result = ReadFile(overlapped->handle,
    ! &overlapped->buffer,
    ! sizeof (overlapped->buffer),
    &overlapped->size,
    &overlapped->read_overlapped);

    --- 188,195 ----

    while (TRUE) {
    result = ReadFile(overlapped->handle,
    ! &overlapped->buffer[overlapped->offset],
    ! sizeof (overlapped->buffer) - overlapped->offset,
    &overlapped->size,
    &overlapped->read_overlapped);

    ***************
    *** 238,243 ****
    --- 247,253 ----
    overlapped->handle = handle;
    overlapped->read_event = CreateEvent(NULL, FALSE, FALSE, NULL);
    overlapped->write_event = CreateEvent(NULL, FALSE, FALSE, NULL);
    + overlapped->offset = 0;

    overlapped->read_overlapped.Offset = 0;
    overlapped->read_overlapped.OffsetHigh = 0;

     
  • Henry N.
    Henry N.
    2006-09-11

    Logged In: YES
    user_id=579204

    Hello wmb314,

    thanks for the patch you sent me.
    I'll add the file to the tracker and check it later.

     
  • Henry N.
    Henry N.
    2006-09-11

    patch from wmb314 as clean diff

     
  • Henry N.
    Henry N.
    2006-09-12

    Logged In: YES
    user_id=579204

    Thanks wmb314,

    I have compiled your patch after fixing a few small typos.
    I cannot test it myself: on my machine it never reaches the
    "Preserving ... trailing" case.

    Please check the build.

     
  • Mitch Bradley
    Mitch Bradley
    2006-09-12

    Logged In: YES
    user_id=1131764

    Henry sent me a compiled version of the patch (after fixing
    a few typos). I tested it and it works. With my test case
    (described elsewhere in this issue), the patched version
    works perfectly while the original version continues to crash.

    colinux-debug-daemon shows "Preserving" log messages,
    indicating activation of the new code.

     
  • Henry N.
    Henry N.
    2006-09-13

    patch fix all daemons

     
  • Henry N.
    Henry N.
    2006-09-13

    Logged In: YES
    user_id=579204

    Mitch,

    thanks. We will change it in the mainline. In the meantime,
    here are the updates for all the daemons:

    http://www.henrynestler.com/colinux/testing/stable-0.6.4-2/update/

    Henry

     
  • Henry N.
    Henry N.
    2007-03-09

    Logged In: YES
    user_id=579204
    Originator: NO

    fixed in all the current snapshots

     
  • Henry N.
    Henry N.
    2007-03-09

    • status: open --> closed
     
  • Henry N.
    Henry N.
    2007-03-09

    • status: closed --> closed-fixed