From: SourceForge.net <no...@so...> - 2006-09-13 09:11:44
Bugs item #1517979, was opened at 2006-07-06 10:09
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=1517979&group_id=98788

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Henry N. (henryn)
Summary: Network broke down

Initial Comment:
Hello

I use the stable version of colinux with Debian testing. Everything works fine, but sometimes the VNC connection breaks, and when I then try to ping the machine there is a timeout. I tried restarting the network with /etc/init.d/networks restart, but an error is displayed saying that eth0 cannot be brought up. To get the network working again I must reboot colinux. I use the newest version of the WinPcap driver...

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2006-09-13 11:11

Message:
Logged In: YES
user_id=579204

Mitch, thanks. We will change it in the mainline.

In the meantime, here are the updates for all the daemons:
http://www.henrynestler.com/colinux/testing/stable-0.6.4-2/update/

Henry

----------------------------------------------------------------------

Comment By: Mitch Bradley (wmb314)
Date: 2006-09-12 19:59

Message:
Logged In: YES
user_id=1131764

Henry sent me a compiled version of the patch (after fixing a few typos). I tested it and it works.
With my test case (described elsewhere in this issue), the patched version works perfectly while the original version continues to crash. Colinux-debug-daemon shows "Preserving" log messages, indicating activation of the new code.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2006-09-12 10:30

Message:
Logged In: YES
user_id=579204

Thanks wmb314, I have compiled your patch after fixing a few small typos. I cannot test it myself: for me it never reaches the "Preserving ... trailing" case. Please check the build.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2006-09-11 21:31

Message:
Logged In: YES
user_id=579204

Hello wmb314, thanks for the patch you sent me. I'll add the file to the tracker and check it later.

----------------------------------------------------------------------

Comment By: Mitch Bradley (wmb314)
Date: 2006-09-11 19:51

Message:
Logged In: YES
user_id=1131764

Here is the proposed patch. Note that this has not been tested, nor even compiled to check for syntax errors.

c:\cygwin\bin\diff -c "c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c~" "c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c"
*** c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c~ Sat May 6 15:31:45 2006
--- c:/coLinuxSource/coLinux-0.6.4/src/colinux/os/winnt/user/conet-bridged-daemon/main.c Thu Aug 10 17:47:15 2006
***************
*** 41,46 ****
--- 41,47 ----
      OVERLAPPED write_overlapped;
      char buffer[0x10000];
      unsigned long size;
+     unsigned long offset;
  } co_win32_overlapped_t;

  typedef struct co_win32_pcap {
***************
*** 114,142 ****
      /* Received packet from daemon. */
      co_message_t *message;
      char * buffer = overlapped->buffer;
!     long size_left = overlapped->size;
!     unsigned long message_size;

!     do {
          message = (co_message_t *)buffer;
!         message_size = message->size + sizeof (co_message_t);
!         buffer += message_size;
!         size_left -= message_size;
!
!         /* Check buffer overrun */
!         if (size_left < 0) {
!             co_debug("Error: Message incomplete (%ld)\n", size_left);
!             return CO_RC(ERROR);
          }

!         co_debug_lvl(network, 12, "sending to pcap (0x%x size 0x%x)\n", message->data, message->size);

          /* Send packet using pcap. */
          pcap_rc = pcap_sendpacket(pcap_packet.adhandle,
!                                   message->data, message->size);
          co_debug_lvl(network, 13, "sent (%x)\n", pcap_rc);
!     } while (size_left > 0);

      return CO_RC(OK);
  }
--- 115,151 ----
      /* Received packet from daemon. */
      co_message_t *message;
      char * buffer = overlapped->buffer;
!     long size_left = overlapped->size + overlapped->offset;

!     while (size_left > 0) {
          message = (co_message_t *)buffer;
!
!         // Do not dereference message->size unless we have a complete header
!         if ( (size_left < sizeof (co_message_t)) ||
!              (size_left < (message->size + sizeof (co_message_t))) ) {
!             // Copy partial message down to bottom of buffer and
!             // adjust offset so the next read splices the new data
!             // after the old data
!             memcpy(overlapped->buffer, buffer, size_left);
!             overlapped->offset = size_left;
!             co_debug_lvl(network, 14, "Preserving 0x%x trailing bytes\n", size_left);
!             return CO_RC(OK);
          }

!         buffer += sizeof (co_message_t);
!         size_left -= sizeof (co_message_t);
!
!         co_debug_lvl(network, 12, "sending to pcap (0x%x size 0x%x)\n", buffer, message->size);

          /* Send packet using pcap. */
          pcap_rc = pcap_sendpacket(pcap_packet.adhandle,
!                                   buffer, message->size);
          co_debug_lvl(network, 13, "sent (%x)\n", pcap_rc);
!         buffer += message->size;
!         size_left -= message->size;
!     }

+     overlapped->offset = 0;
      return CO_RC(OK);
  }
***************
*** 179,186 ****
      while (TRUE) {
          result = ReadFile(overlapped->handle,
!                           &overlapped->buffer,
!                           sizeof (overlapped->buffer),
                            &overlapped->size,
                            &overlapped->read_overlapped);
--- 188,195 ----
      while (TRUE) {
          result = ReadFile(overlapped->handle,
!                           &overlapped->buffer[overlapped->offset],
!                           sizeof (overlapped->buffer) - overlapped->offset,
                            &overlapped->size,
                            &overlapped->read_overlapped);
***************
*** 238,243 ****
--- 247,253 ----
      overlapped->handle = handle;
      overlapped->read_event = CreateEvent(NULL, FALSE, FALSE, NULL);
      overlapped->write_event = CreateEvent(NULL, FALSE, FALSE, NULL);
+     overlapped->offset = 0;

      overlapped->read_overlapped.Offset = 0;
      overlapped->read_overlapped.OffsetHigh = 0;

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2006-09-11 17:37

Message:
Logged In: YES
user_id=579204

Hello Mitch, thanks for your idea. Would you please make your changes in the file you found and attach the diff here? For example: "diff -au old.c new.c > fix.diff". Then I will rebuild the code and you can test it.

----------------------------------------------------------------------

Comment By: Mitch Bradley (wmb314)
Date: 2006-08-11 05:49

Message:
Logged In: YES
user_id=1131764

Okay, I know what's causing the problem.

In 0.6.4, the network daemons were changed to read multiple messages from the daemon pipe, instead of just reading one at a time. The problem occurs when the pipe has more than 64K of data available to be read. The call to ReadFile() only asks for 64K at a time. If more than 64K is available, there will usually be a message fragment at the end of the buffer.

The message list processing code in co_win32_daemon_read_received() discards such fragments, emitting a message "Error: Message incomplete" (which you can only see if you have the colinux-debug-daemon turned on and listening to "misc" messages at level 10 or higher).

Then the next call to ReadFile() fills the buffer with the rest of the message, minus the fragment that was read by the previous call. The header that describes the message was in that discarded fragment. The message processing code tries to interpret bogus data as if it were a message header, and is very likely to crash.

One way to fix it would be to copy the tail fragment down to the beginning of the buffer and adjust the address and size for the next call to ReadFile().

I would do it myself, but I don't have a build environment set up, and I'd rather not go down that rathole. If anyone already has an environment and wants to work with me, I'll supply the code.

Mitch Bradley - wmb at firmworks dot com

----------------------------------------------------------------------

Comment By: Mitch Bradley (wmb314)
Date: 2006-08-11 01:18

Message:
Logged In: YES
user_id=1131764

I'm seeing a similar thing, and I can reproduce it pretty much at will.

I'm using colinux 0.6.4-2.6.11 with FedoraCore5-2006.8-ext3-2gb. I have added several packages with yum, including gqview (an image viewer). I run Cygwin/X on the WinXP Pro host machine, communicating with it via colinux-bridged-net-daemon.

The way to reproduce the problem is as follows:

a) Run "gqview", with DISPLAY set so that the X window comes up on the host machine.

b) Use it to view a large hi-res JPEG image. It initially shows a portion of the image in a small window.

c) Resize the X window by dragging the corner, attempting to expose a much larger region of the image.

While the larger window is repainting, colinux-bridged-net-daemon.exe crashes. I have also seen this problem happen when using Xpdf.

The exception occurs at address 0x004014FE, with the colinux-bridged-net-daemon module loaded at 0x00400000. The code at that address is trying to load from [ebx+10h], after having just loaded ebx from [ebp-10h], i.e. from the stack frame. The value in ebx is 0xE727C14E, which is indeed the value at [ebp-10h]. No memory is mapped at the address 0xE727C14E, so it appears to me that the stack frame is being clobbered. I wonder if a buffer is being allocated on the stack, which is overflowing in the presence of heavy traffic.

Okay, looking at the disassembly of that section and doing an eyeball decompilation makes me believe that the crash is happening in conet-bridged-daemon:main.c:co_win32_daemon_read_received()

I believe that the "message->size" dereference is failing (2 lines after the "do {") as a result of the local variable "buffer" having been overwritten, probably as a result of a previous call to pcap_sendpacket().

I ran colinux-debug-daemon and found the following line in the log file, very near the time of the crash. Note that the size is bogus/insanely_large/negative.

<log module="colinux-bridged-net-daemon" file="colinux/os/current/user/conet-bridged-daemon/main.c" timestamp="00596509.3865912216" local_index="9939" facility="1" function="co_win32_daemon_read_received" line="132" level="12" driver_index="13451">
<string>sending to pcap (0x40c168 size 0xff8ab3d3)
</string>
</log>

----------------------------------------------------------------------

Comment By: andy (kokoko3k)
Date: 2006-07-13 22:22

Message:
Logged In: YES
user_id=802927

Same issue here: as soon as the network comes under heavy load, colinux loses its connection. It's impossible to forward an X display using plain bitmaps, for example. I use a native bridged network card and I've tried several versions of the WinPcap driver.

Switching to colinux 0.6.3 solved the issue.

The changelog says:
-Fix for dropped UDP/TCP packets between linux and host daemons.
-pcap/Bridged:
* Add promisc="false" in config.xml, or 'nopromisc' as last command-line argument (ie, eth1=pcap,"Local Area Connection","<FAKE MAC>",nopromisc ). Default is Promiscuous on.

Next, I will try 0.6.4 with promiscuous mode turned off.

----------------------------------------------------------------------
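
The fix Mitch describes in the 2006-08-11 comment (keep the trailing fragment, move it to the start of the buffer, and let the next read append after it) can be illustrated outside coLinux with a small stand-alone sketch. Everything here is hypothetical and only loosely modelled on the patch: msg_header_t stands in for co_message_t, stream_t for co_win32_overlapped_t, and feed() imitates successive ReadFile() calls with memcpy. It is not the code that went into the daemons. The sketch uses memmove because the fragment and its destination can overlap, and it assumes no single message is larger than the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for co_message_t: a header followed by `size` payload bytes. */
typedef struct {
    unsigned long size;
} msg_header_t;

/* Hypothetical stand-in for the daemon's read buffer state. */
typedef struct {
    unsigned char buffer[256]; /* small on purpose; the daemon uses 0x10000 */
    size_t offset;             /* bytes carried over from the previous read */
    size_t delivered;          /* complete messages handed on so far */
    size_t payload_bytes;      /* total payload bytes handed on so far */
} stream_t;

/* Where the next read should store its data, and how much room is left. */
static unsigned char *stream_read_ptr(stream_t *s) { return s->buffer + s->offset; }
static size_t stream_read_room(const stream_t *s) { return sizeof s->buffer - s->offset; }

/* Call after `nread` new bytes have been stored at stream_read_ptr(). */
static void stream_consume(stream_t *s, size_t nread)
{
    unsigned char *p = s->buffer;
    size_t left = s->offset + nread;

    while (left > 0) {
        msg_header_t hdr;

        /* Do not look at the size field unless the whole header has arrived. */
        if (left < sizeof hdr)
            break;
        memcpy(&hdr, p, sizeof hdr); /* copy out to avoid unaligned access */

        /* Header is complete but the payload is not: wait for more data. */
        if (left - sizeof hdr < hdr.size)
            break;

        /* A real daemon would send the payload at p + sizeof hdr here. */
        s->delivered++;
        s->payload_bytes += hdr.size;

        p += sizeof hdr + hdr.size;
        left -= sizeof hdr + hdr.size;
    }

    /* Preserve the trailing fragment at the bottom of the buffer. */
    if (left > 0 && p != s->buffer)
        memmove(s->buffer, p, left);
    s->offset = left;
}

/* Deliver `len` bytes in reads of at most `chunk` bytes each. */
static void feed(stream_t *s, const unsigned char *data, size_t len, size_t chunk)
{
    while (len > 0) {
        size_t n = chunk < len ? chunk : len;
        if (n > stream_read_room(s))
            n = stream_read_room(s);
        assert(n > 0); /* a message larger than the buffer is not handled here */
        memcpy(stream_read_ptr(s), data, n);
        stream_consume(s, n);
        data += n;
        len -= n;
    }
}
```

Feeding a byte stream that holds several messages through feed() with a chunk size smaller than one header forces every header and payload to be split across reads. With this approach all messages should still be counted in delivered, and offset should return to 0 once the last byte has arrived.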