#136 Randoms Windows XPSP3 hangs (ndis-bridge)

v0.8.x (devel)
closed-duplicate
nobody
5
2009-02-10
2008-08-27
eugenesan
No

Hi,

I've noticed that builds created in the last several months randomly crash or hang my host system.
In earlier versions a BSOD with a linux.sys failure would appear; on builds since ~1/7/2008 Windows just hangs...
I should note that there were also some changes to my host system (not sure whether the changes and the crashes appeared at the same time):
* CPU Pentium M => Core2Duo (IBM T43->T61)
* Windows SP2 => SP3
* No other notable changes performed

In addition, the crashes appear regardless of which of these I use:
* cobd or scsi
* cobd async or sync
* pcap-bridge or ndis-bridge

My coLinux starts as a service using the following command:

"C:\Program Files\coLinux\colinux-daemon.exe" --run-service:D:\My Apps\Batch "coLinux" mem=512 kernel="C:\Program Files\coLinux\vmlinux" exec0="D:\My Apps\pulseaudio\pulseaudio.exe" scsi0=disk,"\\.\PhysicalDrive0" cofs0=c:\ cofs1=d:\ root=/dev/sda2 eth1=ndis-bridge,"Local","00:xx:xx:xx:xx:2B" eth2=ndis-bridge,"VirtualCoLinux","00:xx:xx:xx:xx:00" eth3=tuntap,"VirtualCoLinux","00:xx:xx:xx:xx:60" ttys0=COM5,"BAUD=115200 PARITY=n DATA=8 STOP=1 dtr=on rts=on"

* cofs almost never used
* ttys almost never used, but the serial port is physically hooked to a device and probably receives some data on RX once in a while.
* VirtualCoLinux is the coLinux TAP device, used both as host-only network and as NAT gateway (I am using the VMware network stack for NAT/DHCP).
* partitions (of course) never cross-mounted
* no signs of errors of any kind from the Linux side (but, maybe, some random networking outages)
* Crashes tend to appear only when the X server is running (via FreeNX). Lately the system hangs on almost every X session init :-(

I am looking for an effective way to debug this issue, and for suggestions about problems my configuration may have.

Thanks in advance

Discussion

  • Henry N.

    Henry N. - 2008-08-29

    Hello Eugenesan,

    have you checked the latest build from http://www.colinux.org/snapshots/, dated 23-Aug-2008?

    I ask because all versions before this snapshot have memory leaks and race conditions in the interrupt handler. I found them through heavy network tests with the ndis-bridge and the tool "netio". The latest version ran without problems for many hours. You can read more details in the "Recent ChangeLog"; revisions r1114 and r1118 may also have fixed the problem you describe here.

    For debugging, run the command line you posted from a Windows prompt without the parameter "--run-service:" and add the parameter "-v 3" (verbose) instead.

    A small bug I see: the path parameter is broken by the space in "My Apps". It is harmless, because you have full paths for all files there.

    Henry

     
  • eugenesan

    eugenesan - 2008-08-30

    Hi Henry,
    Thanks for reply.

    I do use the version you suggest and I am still experiencing crashes.
    In addition, my system currently does not get past the switch to runlevel 2; I've even reinstalled the whole distribution.
    Probably something changed in the distribution's updated packages, because it had been working for months...

    About debugging: I am having a hard time enabling it. Adding -v # makes no difference; the debug daemon outputs nothing.
    Is debugging.txt outdated, or am I missing something?

    Thanks in advance
    Eugene

     
  • Henry N.

    Henry N. - 2008-08-30

    Debugging.txt is up to date. You will find the latest version here:
    http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/doc/debugging

    The debug daemon often can't help to catch this kind of crash.

    To trace such problems, you need to put some debugging into the scripts inside the distribution. You say that the problem is in runlevel 2, so try to trace the steps between runlevel 1 and 2. Start your system in runlevel 1 ("init 1") via the coLinux boot parameters. Add 'set -x' at the top of the script that does the runlevel switching. Then switch to the next runlevel with the command 'init 2' at the Linux prompt.

    For me it is the file /etc/init.d/rc under SuSE.

    It is also a good idea to put 'echo "$0 $service start ..."; sleep 3' inside the for-loop, just before it executes the next script (one of /etc/init.d/rc2.d/S##...), to see which script will be started next. This is the loop that locates the K## and S## scripts for the stop/start actions. Some distributions have a special environment setting in the rc scripts to enable a verbose mode.

    This is what I have in my script to trace the boot steps:
    blogger "$service start"
    echo "$0 $service start ..."; sleep 3 ### Trace: inserted for coLinux debugging ###
    $i start; status=$?

    Henry
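
The tracing idea described in this post can be tried outside the real rc script first. The following is only a minimal sketch, not the SuSE /etc/init.d/rc code: the function name trace_start_scripts and its two arguments (directory and delay) are made up for illustration, and it assumes the runlevel directory holds executable S## scripts that accept a "start" argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of any distribution's rc script):
# run every executable S## script in a runlevel directory and print a
# trace line before each one, so the last line shown before a hang
# names the script that was running.
trace_start_scripts() {
    rc_dir=${1:-/etc/init.d/rc2.d}   # directory with the S## scripts
    delay=${2:-3}                    # seconds to pause so the line can be read

    for i in "$rc_dir"/S*; do
        # Skip non-executable entries (this also covers an unmatched glob).
        [ -x "$i" ] || continue
        service=${i##*/}
        echo "$0 $service start ..."; sleep "$delay"
        status=0
        "$i" start || status=$?
        echo "$0 $service returned $status"
    done
}
```

Calling it as `trace_start_scripts /etc/init.d/rc2.d 3` from runlevel 1 prints one line per script; a delay of 0 is convenient for a dry run against a copy of the directory.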

     
  • eugenesan

    eugenesan - 2008-08-30

    Hi,

    After a reboot, debugging started to work.
    Digging in...

     
  • eugenesan

    eugenesan - 2008-08-30

    Boot-up problem solved: "echo 0 > /proc/sys/vm/vdso_enabled" hangs coLinux.
    Hope my software will survive that...

    Can I enable debugging when starting the daemon as a service?

     
  • Henry N.

    Henry N. - 2008-08-30

    Ah, thanks. Very good catch.

    The Linux default for VDSO is disabled. I know that my coLinux SuSE 9.0 hangs if VDSO is disabled in the kernel config, so I have explicitly enabled VDSO compatibility in the config. But I did not think about a user or a distribution disabling it at runtime.

    If that helps: it would be easy for the coLinux kernel code to make /proc/sys/vm/vdso_enabled read-only. ;-)

    Of course, you can run "colinux-debug-daemon.exe -d -p -s prints=31,misc=31 ..." in a command prompt before you start the service from the Windows control applet. The parameter "misc=31" there is the same as running "colinux-daemon -v 31 ..."

     
  • Henry N.

    Henry N. - 2008-09-07

    Hello Eugenesan,

    The VDSO is needed to tell glibc how to make a syscall. With vdso=0 at startup, or after writing 0 to vdso_enabled, the VDSO page is no longer mapped at the fixed address ffffe000 (see http://www.trilithium.com/johan/2005/08/linux-gate/); it goes to a random virtual address.

    Debian 4 has problems if I boot with vdso=1 (the coLinux default) and write 0 later. The current terminal kills the INIT task. Linux cannot shut down. The other consoles (tty1, tty2, ...) are still running.

    I can boot all my distributions (SuSE 9.0, Debian 3.0, Debian 4.0) with the boot parameter vdso=0.
    Of course, if I boot with 0, there is no problem with writing 0 to vdso_enabled.
    vdso=0 is the default config for x86 kernels ("CONFIG_COMPAT_VDSO is not set").

    Can we set vdso=0 as the coLinux default now?
    I think yes. Anybody can test this with the boot parameter "vdso=0".

    In the meantime I have enabled the sysenter/sysexit feature (SEP). All current CPUs should have this feature. It also has something to do with the VDSO.

    Eugenesan, please try this build: boot with vdso=0 and write to vdso_enabled. This should run without problems.
    Second, please test with the parameter vdso=1 (or without it) and write 0 to vdso_enabled. Does this crash?

    A new snapshot is available at http://www.colinux.org/snapshots/

    PS: Before you risk losing data in crash tests, it's a good idea to sync all data to the hard disk before running colinux-daemon. I have this in my batch files for testing now: http://technet.microsoft.com/en-us/sysinternals/bb897438.aspx

    Henry
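
The two tests requested in this post (write to vdso_enabled after booting with vdso=0, then again after booting with vdso=1) could be run inside the guest with something like the sketch below. The helper names vdso_state and vdso_disable are hypothetical; only the /proc path comes from this thread. Running sync inside the guest before the write is an added assumption, mirroring the advice to sync the host before crash tests. Since this thread reports that the write can hang coLinux, it should only be tried on a guest you can afford to lose.

```shell
#!/bin/sh
# Hypothetical helpers for the VDSO test; the optional file argument
# exists so the functions can be tried against an ordinary file first.

# Print the current setting, or "unavailable" if the file cannot be read.
vdso_state() {
    file=${1:-/proc/sys/vm/vdso_enabled}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
    fi
}

# Flush pending writes, then write 0. On an affected build the guest may
# hang at the echo, so nothing after it is guaranteed to run.
vdso_disable() {
    file=${1:-/proc/sys/vm/vdso_enabled}
    sync
    echo 0 > "$file"
}
```

A test run would be `vdso_state; vdso_disable; vdso_state`, once per boot setting, noting which boot parameter was used each time.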

     
  • eugenesan

    eugenesan - 2008-10-10

    Hi,

    I still don't know the exact reason for the crashes/hangs, but I've found how to avoid them.
    First, in my configuration I was bridging (pcap or ndis) on a tun interface that was also connected to coLinux.
    Second, I was using the serial (ttys) feature on a USB to RS232 dongle.

    When I disabled both of the above, the problem disappeared.
    At the moment it has been running for more than a week without a sign of a problem.

    I don't know whether binding network interfaces to the same source may cause any problems.
    But I think serial emulation can be a problem in my case, since I am using it on a laptop and the USB dongle with the serial port comes and goes at least once a day.

    BTW: I am using the latest testing build, 20080923.

    Thank you for help.

     
  • Henry N.

    Henry N. - 2009-02-10
    • labels: --> Crash / BSOD
    • summary: Randoms Windows XPSP3 hangs --> Randoms Windows XPSP3 hangs (ndis-bridge)
    • status: open --> closed-duplicate
     
  • Henry N.

    Henry N. - 2009-02-10

    Bug #2357595 solves many variants of hanging or freezing PCs, mostly with network traffic between coLinux and the host over ndis-bridge.

    I will close this bug now.