From: Alexey E. <al...@gm...> - 2009-04-21 08:14:52
|
Hi, I had a host crash (BSOD) when running coLinux together with VirtualBox in VMX mode. Can you take a look at it? http://www.virtualbox.org/ticket/3724 -- -Alexey Eromenko "Technologov", 21.4.2009. |
From: SourceForge.net <no...@so...> - 2009-04-19 18:42:40
|
Support Requests item #2485983, was opened at 2009-01-04 18:17
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622064&aid=2485983&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Install Problem (example)
Group: v1.0 (example)
Status: Open
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Booting a WUBI image from c:\ drive in CoLinux

Initial Comment:
I am trying to boot an image that was installed by WUBI on my C:\ drive. If someone could help me get this working, then Linux would basically run natively from within Windows without requiring any new partitions. That would be awesome.

WUBI puts its Linux image in the location C:\ubuntu\disks\root.disk

So I modified the example.conf file to say

cobd0="C:\ubuntu\disks\root.disk"

But alas it does not boot properly. I have included a dump from the screen of what colinux-daemon.exe outputs when it is trying to boot. It just repeats

request_module: runaway loop modprobe binfmt-464c

a few times and then hangs. I think it has to do with the line

ReiserFS: cobd0: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on cobd0

that comes up when it is booting. I think this may be because I left the root pointing to root=/dev/cobd0 in the example.conf file when maybe WUBI takes it from somewhere else.

Help would be greatly appreciated!!

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-19 18:42

Message:
Hi, has anything else transpired on this effort? I am willing to help move this idea forward, as I believe it has value. I have about ten different PCs I can use to test this if I can make it work. So before I start I would like to know if there was any progress since January, when this was initially posted.

Thanks!
Sergio.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-01-04 19:22

Message:
Oh, sorry, I had not seen your attachment. Please forget the SCSI stuff. cobd is OK, the ext2 filesystem was mounted correctly. But you need to prepare some things before you can run it. I don't know all of them.

First, please install the modules manually. The modules are in the file vmlinux-modules.tar.gz in your coLinux directory. Run WUBI and unpack the modules file as root with this command:

tar xzf vmlinux-modules.tar.gz -C /

Check which device your file is mounted on. It is something like sda1 or hda1. WUBI does not know "cobd0". Use the mount you found while running WUBI and replace the cobd0 in the coLinux config with it.

Remove initrd from the coLinux config.

For the first boot, bypass the normal runlevel to get a simple shell. Add this to your boot parameters: init 1 or init=/bin/bash

I think more work is needed to bypass the loop mount.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-01-04 19:09

Message:
I'm afraid that WUBI does not use a raw single partition in the file. If WUBI uses a whole disk with partitions and it is only one file, you can try to replace these lines:

cobd0="C:\ubuntu\disks\root.disk"
root=/dev/cobd0

with

scsi0=disk,"C:\ubuntu\disks\root.disk"
root=/dev/sda1

The scsi0 is usable only in the devel version; you can get it from http://www.colinux.org/snapshots/

For more help, I need to know how WUBI stores the data in the file. Please post the output from 'mount' running under WUBI here.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622064&aid=2485983&group_id=98788 |
From: Henry N. <hen...@ar...> - 2009-04-16 07:41:40
|
Hello,

The next release candidate, 0.7.4-rc2 (build 20090415), is available from the snapshot page under "stable branch".
http://www.colinux.org/snapshots/

This is a pure copy of the current devel branch. Users of devel 0.8.0 will not find any differences other than the version number. Notes about all changes can be read in the files NEWS and Changelog there.

In the previous candidate, a floating point (FPU) error was detected on task switches (related Bugs #2748015, #2756909).

Please update the modules manually, because initrd can't do this automatically:
"tar xzf vmlinux-modules.tar.gz -C /"
This is necessary especially for using Raid5 and Raid6.

--
Henry N. |
From: coLinux a. <col...@he...> - 2009-04-16 04:11:08
|
The autobuild system has detected a new revision in the source repository. Review the latest changes in changelog.txt, also attached to this mail.

Download the compiled version:
http://www.henrynestler.com/colinux/autobuild/devel-20090415/

colinux-0.8.0-20090415.src.tgz (685429 Bytes)
daemons-0.8.0-20090415.dbg.zip (590980 Bytes)
daemons-0.8.0-20090415.zip (477759 Bytes)
modules-2.6.22.18-co-0.8.0-20090415.tgz (2636374 Bytes)
vmlinux-2.6.22.18-co-0.8.0-20090415.zip (1672568 Bytes)

Note that the autobuild compilation does not include an installer. Remember to reload the driver with these commands:

colinux-daemon.exe --remove-driver
colinux-daemon.exe --install-driver

Inside coLinux, please update the modules as follows:

rm -rf /lib/modules/*-co-*
tar -xzf modules-*-co-*-20090415.tgz -C /

The autobuild compilations are not official releases of Cooperative Linux software. There is no warranty that any autobuild version is stable. If you use this autobuild version, please give us feedback on your experience.

The job runs on a machine with the 64 bit version of gcc 4.1.2. A service from http://gcc.gnu.org/wiki/CompileFarm

--
Lots of fun with the newest version,
Henry Nestler

------------------------------------------------------------------------
r1244 | henryn | 2009-04-15 18:28:37 +0000 (Wed, 15 Apr 2009) | 1 line
Changed paths:
M /branches/devel/NEWS
M /branches/devel/RUNNING

* Very small text changes.
------------------------------------------------------------------------
r1243 | henryn | 2009-04-15 18:23:00 +0000 (Wed, 15 Apr 2009) | 3 lines
Changed paths:
M /branches/devel/patch/base-2.6.22.diff

* Bugs #2748015, #2756909: Floating point errors, if more than one task used the FPU heavily.
Go back to the safer variant with "co_switch_wrapper_protected()".
Reverts the changes from SVN r1237.
------------------------------------------------------------------------ |
From: SourceForge.net <no...@so...> - 2009-04-15 20:15:37
|
Bugs item #2756909, was opened at 2009-04-13 00:23
Message generated for change (Settings changed) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788

Category: Linux Kernel
Group: v0.8.x (devel)
Status: Closed
Resolution: Fixed
Priority: 8
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Henry N. (henryn)
Summary: sleep crashing with "Assertion `0 <= seconds' failed."

Initial Comment:
While researching bug 2748015 (http://sourceforge.net/tracker/?func=detail&aid=2748015&group_id=98788&atid=622063), we came across another problem with the 0.7.4-rc1 and 0.8.0 code base.

When running the command "while true; do sleep 0.1; done" in one session and "openssl genrsa -out /dev/null 4096" in another, the sleep command in the first session occasionally aborts with this error:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

This problem can be reproduced even when the commands are started as different unprivileged users. It looks as if the kernel is changing the memory of random processes.

Tested versions:
colinux 0.8.0: suffers this bug
colinux 0.7.4-rc1: suffers this bug
colinux 0.7.3: does not suffer this bug

Test system:
AMD AthlonXP 3800+
Windows XP SP3 + all updates to date
Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)

While researching this further, I discovered this thread, which describes a bug in the User Mode Linux kernel from almost a year ago.
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html
I have not been able to link this to a bug on the UML Sourceforge.net development page.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-15 22:10

Message:
This bug is fixed now by reverting the changes from SVN r1237 (floating point optimizations for the operating system switch). It's committed as SVN revision r1243 (devel) and r1245 (stable). New snapshots are available on http://www.colinux.org/snapshots/

Keith, many thanks for reporting and for the helpful test environments.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-14 08:54

Message:
Hi,
I have read the email carefully. I have tried your code, both the dbl and int versions. I cannot see the problem. It would be very interesting if you (nobody :=) ) could test my coLinux version. If you can, please contact me at paolo DOT minazzi AT gmail DOT com. I will send you a link so you can try a slightly different version. It will help us to understand the problem better. On my hardware (3 PCs) I cannot see this problem.

Thanks,
Paolo

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-13 23:40

Message:
I called Stano, who has reported the same problem on UML. He said that this bug is not solved in UML, and he has this workaround:
"The only thing that helps in my case is running the guest with mode=skas0. This eliminated the problem completely and the guest is running for months without any problem."

Currently I cannot find what skas0 does; I have not found the mode changer.

I have modified the test further, so I can say it is not stack clobbering:

#include <stdio.h>
#include <unistd.h>   /* usleep() */

volatile double theDouble;

int main(int argc, char* argv[]){
    theDouble = 1;
    while(1){
        usleep(0);   /* give the scheduler a chance to switch tasks */
        if(theDouble != 1){
            printf("Double test fails!\n");
            printf("- current Double: %f (%llX)\n", theDouble, theDouble);
            printf("- current Double: %f (%llX)\n", theDouble, theDouble);
            printf("- current Double: %f (%llX)\n", theDouble, theDouble);
            break;
        }
    }
    return 0;
}

Compiled with "gcc -ggdb -o dblchange dblchange.c" on Debian 4.0 (gcc 4.1.2).
Here are some of the errors:

Double test fails!
- current Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)

Double test fails!
- current Double: 1.000000 (3FF0000000000000)
- current Double: 1.000000 (3FF0000000000000)
- current Double: 1.000000 (3FF0000000000000)

Double test fails!
- current Double: 1.000000 (3FF0000000000000)
- current Double: nan (FFF8000000000000)
- current Double: 1.000000 (3FF0000000000000)

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 22:58

Message:
Aah, yeah, the double instead of integer thing wasn't that clever :) I changed the code to really use integers this time and it doesn't crash anymore. It seems only the double operations/registers get hosed. Also, if the int registers were unreliable, I would expect a lot more errors popping up when running the guest system.

As for reverting, I think that's the best solution for now. But I did notice the speed improvements in the 0.8.0 code base as opposed to the 0.7.3 one and it was quite nice, so I would love to see a working version of the speed enhancements.

Henry, thanks for helping hunt this bug down and good luck with the coLinux development. It's been real.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-13 17:32

Message:
The problem comes from changes in SVN revision r1237.

2009-03-21T23:56:07 henryn r1237
* Remove co_switch_wrapper_protected and all workaround for SSE/MMX on raid modules. Reverts the workaround from SVN r1212, related Bugs #2524658, #2551241.

r1236 20090319-Snapshot runs
r1237 20090321-Snapshot fails

Tested dblchange-nosleep.c and "openssl genrsa -out /dev/null 4096" in the fltk console. Snapshots are from http://www.henrynestler.com/colinux/testing/devel-0.8.0/

I feel we should revert this change in this release to the slower, but safer code.

Henry

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-13 17:11

Message:
Hello Keith,

in intchange.c:
> double theDouble;
> double theLastDouble;
there you also used double, not integer.

But it is very nice to see that your dblchange.c fails very shortly after starting "openssl genrsa -out /dev/null 4096" in a second console.

I have tuned the test a little by removing some calculations and forcing a task switch before the compare:

dblchange-nosleep.c:

#include <stdio.h>
#include <unistd.h>   /* sleep() */

int main(int argc, char* argv[]){
    double theDouble, theLastDouble;
    theDouble = 1;
    while(1){
        theDouble += 1;
        theLastDouble = theDouble;
        sleep(0); /* force task switch here */
        if(theLastDouble != theDouble){
            printf("Double test fails!\n");
            printf("- previous Double: %f (%LX)\n", theLastDouble, theLastDouble);
            printf("- current Double: %f (%LX)\n", theDouble, theDouble);
            break;
        }
    }
    return 0;
}

Some example failures:

Double test fails!
- previous Double: 151.000000 (4062E00000000000)
- current Double: 151.000000 (4062E00000000000)
Double test fails!
- previous Double: 3.000000 (4008000000000000)
- current Double: 3.000000 (4008000000000000)
Double test fails!
- previous Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)
Double test fails!
- previous Double: 2.000000 (4000000000000000)
- current Double: 2.000000 (4000000000000000)
Double test fails!
- previous Double: 6.000000 (4018000000000000)
- current Double: 6.000000 (4018000000000000)
Double test fails!
- previous Double: 3.000000 (4008000000000000)
- current Double: 3.000000 (4008000000000000)

This also fails if I remove the "sleep" completely:

- previous Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)
Double test fails!
- previous Double: nan (FFF8000000000000)
- current Double: 27945100.000000 (417AA688C0000000)
Double test fails!
- previous Double: 56525465.000000 (418AF414C8000000)
- current Double: 56525465.000000 (418AF414C8000000)

So it is not the sleep itself. It is the task switch, and/or something stupid in the keygen. I will check this with some revisions from before we changed the FPU save/restore (20090321, SVN r1237).

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 11:52

Message:
To test this problem I simplified the program listed in the fixunix thread above.

dblchange.c:

#include <stdio.h>
#include <unistd.h>   /* sleep() */
#define true 1
#define false 0

int main(int argc, char* argv[]){
    double theDouble;
    double theLastDouble;
    theDouble = 1;
    while(true){
        theLastDouble = theDouble;
        theDouble += 1;
        if(theLastDouble + 1 != theDouble){
            printf("Double test fails!\n");
            printf("- previous double: %f (%LX)\n", theLastDouble, theLastDouble);
            printf("- current double: %f (%LX)\n", theDouble, theDouble);
            break;
        }
        sleep(1);
    }
    return 0;
}

intchange.c:

#include <stdio.h>
#include <unistd.h>   /* sleep() */
#define true 1
#define false 0

int main(int argc, char* argv[]){
    double theInteger;
    double theLastInteger;
    theInteger = 1;
    while(true){
        theLastInteger = theInteger;
        theInteger += 1;
        if(theLastInteger + 1 != theInteger){
            printf("Integer test fails!\n");
            printf("- previous int: %d (%X)\n", theLastInteger, theLastInteger);
            printf("- current int: %d (%X)\n", theInteger, theInteger);
            break;
        }
        sleep(1);
    }
    return 0;
}

Analyzing the error thrown by sleep, it seems the double value which specifies how long to sleep gets changed outside of the program's control. First I adapted the fixunix program to test for doubles being changed. It runs smoothly until I start the openssl key generation operation. Then it errors after several seconds:

Double test fails!
- previous double: nan (FFF8000000000000)
- current double: nan (FFF8000000000000)

By injecting some other printfs I've seen that in the fatal iteration the second read of the previous double goes wrong. But this doesn't matter that much because it gets overwritten by the current double variable. After that both variables are good again, but when increasing the current variable the outcome becomes the NAN value. Output below:

+++
- previous double: 4.000000 (FFF8000000000000)
- current double: 5.000000 (4014000000000000)
theLastDouble = theDouble;
- previous double: 5.000000 (4014000000000000)
- current double: 5.000000 (4014000000000000)
theDouble += 1;
- previous double: 5.000000 (4014000000000000)
- current double: nan (FFF8000000000000)
Double test fails!
- previous double: 5.000000 (4014000000000000)
- current double: nan (FFF8000000000000)

Note: at the "Double test fails!" piece the previous double does not have a NAN value. This only occurs when I add printfs, so I blame this on the printfs doing stuff in between which changes the data flow. After a little further investigation it shows the previous double gets corrupted because in the final iteration the second read of any double gets turned into the NAN value. This means the current value will be read as NAN and then copied to the previous value.

After the double catastrophe I was curious whether integers would also be affected, so I wrote intchange.c. This showed that even integers are affected by this bug. Output below:

Integer test fails!
- previous int: 0 (FFF80000)
- current int: 0 (FFF80000)

The hex pattern seems to be the same as the corruption which doubles seem to get.

Keith

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 |
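For readers following the hex dumps in this thread, here is a small Python sketch (not part of the original reports) that decodes the quoted 64-bit patterns as IEEE 754 doubles. It also illustrates one plausible reading of the intchange.c output `0 (FFF80000)`: since that program declared its variables as double, `%d` and `%X` may simply have picked up the low and high 32-bit halves of the NaN pattern. That reading assumes a 32-bit x86 little-endian stack layout for variadic arguments; passing a double to `%d` is undefined behaviour in C, so treat it as an interpretation, not something the thread confirms. The function names `bits_to_double` and `split_words` are invented for illustration.

```python
import math
import struct
from typing import Tuple


def bits_to_double(hex_bits: str) -> float:
    """Interpret a 16-digit hex string as an IEEE 754 binary64 value."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]


def split_words(hex_bits: str) -> Tuple[int, int]:
    """Split a 64-bit pattern into its (low, high) 32-bit words.

    Assumption: on 32-bit little-endian x86 a double passed to printf
    occupies two stack words, low word first.
    """
    value = int(hex_bits, 16)
    return value & 0xFFFFFFFF, value >> 32


if __name__ == "__main__":
    # Patterns quoted in the bug reports.
    for pattern in ("3FF0000000000000", "4014000000000000",
                    "4062E00000000000", "FFF8000000000000"):
        value = bits_to_double(pattern)
        kind = "NaN" if math.isnan(value) else "finite"
        print(pattern, value, kind)

    # What "%d (%X)" would show if it consumed one NaN double as two ints.
    low, high = split_words("FFF8000000000000")
    print("%d (%X)" % (low, high))  # prints "0 (FFF80000)"
```

Under that assumption the "integer" failure is the same NaN seen in the double test, which fits Henry's remark that intchange.c used double variables and Keith's later finding that real integers were not affected.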
From: SourceForge.net <no...@so...> - 2009-04-15 20:15:18
|
Bugs item #2748015, was opened at 2009-04-09 17:54
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788

>Category: Linux Kernel
Group: v0.8.x (devel)
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Henry N. (henryn)
Summary: PuTTY failing after a while with "Server key not valid"

Initial Comment:
Hi,

a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (like 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is built on the SSH code of PuTTY, so it seems to be a low-level error.

I tested a few setups to isolate the problem:
- coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and a lot of traffic.

The fact that I get the same error with the RC1 and the 0.8.0 development version is not surprising, as they should be based on the same code.

From searching the net I found these possible causes for the problem:
- The cached server key signature is not valid anymore (would prevent me from logging in)
- The network traffic gets mauled between server and client
- The client or server software can be at fault.

My setup specs:
- AMD AthlonXP 3800+, 2 GB RAM
- Windows XP Pro SP3 + all updates to date
- Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet
- Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)
- Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2)
- Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413)

I tested the virtual machine on the Windows host system. There cannot be any signal decay on the wire, because the bridged network shouldn't use it. This, and the fact that I cannot reproduce the problem with version 0.7.3, leads me to believe the newer coLinux versions are corrupting the data somewhere in between.

I would like to know what you make of this problem. Solutions/workarounds and reproducibility confirmations are also welcome.

Thanks,
Keith

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-15 22:15

Message:
This bug comes from a wrongly handled FPU save/restore on the operating system switch. It was easier to see with the test programs in Bug #2756909, and is fixed now by reverting the changes from SVN r1237 (floating point optimizations for the operating system switch). It's committed as SVN revision r1243 (devel) and r1245 (stable). New snapshots are available on http://www.colinux.org/snapshots/

Keith, many thanks for reporting and for the helpful test environments.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-13 16:33

Message:
First scene: I was idle at the prompt with PuTTY on the host (no wire) and at the same time was logged in from another machine with "ssh -o RekeyLimit=1K hn@192.168.2.104", doing some compiling under coLinux. Both ssh sessions were killed at the same time.
PuTTY with "Server's host key did not match the signature supplied", and the ssh with:
"""
RSA_public_decrypt failed: error:0407006A:rsa routines:RSA_padding_check_PKCS1_type_1:block type is not 01
key_verify failed for server_host_key
"""

Second scene: I was logged in with PuTTY (option key regen 1 minute) via *tuntap*. On the first nt-console dblchange.c was running (see bug #2756909), and on the second nt-console (ALT-F2) "openssl genrsa -out /dev/null 4096" was running. At the same moment that dblchange.c detected the error, my PuTTY was terminated with the error message "Server's host key did not match the signature supplied". The difference is that I was not using pcap-bridge or ndis-bridge.

Keith, you are right that both bugs have the same source. But please let's follow bug #2756909. That is better than waiting for PuTTY to terminate.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 00:24

Message:
I cannot get version 0.7.3 connections to crash by increasing the rekey limit. I also tried Cygwin's ssh with the RekeyLimit=1k option and it does not crash. PuTTY is still crashing though. And the sleep command is still dying occasionally, but this bug is being tracked here:
http://sourceforge.net/tracker/?func=detail&aid=2756909&group_id=98788&atid=622063

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-12 19:37

Message:
Hello Keith,

> I would like to ask anyone (Henry ;) ) to try and reproduce the crashing
> by setting your rekey PuTTY option to 1 minute.

Yes. It crashes with this setting after ~15 minutes. I was running "watch cat /proc/colinux/stats" inside PuTTY. I have restarted PuTTY, and it has now been running without this problem for longer than 60 minutes. That is too rare to resolve the problem.
PuTTY 0.60
Debian 4.0, openssl 0.9.8c-4etch5, openssh-server 4.3p2-9etch3
coLinux 0.7.4-rc1 (20090329)
pcap-bridge on Realtek RTL8102E Family PCI-E Fast Ethernet NIC
Hardware Checksum disabled. Only with this option can I connect to ssh from the host, see Bug #2688891.

With the same coLinux connected from native Linux with "ssh -o RekeyLimit=1K user@192.168.2.104", the "watch cat /proc/colinux/stats" has been running without problems for more than 60 minutes now. I can see the key regeneration with tcpdump from the fltk console every ~20 seconds. The differences here are ssh vs. PuTTY, and that this connection goes over the wire.

Can you check this with Cygwin's ssh on the host?

As a workaround you can use tuntap for your PuTTY login.

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-12 16:22

Message:
Bug research update.

I've done the netio performance benchmark twice from windows to colinux and twice from colinux to windows (four runs total). Here are the results:

From windows to colinux #1:
TCP connection established.
Packet size 1k bytes: 11246 KByte/s Tx, 17599 KByte/s Rx.
Packet size 2k bytes: 11296 KByte/s Tx, 18233 KByte/s Rx.
Packet size 4k bytes: 11561 KByte/s Tx, 18841 KByte/s Rx.
Packet size 8k bytes: 11566 KByte/s Tx, 19752 KByte/s Rx.
Packet size 16k bytes: 11564 KByte/s Tx, 20031 KByte/s Rx.
Packet size 32k bytes: 11586 KByte/s Tx, 18700 KByte/s Rx.
Done.

From windows to colinux #2:
TCP connection established.
Packet size 1k bytes: 11331 KByte/s Tx, 17431 KByte/s Rx.
Packet size 2k bytes: 11228 KByte/s Tx, 17502 KByte/s Rx.
Packet size 4k bytes: 11564 KByte/s Tx, 17893 KByte/s Rx.
Packet size 8k bytes: 11542 KByte/s Tx, 18682 KByte/s Rx.
Packet size 16k bytes: 11500 KByte/s Tx, 16087 KByte/s Rx.
Packet size 32k bytes: 10794 KByte/s Tx, 20235 KByte/s Rx.
Done.

From colinux to windows #1:
TCP connection established.
Packet size 1k bytes: 17529 KByte/s Tx, 11034 KByte/s Rx.
Packet size 2k bytes: 17824 KByte/s Tx, 11249 KByte/s Rx.
Packet size 4k bytes: 17897 KByte/s Tx, 10706 KByte/s Rx.
Packet size 8k bytes: 19426 KByte/s Tx, 10716 KByte/s Rx.
Packet size 16k bytes: 19306 KByte/s Tx, 11464 KByte/s Rx.
Packet size 32k bytes: 19284 KByte/s Tx, 11487 KByte/s Rx.
Done.

From colinux to windows #2:
TCP connection established.
Packet size 1k bytes: 17279 KByte/s Tx, 11073 KByte/s Rx.
Packet size 2k bytes: 17536 KByte/s Tx, 11239 KByte/s Rx.
Packet size 4k bytes: 17704 KByte/s Tx, 11451 KByte/s Rx.
Packet size 8k bytes: 17088 KByte/s Tx, 11338 KByte/s Rx.
Packet size 16k bytes: 19456 KByte/s Tx, 11458 KByte/s Rx.
Packet size 32k bytes: 19034 KByte/s Tx, 11500 KByte/s Rx.
Done.

Nothing interesting here. I also looked at the netio description and source; to me it doesn't seem to do integrity checks on the data being received. The way I understand the error is that the key or packets get corrupted along the way. So it would be nice to know whether what was sent out also gets received unharmed at the other end.

After looking over my last log I found that the PuTTY sessions which crashed were exactly x hours old, where x is mostly 1, 2 or 3 hours. See below:

xxx pts/0 x.x.x.x Fri Apr 10 16:56 - 17:56 (01:00)
xxx pts/0 x.x.x.x Fri Apr 10 20:13 - 21:13 (01:00)
xxx pts/0 x.x.x.x Fri Apr 10 21:14 - 04:14 (07:00)
xxx pts/2 x.x.x.x Fri Apr 10 20:14 - 03:14 (07:00)

(These results are older and I'm not 100% sure these sessions have crashed, but their entries are suspicious.)

xxx pts/1 x.x.x.x Tue Apr 7 13:38 - 18:38 (05:00)
xxx pts/0 x.x.x.x Tue Apr 7 19:57 - 20:57 (01:00)
xxx pts/1 x.x.x.x Tue Apr 7 22:59 - 01:59 (03:00)
xxx pts/0 x.x.x.x Tue Apr 7 22:59 - 23:59 (01:00)
xxx pts/1 x.x.x.x Thu Apr 9 11:35 - 12:35 (01:00)
xxx pts/0 x.x.x.x Thu Apr 9 11:34 - 15:34 (04:00)

Then I remembered there is a 3600 value configuration option in sshd_config, so I changed that to a more frequent value (10 seconds).
The configuration option is the SSH-1 key regeneration interval (KeyRegenerationInterval). But I use SSH protocol v2, so this configuration parameter did not affect my PuTTY connection crashes. PuTTY also has such an option, located in the session editor screen under Connection -> SSH -> Kex, option "Max minutes before rekey (0 for no limit)". When I changed this to 1 minute the session crashes became much more frequent. See below:

xxx pts/1 x.x.x.x Sat Apr 11 19:42 - 19:44 (00:02)
xxx pts/0 x.x.x.x Sat Apr 11 19:27 - 23:58 (04:30)
xxx pts/1 x.x.x.x Sat Apr 11 16:25 - 16:27 (00:02)
xxx pts/0 x.x.x.x Sat Apr 11 16:25 - 16:31 (00:06)
xxx pts/0 x.x.x.x Sat Apr 11 13:46 - 15:46 (02:00)
xxx pts/1 x.x.x.x Sat Apr 11 13:45 - 13:47 (00:02)
xxx pts/1 x.x.x.x Sat Apr 11 13:43 - 13:44 (00:01)
xxx pts/1 x.x.x.x Sat Apr 11 13:13 - 13:14 (00:01)
xxx pts/0 x.x.x.x Sat Apr 11 13:12 - 13:45 (00:33)

Now there are some sessions which lasted a lot longer than the others. This is due to me having two different sessions open most of the time, one for running the 'echo $RANDOM' script and one for calling 'last | head' when the other session had crashed. The 'last | head' session does not generate much output, and this seems to affect the probability of the connection crashing. I am not sure yet how exactly the output and key exchange are related.

I am currently trying to reproduce the connection crashing with the increased number of key exchanges with version 0.7.3, but I don't think it will crash.

I would like to ask anyone (Henry ;) ) to try and reproduce the crashing by setting your rekey PuTTY option to 1 minute.

Thanks.
Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-10 19:36

Message:
Hello Keith,

please open a separate Tracker for the sleep bug. It has nothing to do with PuTTY. I can confirm it under Debian 4.0 with coLinux 0.7.4-rc1 on the FLTK console; the message is

sleep: xnanosleep.c:58: xnanosleep: Assertion `0 <= seconds' failed.

For the network, it is normal that all packets from the Windows host to the coLinux guest also go out over the wire. Windows does not know that we are in capture mode on this network interface. My question was rather whether the ssh connection goes over the wire. You said no. Then please let's test with netio from Windows to coLinux.

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-10 18:20

Message:
After further investigating the problem I found this thread on the net:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html
It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL.

My little bash script became very unreliable when starting an openssl key generation process in another PuTTY session. Normally I get a never-ending string of random decimal numbers, but now the sleep program starts aborting with this message:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

Can someone please try to reproduce the error?

Start this bash script in one PuTTY session:
while true; do echo -n "$RANDOM"; sleep 0.1; done

Start the following command in another one:
openssl genrsa -out /dev/null 4096

It should be obvious that the sleep command is failing, because there are a lot of error messages. I used coLinux 0.8.0 (20090329) this time with the pcap bridge for networking.

Also, henryn, I monitored my network traffic, and the network traffic does reach the wire. My network is very reliable, but of course we can't rule out that the packet corruption is coming from interference on the wire. But this does not explain sleep aborting when doing something with OpenSSL.
This seems like a whole other bug, but I think the networking thing and the corruption of memory are related. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-09 23:12 Message: Thanks for the quick response. Above I wrote: I tested the virtual machine on the windows host system. This means I used the same desktop/computer system to run the virtual machine and to run the client software (PuTTY, WinSCP). Therefore I concluded the traffic does not go over the wire. As for the hours between my bug report and this comment the same virtual machine running under coLinux 0.7.3 is still doing great. I did not experience any corrupted downloads inside the vm, but I will test this later on. Also I noticed my sleep command aborting sometimes while running this bash script (to generate traffic) while true; do echo -n "$RANDOM"; sleep 0.5; done It blurts somekind of assertion not being valid or something in xnanosleep.c on some line, will report the actual error message later on. Maybe this is related, cause I don't get such aborts with the 0.7.3 version. I will investigate the problem further tomorrow or the day after that. Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-09 19:18 Message: Hello Keith, do you use PuTTY on same desktop where coLinux is running? Or goes it over the wire? I can not assume, that this is a problem on the network bridge between coLinux and Host. I use PuTTY every day for many hours and never have corrupted data, or such errors. I'm auto logged in with public + private key pairs. PuTTY and coLinux runs on the same desktop. For me, I use PuTTY as connection between Host (Windows) and Guest (coLInux). I also have no errors from network mounts and getting downloads from internet via this bridging interface. coLinux ndis-bridge have heavy tested with "netio" and there are no errors. 
Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). Test first from the location where you detected the errors. Then test the connection between your Host (Windows) and Guest (coLinux). Changes to the pcap-bridge between 0.7.3 and 0.7.4 are very rare. Here is the list of changes: http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log Revision 1057 was the release build for 0.7.3, so there are only 4 changes up to version 0.7.4. The biggest change was revision 1222, which was first included in build 20090227. To find the regression, you can use snapshots from http://www.henrynestler.com/colinux/testing/devel-0.8.0/ Henry ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 |
From: SourceForge.net <no...@so...> - 2009-04-15 20:10:49
|
Bugs item #2756909, was opened at 2009-04-13 00:23 Message generated for change (Comment added) made by henryn You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: Linux Kernel Group: v0.8.x (devel) >Status: Closed >Resolution: Fixed >Priority: 8 Private: No Submitted By: Nobody/Anonymous (nobody) >Assigned to: Henry N. (henryn) Summary: sleep crashing with "Assertion `0 <= seconds' failed." Initial Comment: While researching bug 2748015 (http://sourceforge.net/tracker/?func=detail&aid=2748015&group_id=98788&atid=622063), we came along another problem with the 0.74-rc1 and 0.8.0 code base. When starting command "while true; do sleep 0.1; done" and starting command "openssl genrsa -out /dev/null 4096" in another session the sleep command in the first session aborts occasionally with error: sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed. Aborted This problem can even be reproduced when you start the different commands as different unprivileged users. This seems like the kernel is changing the memory of random processes. Tested versions: colinux 0.8.0: suffers this bug colinux 0.7.4-rc1: suffers this bug colinux 0.7.3: does not suffer this bug Test system: AMD AthlonXP 3800+ Windows XP SP3 + all updates to date Guest OS is ArchLinux (ver 2009.02) (using only prebuild packages from the ArchLinux repositories) While researching this further, I discovered this thread which describes a bug in the User Mode Linux kernel almost a year ago. http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html I have not been able to link this to a bug on the UML Sourceforge.net development page. Keith ---------------------------------------------------------------------- >Comment By: Henry N. 
(henryn) Date: 2009-04-15 22:10 Message: This bug is fixed now by reverting the changes from SVN r1237 (Floating point optimizations for operating switch). It's committed as SVN revision r1243 (devel) and r1245 (stable). New snapshots are available on http://www.colinux.org/snapshots/ Keith, many thanks for reporting and for the helpful test environments. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-14 08:54 Message: Hi, I have read the email carefully. I have tried your code, both the dbl and int versions. I cannot see the problem. It would be very interesting if you (nobody :=) ) could test my colinux version. If you can, please contact me at paolo DOT minazzi AT gmail DOT com. I will send you a link so you can try a slightly different version. It would help us understand the problem better. On my hardware (3 PCs) I cannot see this problem. Thanks, Paolo ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-13 23:40 Message: I called Stano, who reported the same problem on UML. He said that this bug is not solved in UML, and he has this workaround: "The only thing that helps in my case is running the guest with mode=skas0. This eliminated the problem completely and the guest is running for months without any problem." Currently I cannot find out what skas0 does; I have not found the mode changer. I have modified the test further, so I can say it is not stack clobbering: #include <stdio.h> volatile double theDouble; int main(int argc, char* argv[]){ theDouble = 1; while(1){ usleep(0); if(theDouble != 1){ printf("Double test fails!\n"); printf("- current Double: %f (%llX)\n", theDouble, theDouble); printf("- current Double: %f (%llX)\n", theDouble, theDouble); printf("- current Double: %f (%llX)\n", theDouble, theDouble); break; } } return 0; } Compiled with "gcc -ggdb -o dblchange dblchange.c" on Debian 4.0 (gcc 4.1.2). 
Here are some of the errors: Double test fails! - current Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) Double test fails! - current Double: 1.000000 (3FF0000000000000) - current Double: 1.000000 (3FF0000000000000) - current Double: 1.000000 (3FF0000000000000) Double test fails! - current Double: 1.000000 (3FF0000000000000) - current Double: nan (FFF8000000000000) - current Double: 1.000000 (3FF0000000000000) ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-13 22:58 Message: Aah, yeah, the double instead of integer thing wasn't that clever :) I changed the code to really use integers this time and it doesn't crash anymore. It seems only the double operations/registers get hosed. Also, if the int registers were unreliable I would expect a lot more errors popping up when running the guest system. As for reverting, I think that's the best solution for now. But I did notice the speed improvements in the 0.8.0 code base as opposed to the 0.7.3 one and they were quite nice, so I would love to see a working version of the speed enhancements. Henry, thanks for helping to hunt this bug down and good luck with the colinux development. It's been real. Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-13 17:32 Message: This comes from changes in SVN revision r1237. 2009-03-21T23:56:07 henryn r1237 * Remove co_switch_wrapper_protected and all workaround for SSE/MMX on raid modules. Reverts the workaround from SVN r1212, related Bugs #2524658, #2551241. r1236 20090319-Snapshot runs r1237 20090321-Snapshot fails Tested dblchange-nosleep.c and "openssl genrsa -out /dev/null 4096" in fltk console. Snapshots are from http://www.henrynestler.com/colinux/testing/devel-0.8.0/ I feel we should revert this change in this release to the slower, but safer code. 
Henry ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-13 17:11 Message: Hello Keith, in intchange.c: > double theDouble; > double theLastDouble; there you used also double, not integer. But very nice to see, that your dblchange.c fails very shortly after starting the "openssl genrsa -out /dev/null 4096" in second console. I have little tuned the fail by removing some calculations and force a task switch before the compair: dblchange-nosleep.c: #include <stdio.h> int main(int argc, char* argv[]){ double theDouble, theLastDouble; theDouble = 1; while(1){ theDouble += 1; theLastDouble = theDouble; sleep(0); /* force task switch here */ if(theLastDouble != theDouble){ printf("Double test fails!\n"); printf("- previous Double: %f (%LX)\n", theLastDouble, theLastDouble); printf("- current Double: %f (%LX)\n", theDouble, theDouble); break; } } return 0; } Some example failures: Double test fails! - previous Double: 151.000000 (4062E00000000000) - current Double: 151.000000 (4062E00000000000) Double test fails! - previous Double: 3.000000 (4008000000000000) - current Double: 3.000000 (4008000000000000) Double test fails! - previous Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) Double test fails! - previous Double: 2.000000 (4000000000000000) - current Double: 2.000000 (4000000000000000) Double test fails! - previous Double: 6.000000 (4018000000000000) - current Double: 6.000000 (4018000000000000) Double test fails! - previous Double: 3.000000 (4008000000000000) - current Double: 3.000000 (4008000000000000) This fails also, if I remove the "sleep" completely: - previous Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) Double test fails! - previous Double: nan (FFF8000000000000) - current Double: 27945100.000000 (417AA688C0000000) Double test fails! 
- previous Double: 56525465.000000 (418AF414C8000000) - current Double: 56525465.000000 (418AF414C8000000) So, it is not the sleep self. It is the task switch, and/or something stupid in the keygen. I will check this some revisions before we changed the FPU save/restore (20090321, SVN r1237). Henry ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-13 11:52 Message: To test this problem I simplified the program listed in the fixunix thread above. dblchange.c: #include <stdio.h> #define true 1 #define false 0 int main(int argc, char* argv[]){ double theDouble; double theLastDouble; theDouble = 1; while(true){ theLastDouble = theDouble; theDouble += 1; if(theLastDouble + 1 != theDouble){ printf("Double test fails!\n"); printf("- previous double: %f (%LX)\n", theLastDouble, theLastDouble); printf("- current double: %f (%LX)\n", theDouble, theDouble); break; } sleep(1); } return 0; } intchange.c: #include <stdio.h> #define true 1 #define false 0 int main(int argc, char* argv[]){ double theInteger; double theLastInteger; theInteger = 1; while(true){ theLastInteger = theInteger; theInteger += 1; if(theLastInteger + 1 != theInteger){ printf("Integer test fails!\n"); printf("- previous int: %d (%X)\n", theLastInteger, theLastInteger); printf("- current int: %d (%X)\n", theInteger, theInteger); break; } sleep(1); } return 0; } By analyzing the error thrown by sleep it seems the double value which specifies how long to sleep gets changed outside of the program's control. First I adapted the fixunix program to test for doubles being changed. It runs smoothly until I start the openssl key generation operation. Then it errors after several seconds: Double test fails! - previous double: nan (FFF8000000000000) - current double: nan (FFF8000000000000) By injecting some other printfs I've seen that in the fatal iteration the second read of the previous double goes wrong. 
But this doesn't matter that much because it gets overwritten by the current double variable. After that both variables are good again, but when increasing the current variable the outcome becomes the NAN value. Output below: +++ - previous double: 4.000000 (FFF8000000000000) - current double: 5.000000 (4014000000000000) theLastDouble = theDouble; - previous double: 5.000000 (4014000000000000) - current double: 5.000000 (4014000000000000) theDouble += 1; - previous double: 5.000000 (4014000000000000) - current double: nan (FFF8000000000000) Double test fails! - previous double: 5.000000 (4014000000000000) - current double: nan (FFF8000000000000) Note: at the "Double test fails!" piece the previous double does not have a NAN value. This only occurs when I add printfs so I blame this on the printfs doing stuff in between which changes the data flow. After a little further investigation it shows the previous double gets corrupted because in the final iteration the second read of any double gets turned into the NAN value. This means the current value wil be read as NAN and then copied to the previous value. After the double catastrophy I was curious if integers would also be affected so I wrote intchange.c. This showed that even integers are affected by this bug. Output below: Integer test fails! - previous int: 0 (FFF80000) - current int: 0 (FFF80000) The hex pattern seems to be the same as the corruption which doubles seem to get. Keith ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 |
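Note for anyone re-running the checks from this thread: the test programs above print a double through %LX / %llX, which is undefined behaviour in C, and intchange.c declares its "integers" as double. A minimal corrected sketch, assuming a POSIX guest, might look like the following. The helper names double_bits, count_double_mismatches and count_int_mismatches are hypothetical and not taken from the thread, and sched_yield() is only a stand-in for the sleep(0) used above to invite a task switch. Whether this variant would have triggered the r1237 regression is untested.

```c
#include <assert.h>
#include <inttypes.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a double.
 * memcpy avoids the undefined behaviour of printing a double via %llX. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Increment a double `rounds` times, yielding the CPU between the write
 * and the read-back. Returns how often the value read back differed
 * from the value written. */
static unsigned long count_double_mismatches(unsigned long rounds)
{
    volatile double current = 1.0;
    unsigned long mismatches = 0;
    unsigned long i;

    for (i = 0; i < rounds; i++) {
        double expected = current + 1.0;
        double seen;

        current = expected;
        sched_yield();              /* invite a task switch here */
        seen = current;

        if (double_bits(seen) != double_bits(expected)) {
            mismatches++;
            printf("Double test fails!\n");
            printf("- expected: %f (%016" PRIX64 ")\n",
                   expected, double_bits(expected));
            printf("- seen:     %f (%016" PRIX64 ")\n",
                   seen, double_bits(seen));
        }
    }
    return mismatches;
}

/* Same check with real integers (unsigned, so wrap-around is defined). */
static unsigned long count_int_mismatches(unsigned long rounds)
{
    volatile unsigned int current = 1;
    unsigned long mismatches = 0;
    unsigned long i;

    for (i = 0; i < rounds; i++) {
        unsigned int expected = current + 1u;
        unsigned int seen;

        current = expected;
        sched_yield();
        seen = current;

        if (seen != expected) {
            mismatches++;
            printf("Integer test fails!\n");
            printf("- expected: %u (%X)\n", expected, expected);
            printf("- seen:     %u (%X)\n", seen, seen);
        }
    }
    return mismatches;
}
```

Calling count_double_mismatches() with a large round count from main() while "openssl genrsa -out /dev/null 4096" runs in a second session mirrors the reproduction described above. On a healthy system both counters should stay at 0.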
From: SourceForge.net <no...@so...> - 2009-04-14 17:57:26
|
Bugs item #2760666, was opened at 2009-04-14 04:50 Message generated for change (Comment added) made by henryn You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2760666&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Crash / BSOD Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: VirtualBOX causes BSOD Initial Comment: Running a virtual machine under Parallels or VMware doesn't result in a crash. Trying to boot Windows XP in VirtualBox while coLinux is running causes a BSOD on the host system (Windows Vista). It's strange: why does VirtualBox BSOD the system whereas Parallels/VMware can virtualise fine alongside coLinux? Reproduction: Install VirtualBox; install coLinux V8; run coLinux; create a new Windows VM in VirtualBox; attempt to boot the new VM into Windows setup; BSOD. ---------------------------------------------------------------------- >Comment By: Henry N. (henryn) Date: 2009-04-14 19:57 Message: It would be nice to get an analysis of the minidump from that BSOD, so we can see what caused this crash. Please read "Howto use Windows crash dump" in your colinux file "debugging.txt" or here from source: http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/doc/debugging Henry ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2760666&group_id=98788 |
From: SourceForge.net <no...@so...> - 2009-04-14 06:54:01
|
Bugs item #2756909, was opened at 2009-04-12 22:23 Message generated for change (Comment added) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: v0.8.x (devel) Status: Open Resolution: None Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: sleep crashing with "Assertion `0 <= seconds' failed." Initial Comment: While researching bug 2748015 (http://sourceforge.net/tracker/?func=detail&aid=2748015&group_id=98788&atid=622063), we came along another problem with the 0.74-rc1 and 0.8.0 code base. When starting command "while true; do sleep 0.1; done" and starting command "openssl genrsa -out /dev/null 4096" in another session the sleep command in the first session aborts occasionally with error: sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed. Aborted This problem can even be reproduced when you start the different commands as different unprivileged users. This seems like the kernel is changing the memory of random processes. Tested versions: colinux 0.8.0: suffers this bug colinux 0.7.4-rc1: suffers this bug colinux 0.7.3: does not suffer this bug Test system: AMD AthlonXP 3800+ Windows XP SP3 + all updates to date Guest OS is ArchLinux (ver 2009.02) (using only prebuild packages from the ArchLinux repositories) While researching this further, I discovered this thread which describes a bug in the User Mode Linux kernel almost a year ago. http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html I have not been able to link this to a bug on the UML Sourceforge.net development page. 
Keith ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-14 06:54 Message: Hi, I have read carefully the email. I have tried your code, both dbl and int version. I cannot see the problem. It would be very interesting if you (nobody :=) )could test my colinux version. If you can, please concact me at paolo DOT minazzi AT gmail DOT com I send you a link, so you can try a lettle different version. It help us to understand better the problem. On my hardware (3 PCs) I cannot see this problem. Thanks, Paolo ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-13 21:40 Message: I called Stano, who has reported the same problem on UML. He sayed, that this bug is not solved in UML, and he has this workaround: "The only thing that helps in my case is running the guest with mode=skas0. This eliminated the problem completely and the guest is running for months without any problem." Currently I can not find what skas0 does, I not found the mode changer. The test have more modified, so I can say, it is not a stack clobbering: #include <stdio.h> volatile double theDouble; int main(int argc, char* argv[]){ theDouble = 1; while(1){ usleep(0); if(theDouble != 1){ printf("Double test fails!\n"); printf("- current Double: %f (%llX)\n", theDouble, theDouble); printf("- current Double: %f (%llX)\n", theDouble, theDouble); printf("- current Double: %f (%llX)\n", theDouble, theDouble); break; } } return 0; } Compiled with "gcc -ggdb -o dblchange dblchange.c" on Debian 4.0 (gcc 4.1.2). Here some of the errors: Double test fails! - current Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) Double test fails! - current Double: 1.000000 (3FF0000000000000) - current Double: 1.000000 (3FF0000000000000) - current Double: 1.000000 (3FF0000000000000) Double test fails! 
- current Double: 1.000000 (3FF0000000000000) - current Double: nan (FFF8000000000000) - current Double: 1.000000 (3FF0000000000000) ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-13 20:58 Message: Aah, yeah the double instead of integer thing wasn't that clever :) I changed the code to really use integers this time and it doesn't crash anymore. Seems only the double operations/registers get hosed. Also when the int registers would be unreliable I would expect alot more errors popping up when running the guest system. As for reverting, I think that's the best solution for now. But I did notice the speed improvements in the 0.8.0 code base as opossed to the 0.7.3 one and it was quite nice, so I would love to see a working version of the speed enhancements. Henry, thanks for helping hunting this bug down and good luck with the colinux development. It's been real. Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-13 15:32 Message: The comes from changes in SVN revision r1237. 2009-03-21T23:56:07 henryn r1237 * Remove co_switch_wrapper_protected and all workaround for SSE/MMX on raid modules. Reverts the workaround from SVN r1212, related Bugs #2524658, #2551241. r1236 20090319-Snapshot runs r1237 20090321-Snapshot fails Tested dblchange-nosleep.c and "openssl genrsa -out /dev/null 4096" in fltk console. Snapshorts are from http://www.henrynestler.com/colinux/testing/devel-0.8.0/ I feel, we should revert this change in this release to the slower, but saver code. Henry ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-13 15:11 Message: Hello Keith, in intchange.c: > double theDouble; > double theLastDouble; there you used also double, not integer. 
But very nice to see, that your dblchange.c fails very shortly after starting the "openssl genrsa -out /dev/null 4096" in second console. I have little tuned the fail by removing some calculations and force a task switch before the compair: dblchange-nosleep.c: #include <stdio.h> int main(int argc, char* argv[]){ double theDouble, theLastDouble; theDouble = 1; while(1){ theDouble += 1; theLastDouble = theDouble; sleep(0); /* force task switch here */ if(theLastDouble != theDouble){ printf("Double test fails!\n"); printf("- previous Double: %f (%LX)\n", theLastDouble, theLastDouble); printf("- current Double: %f (%LX)\n", theDouble, theDouble); break; } } return 0; } Some example failures: Double test fails! - previous Double: 151.000000 (4062E00000000000) - current Double: 151.000000 (4062E00000000000) Double test fails! - previous Double: 3.000000 (4008000000000000) - current Double: 3.000000 (4008000000000000) Double test fails! - previous Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) Double test fails! - previous Double: 2.000000 (4000000000000000) - current Double: 2.000000 (4000000000000000) Double test fails! - previous Double: 6.000000 (4018000000000000) - current Double: 6.000000 (4018000000000000) Double test fails! - previous Double: 3.000000 (4008000000000000) - current Double: 3.000000 (4008000000000000) This fails also, if I remove the "sleep" completely: - previous Double: nan (FFF8000000000000) - current Double: nan (FFF8000000000000) Double test fails! - previous Double: nan (FFF8000000000000) - current Double: 27945100.000000 (417AA688C0000000) Double test fails! - previous Double: 56525465.000000 (418AF414C8000000) - current Double: 56525465.000000 (418AF414C8000000) So, it is not the sleep self. It is the task switch, and/or something stupid in the keygen. I will check this some revisions before we changed the FPU save/restore (20090321, SVN r1237). 
Henry ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-13 09:52 Message: To test this problem I simplified the program listed in the fixunix thread above. dblchange.c: #include <stdio.h> #define true 1 #define false 0 int main(int argc, char* argv[]){ double theDouble; double theLastDouble; theDouble = 1; while(true){ theLastDouble = theDouble; theDouble += 1; if(theLastDouble + 1 != theDouble){ printf("Double test fails!\n"); printf("- previous double: %f (%LX)\n", theLastDouble, theLastDouble); printf("- current double: %f (%LX)\n", theDouble, theDouble); break; } sleep(1); } return 0; } intchange.c: #include <stdio.h> #define true 1 #define false 0 int main(int argc, char* argv[]){ double theInteger; double theLastInteger; theInteger = 1; while(true){ theLastInteger = theInteger; theInteger += 1; if(theLastInteger + 1 != theInteger){ printf("Integer test fails!\n"); printf("- previous int: %d (%X)\n", theLastInteger, theLastInteger); printf("- current int: %d (%X)\n", theInteger, theInteger); break; } sleep(1); } return 0; } By analyzing the error thrown by sleep it seems the double value which specifies how long to sleep gets changed outside of the program's control. First I adapted the fixunix program to test for doubles being changed. It runs smoothly until I start the openssl key generation operation. Then it errors after several seconds: Double test fails! - previous double: nan (FFF8000000000000) - current double: nan (FFF8000000000000) By injecting some other printfs I've seen that in the fatal iteration the second read of the previous double goes wrong. But this doesn't matter that much because it gets overwritten by the current double variable. After that both variables are good again, but when increasing the current variable the outcome becomes the NAN value. 
Output below: +++ - previous double: 4.000000 (FFF8000000000000) - current double: 5.000000 (4014000000000000) theLastDouble = theDouble; - previous double: 5.000000 (4014000000000000) - current double: 5.000000 (4014000000000000) theDouble += 1; - previous double: 5.000000 (4014000000000000) - current double: nan (FFF8000000000000) Double test fails! - previous double: 5.000000 (4014000000000000) - current double: nan (FFF8000000000000) Note: at the "Double test fails!" piece the previous double does not have a NAN value. This only occurs when I add printfs so I blame this on the printfs doing stuff in between which changes the data flow. After a little further investigation it shows the previous double gets corrupted because in the final iteration the second read of any double gets turned into the NAN value. This means the current value wil be read as NAN and then copied to the previous value. After the double catastrophy I was curious if integers would also be affected so I wrote intchange.c. This showed that even integers are affected by this bug. Output below: Integer test fails! - previous int: 0 (FFF80000) - current int: 0 (FFF80000) The hex pattern seems to be the same as the corruption which doubles seem to get. Keith ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 |
From: SourceForge.net <no...@so...> - 2009-04-14 02:50:22
|
Bugs item #2760666, was opened at 2009-04-14 02:50 Message generated for change (Tracker Item Submitted) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2760666&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Crash / BSOD Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: VirtualBOX causes BSOD Initial Comment: Running a virtual machine under Parallels or VMware doesn't result in a crash. Trying to boot Windows XP in VirtualBox while coLinux is running causes a BSOD on the host system (Windows Vista). It's strange: why does VirtualBox BSOD the system whereas Parallels/VMware can virtualise fine alongside coLinux? Reproduction: Install VirtualBox; install coLinux V8; run coLinux; create a new Windows VM in VirtualBox; attempt to boot the new VM into Windows setup; BSOD. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2760666&group_id=98788 |
From: SourceForge.net <no...@so...> - 2009-04-13 21:40:59
|
Bugs item #2756909, was opened at 2009-04-13 00:23
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: v0.8.x (devel)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: sleep crashing with "Assertion `0 <= seconds' failed."

Initial Comment:
While researching bug 2748015 (http://sourceforge.net/tracker/?func=detail&aid=2748015&group_id=98788&atid=622063), we came across another problem with the 0.7.4-rc1 and 0.8.0 code base.

When starting command "while true; do sleep 0.1; done" and starting command "openssl genrsa -out /dev/null 4096" in another session, the sleep command in the first session aborts occasionally with error:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

This problem can even be reproduced when you start the different commands as different unprivileged users. This seems like the kernel is changing the memory of random processes.

Tested versions:
colinux 0.8.0: suffers this bug
colinux 0.7.4-rc1: suffers this bug
colinux 0.7.3: does not suffer this bug

Test system:
AMD AthlonXP 3800+
Windows XP SP3 + all updates to date
Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)

While researching this further, I discovered this thread, which describes a bug in the User Mode Linux kernel from almost a year ago.
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html

I have not been able to link this to a bug on the UML Sourceforge.net development page.

Keith

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-13 23:40

Message:
I called Stano, who has reported the same problem on UML. He said that this bug is not solved in UML, and he has this workaround:

"The only thing that helps in my case is running the guest with mode=skas0. This eliminated the problem completely and the guest is running for months without any problem."

Currently I cannot find what skas0 does; I have not found where the mode is changed.

I have modified the test further, so I can say it is not stack clobbering:

    #include <stdio.h>
    #include <unistd.h>   /* usleep() */

    volatile double theDouble;

    int main(int argc, char* argv[]){
        theDouble = 1;
        while(1){
            usleep(0);
            if(theDouble != 1){
                printf("Double test fails!\n");
                printf("- current Double: %f (%llX)\n", theDouble, theDouble);
                printf("- current Double: %f (%llX)\n", theDouble, theDouble);
                printf("- current Double: %f (%llX)\n", theDouble, theDouble);
                break;
            }
        }
        return 0;
    }

Compiled with "gcc -ggdb -o dblchange dblchange.c" on Debian 4.0 (gcc 4.1.2).

Here are some of the errors:

Double test fails!
- current Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)

Double test fails!
- current Double: 1.000000 (3FF0000000000000)
- current Double: 1.000000 (3FF0000000000000)
- current Double: 1.000000 (3FF0000000000000)

Double test fails!
- current Double: 1.000000 (3FF0000000000000)
- current Double: nan (FFF8000000000000)
- current Double: 1.000000 (3FF0000000000000)

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 22:58

Message:
Aah, yeah the double instead of integer thing wasn't that clever :)
I changed the code to really use integers this time and it doesn't crash anymore. Seems only the double operations/registers get hosed. Also, if the int registers were unreliable, I would expect a lot more errors popping up when running the guest system.

As for reverting, I think that's the best solution for now. But I did notice the speed improvements in the 0.8.0 code base as opposed to the 0.7.3 one and it was quite nice, so I would love to see a working version of the speed enhancements.

Henry, thanks for helping hunt this bug down and good luck with the colinux development. It's been real.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-13 17:32

Message:
This comes from changes in SVN revision r1237.

2009-03-21T23:56:07 henryn r1237
* Remove co_switch_wrapper_protected and all workaround for SSE/MMX on raid modules. Reverts the workaround from SVN r1212, related Bugs #2524658, #2551241.

r1236 20090319-Snapshot runs
r1237 20090321-Snapshot fails

Tested dblchange-nosleep.c and "openssl genrsa -out /dev/null 4096" in fltk console.

Snapshots are from http://www.henrynestler.com/colinux/testing/devel-0.8.0/

I feel we should revert this change in this release to the slower but safer code.

Henry

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-13 17:11

Message:
Hello Keith,

in intchange.c:
> double theDouble;
> double theLastDouble;
there you also used double, not integer.

But it is very nice to see that your dblchange.c fails very shortly after starting the "openssl genrsa -out /dev/null 4096" in a second console.

I have tuned the failure a little by removing some calculations and forcing a task switch before the compare:

dblchange-nosleep.c:

    #include <stdio.h>
    #include <unistd.h>   /* sleep() */

    int main(int argc, char* argv[]){
        double theDouble, theLastDouble;
        theDouble = 1;
        while(1){
            theDouble += 1;
            theLastDouble = theDouble;
            sleep(0); /* force task switch here */
            if(theLastDouble != theDouble){
                printf("Double test fails!\n");
                printf("- previous Double: %f (%LX)\n", theLastDouble, theLastDouble);
                printf("- current Double: %f (%LX)\n", theDouble, theDouble);
                break;
            }
        }
        return 0;
    }

Some example failures:

Double test fails!
- previous Double: 151.000000 (4062E00000000000)
- current Double: 151.000000 (4062E00000000000)

Double test fails!
- previous Double: 3.000000 (4008000000000000)
- current Double: 3.000000 (4008000000000000)

Double test fails!
- previous Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)

Double test fails!
- previous Double: 2.000000 (4000000000000000)
- current Double: 2.000000 (4000000000000000)

Double test fails!
- previous Double: 6.000000 (4018000000000000)
- current Double: 6.000000 (4018000000000000)

Double test fails!
- previous Double: 3.000000 (4008000000000000)
- current Double: 3.000000 (4008000000000000)

This also fails if I remove the "sleep" completely:

- previous Double: nan (FFF8000000000000)
- current Double: nan (FFF8000000000000)

Double test fails!
- previous Double: nan (FFF8000000000000)
- current Double: 27945100.000000 (417AA688C0000000)

Double test fails!
- previous Double: 56525465.000000 (418AF414C8000000)
- current Double: 56525465.000000 (418AF414C8000000)

So it is not the sleep itself. It is the task switch, and/or something stupid in the keygen.

I will check this with some revisions from before we changed the FPU save/restore (20090321, SVN r1237).

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 11:52

Message:
To test this problem I simplified the program listed in the fixunix thread above.

dblchange.c:

    #include <stdio.h>
    #include <unistd.h>   /* sleep() */
    #define true 1
    #define false 0

    int main(int argc, char* argv[]){
        double theDouble;
        double theLastDouble;
        theDouble = 1;
        while(true){
            theLastDouble = theDouble;
            theDouble += 1;
            if(theLastDouble + 1 != theDouble){
                printf("Double test fails!\n");
                printf("- previous double: %f (%LX)\n", theLastDouble, theLastDouble);
                printf("- current double: %f (%LX)\n", theDouble, theDouble);
                break;
            }
            sleep(1);
        }
        return 0;
    }

intchange.c:

    #include <stdio.h>
    #include <unistd.h>   /* sleep() */
    #define true 1
    #define false 0

    int main(int argc, char* argv[]){
        double theInteger;
        double theLastInteger;
        theInteger = 1;
        while(true){
            theLastInteger = theInteger;
            theInteger += 1;
            if(theLastInteger + 1 != theInteger){
                printf("Integer test fails!\n");
                printf("- previous int: %d (%X)\n", theLastInteger, theLastInteger);
                printf("- current int: %d (%X)\n", theInteger, theInteger);
                break;
            }
            sleep(1);
        }
        return 0;
    }

By analyzing the error thrown by sleep, it seems the double value which specifies how long to sleep gets changed outside of the program's control.

First I adapted the fixunix program to test for doubles being changed. It runs smoothly until I start the openssl key generation operation. Then it errors after several seconds:

Double test fails!
- previous double: nan (FFF8000000000000)
- current double: nan (FFF8000000000000)

By injecting some other printfs I've seen that in the fatal iteration the second read of the previous double goes wrong. But this doesn't matter that much because it gets overwritten by the current double variable. After that both variables are good again, but when increasing the current variable the outcome becomes the NAN value. Output below:

+++
- previous double: 4.000000 (FFF8000000000000)
- current double: 5.000000 (4014000000000000)
theLastDouble = theDouble;
- previous double: 5.000000 (4014000000000000)
- current double: 5.000000 (4014000000000000)
theDouble += 1;
- previous double: 5.000000 (4014000000000000)
- current double: nan (FFF8000000000000)
Double test fails!
- previous double: 5.000000 (4014000000000000)
- current double: nan (FFF8000000000000)

Note: at the "Double test fails!" piece the previous double does not have a NAN value. This only occurs when I add printfs, so I blame this on the printfs doing stuff in between which changes the data flow.

After a little further investigation it shows the previous double gets corrupted because in the final iteration the second read of any double gets turned into the NAN value. This means the current value will be read as NAN and then copied to the previous value.

After the double catastrophe I was curious if integers would also be affected, so I wrote intchange.c. This showed that even integers are affected by this bug. Output below:

Integer test fails!
- previous int: 0 (FFF80000)
- current int: 0 (FFF80000)

The hex pattern seems to be the same as the corruption which doubles seem to get.

Keith

----------------------------------------------------------------------
|
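The test programs in the thread above print a double through an integer conversion (%LX or %llX). That is formally undefined behaviour in C, even though it appears to have worked on the 32-bit x86 guest used here. A minimal sketch of the same kind of check that copies the bits out with memcpy instead is given below. The names double_bits and watch_double are made up for this illustration; they are not part of coLinux or of the original test programs, and whether the loop actually catches the corruption will depend on where the compiler keeps the values.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw IEEE 754 bit pattern of a double.
 * memcpy avoids passing a double where printf expects an integer. */
uint64_t double_bits(double d)
{
    uint64_t bits;

    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical check loop (an assumption, not the thread's exact test):
 * increment a double `iterations` times and compare the value read back
 * with the value that was just computed.
 * Returns 1 on the first mismatch, 0 if every pass matched. */
int watch_double(unsigned long iterations)
{
    volatile double current = 1.0;  /* volatile: store and reload every pass */
    unsigned long i;

    for (i = 0; i < iterations; i++) {
        double expected = current + 1.0;
        double seen;

        current = expected;
        seen = current;
        if (double_bits(seen) != double_bits(expected)) {
            fprintf(stderr,
                    "Double test fails!\n"
                    "- expected: %f (%016" PRIX64 ")\n"
                    "- seen:     %f (%016" PRIX64 ")\n",
                    expected, double_bits(expected),
                    seen, double_bits(seen));
            return 1;
        }
    }
    return 0;
}
```

Calling watch_double() repeatedly from main() while "openssl genrsa -out /dev/null 4096" runs in a second console would mirror the procedure described in the thread. On a healthy system double_bits(5.0) gives 4014000000000000, the same pattern as in Keith's output.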
From: SourceForge.net <no...@so...> - 2009-04-13 14:33:51
|
Bugs item #2748015, was opened at 2009-04-09 17:54
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Daemons (Windows)
Group: v0.8.x (devel)
Status: Open
Resolution: Accepted
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: PuTTY failing after a while with "Server key not valid"

Initial Comment:
Hi,

a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (like 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is built on the SSH code of PuTTY, so it seems to be a low-level error.

I tested a few setups to isolate the problem:
- coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and a lot of traffic.

The fact that I get the same error with the RC1 and the 0.8.0 development version is not surprising, as they should be based on the same code.

From searching the net I found these possible causes for the problem:
- The cached server key signature is not valid anymore (would prevent me from logging in)
- The network traffic gets mauled between server and client
- The client or server software can be at fault.

My setup specs:
- AMD AthlonXP 3800+, 2 GB RAM
- Windows XP Pro SP3 + all updates to date
- Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet
- Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)
- Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2)
- Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413)

I tested the virtual machine on the Windows host system. There cannot be any signal decay on the wire because the bridged network shouldn't use it. This, and the fact that I cannot reproduce the problem with version 0.7.3, leads me to believe the newer coLinux versions are corrupting the data somewhere in between.

I would like to know what you make of this problem. Solutions/workarounds and reproducibility confirmations are also welcome.

Thanks,
Keith

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-13 16:33

Message:
First scene: I was idle at the prompt with PuTTY on the host (no wire) and at the same time was logged in from another machine with "ssh -o RekeyLimit=1K hn@192.168.2.104", doing some compiling under coLinux. Both ssh sessions were killed at the same time: PuTTY with "Server's host key did not match the signature supplied", and ssh with:
"""
RSA_public_decrypt failed: error:0407006A:rsa routines:RSA_padding_check_PKCS1_type_1:block type is not 01
key_verify failed for server_host_key
"""

Second scene: I was logged in with PuTTY (option key regen 1 minute) via *tuntap*. On the first nt-console dblchange.c was running (see bug #2756909), and on the second nt-console (ALT-F2) "openssl genrsa -out /dev/null 4096" was running. At the same moment that dblchange.c detected the error, my PuTTY was terminated with the error message "Server's host key did not match the signature supplied". The difference is that I was not using pcap-bridge or ndis-bridge.

Keith, you are right that both bugs have the same source. But please, let's follow bug #2756909. That is better than waiting for PuTTY to terminate.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 00:24

Message:
I cannot get version 0.7.3 connections to crash by increasing the rekey limit.

I also tried Cygwin's ssh with the RekeyLimit=1k option and it does not crash. PuTTY is still crashing though.

And the sleep command is still dying occasionally, but this bug is being tracked here:
http://sourceforge.net/tracker/?func=detail&aid=2756909&group_id=98788&atid=622063

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-12 19:37

Message:
Hello Keith,

> I would like to ask anyone (Henry ;) ) to try and reproduce the crashing
> by setting your rekey PuTTY option to 1 minute.

Yes. It crashes with this setting after ~15 minutes. I was running "watch cat /proc/colinux/stats" inside PuTTY. I have restarted PuTTY, and it has now been running without this problem for longer than 60 minutes. That is too rare to resolve the problem.

PuTTY 0.60
Debian 4.0, openssl 0.9.8c-4etch5, openssh-server 4.3p2-9etch3
coLinux 0.7.4-rc1 (20090329)
pcap-bridge on Realtek RTL8102E Family PCI-E Fast Ethernet NIC
Hardware Checksum disabled. Only with this option can I connect to ssh from the host, see Bug #2688891.

With the same coLinux connected from native Linux with "ssh -o RekeyLimit=1K user@192.168.2.104", the "watch cat /proc/colinux/stats" has been running without problems for more than 60 minutes now. I can see the key regeneration with tcpdump from the fltk console roughly every 20 seconds. The differences here are ssh vs. PuTTY, and that this connection goes over the wire.

Can you check this with Cygwin's ssh on the host?

As a workaround you can use tuntap for your PuTTY login.

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-12 16:22

Message:
Bug research update.

I've done the netio performance benchmark twice from Windows to coLinux and twice from coLinux to Windows (four runs total). Here are the results:

From windows to colinux #1:
TCP connection established.
Packet size 1k bytes: 11246 KByte/s Tx, 17599 KByte/s Rx.
Packet size 2k bytes: 11296 KByte/s Tx, 18233 KByte/s Rx.
Packet size 4k bytes: 11561 KByte/s Tx, 18841 KByte/s Rx.
Packet size 8k bytes: 11566 KByte/s Tx, 19752 KByte/s Rx.
Packet size 16k bytes: 11564 KByte/s Tx, 20031 KByte/s Rx.
Packet size 32k bytes: 11586 KByte/s Tx, 18700 KByte/s Rx.
Done.

From windows to colinux #2:
TCP connection established.
Packet size 1k bytes: 11331 KByte/s Tx, 17431 KByte/s Rx.
Packet size 2k bytes: 11228 KByte/s Tx, 17502 KByte/s Rx.
Packet size 4k bytes: 11564 KByte/s Tx, 17893 KByte/s Rx.
Packet size 8k bytes: 11542 KByte/s Tx, 18682 KByte/s Rx.
Packet size 16k bytes: 11500 KByte/s Tx, 16087 KByte/s Rx.
Packet size 32k bytes: 10794 KByte/s Tx, 20235 KByte/s Rx.
Done.

From colinux to windows #1:
TCP connection established.
Packet size 1k bytes: 17529 KByte/s Tx, 11034 KByte/s Rx.
Packet size 2k bytes: 17824 KByte/s Tx, 11249 KByte/s Rx.
Packet size 4k bytes: 17897 KByte/s Tx, 10706 KByte/s Rx.
Packet size 8k bytes: 19426 KByte/s Tx, 10716 KByte/s Rx.
Packet size 16k bytes: 19306 KByte/s Tx, 11464 KByte/s Rx.
Packet size 32k bytes: 19284 KByte/s Tx, 11487 KByte/s Rx.
Done.

From colinux to windows #2:
TCP connection established.
Packet size 1k bytes: 17279 KByte/s Tx, 11073 KByte/s Rx.
Packet size 2k bytes: 17536 KByte/s Tx, 11239 KByte/s Rx.
Packet size 4k bytes: 17704 KByte/s Tx, 11451 KByte/s Rx.
Packet size 8k bytes: 17088 KByte/s Tx, 11338 KByte/s Rx.
Packet size 16k bytes: 19456 KByte/s Tx, 11458 KByte/s Rx.
Packet size 32k bytes: 19034 KByte/s Tx, 11500 KByte/s Rx.
Done.

Nothing interesting here. I also looked at the netio description and source; to me it doesn't seem to do integrity checks on the data being received. The way I understand the error is that the key or packets get corrupted along the way. So it would be nice to know whether what was sent out also gets received unharmed at the other end.

After looking over my last log I found that the PuTTY sessions which crashed were exactly x hours old, where x is mostly 1, 2 or 3 hours. See below:

xxx pts/0 x.x.x.x Fri Apr 10 16:56 - 17:56 (01:00)
xxx pts/0 x.x.x.x Fri Apr 10 20:13 - 21:13 (01:00)
xxx pts/0 x.x.x.x Fri Apr 10 21:14 - 04:14 (07:00)
xxx pts/2 x.x.x.x Fri Apr 10 20:14 - 03:14 (07:00)

(These results are older and I'm not 100% sure these sessions have crashed, but their entries are suspicious.)

xxx pts/1 x.x.x.x Tue Apr 7 13:38 - 18:38 (05:00)
xxx pts/0 x.x.x.x Tue Apr 7 19:57 - 20:57 (01:00)
xxx pts/1 x.x.x.x Tue Apr 7 22:59 - 01:59 (03:00)
xxx pts/0 x.x.x.x Tue Apr 7 22:59 - 23:59 (01:00)
xxx pts/1 x.x.x.x Thu Apr 9 11:35 - 12:35 (01:00)
xxx pts/0 x.x.x.x Thu Apr 9 11:34 - 15:34 (04:00)

Then I remembered there is a configuration option in sshd_config with the value 3600, so I changed that to a more frequent value (10 seconds). The configuration option is the SSH-1 key regeneration interval (KeyRegenerationInterval). But I use SSH protocol v2, so this configuration parameter did not affect my PuTTY connection crashes.

PuTTY also has such an option, located in the session editor screen under Connection -> SSH -> Kex: "Max minutes before rekey (0 for no limit)". When I changed this to 1 minute the session crashes became much more frequent. See below:

xxx pts/1 x.x.x.x Sat Apr 11 19:42 - 19:44 (00:02)
xxx pts/0 x.x.x.x Sat Apr 11 19:27 - 23:58 (04:30)
xxx pts/1 x.x.x.x Sat Apr 11 16:25 - 16:27 (00:02)
xxx pts/0 x.x.x.x Sat Apr 11 16:25 - 16:31 (00:06)
xxx pts/0 x.x.x.x Sat Apr 11 13:46 - 15:46 (02:00)
xxx pts/1 x.x.x.x Sat Apr 11 13:45 - 13:47 (00:02)
xxx pts/1 x.x.x.x Sat Apr 11 13:43 - 13:44 (00:01)
xxx pts/1 x.x.x.x Sat Apr 11 13:13 - 13:14 (00:01)
xxx pts/0 x.x.x.x Sat Apr 11 13:12 - 13:45 (00:33)

Now there are some sessions which lasted a lot longer than the others. This is due to me having two different sessions open most of the time, one for running the 'echo $RANDOM' script and one for calling 'last | head' when the other session had crashed. The 'last | head' session does not generate much output, and this seems to affect the probability of the connection crashing. I am not sure yet how exactly the output and the key exchange are related.

I am currently trying to reproduce the connection crashing with the increased number of key exchanges with version 0.7.3, but I don't think it will crash.

I would like to ask anyone (Henry ;) ) to try and reproduce the crashing by setting your rekey PuTTY option to 1 minute.

Thanks.
Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-10 19:36

Message:
Hello Keith,

please open a separate Tracker for the sleep bug. It has nothing to do with PuTTY. I can confirm it under Debian 4.0 with coLinux 0.7.4-rc1 on the FLTK console; the message is:
sleep: xnanosleep.c:58: xnanosleep: Assertion `0 <= seconds' failed.

For the network, it is normal that all packets from the Windows host to the coLinux guest also go out over the wire. Windows does not know that we are in capture mode on this network interface.

My question was rather whether the ssh connection goes over the wire. You said no. Then please let's test some netio from Windows to coLinux.

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-10 18:20

Message:
After further investigating the problem I found this thread on the net:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html

It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL.

My little bash script became very unreliable when starting an openssl key generation process in another PuTTY session. Normally I get a never-ending string of random decimal numbers, but now the sleep program starts aborting with the message:
sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

Can someone please try to reproduce the error?

Start this bash script in one PuTTY session:
while true; do echo -n "$RANDOM"; sleep 0.1; done

Start the following command in another one:
openssl genrsa -out /dev/null 4096

It should be obvious that the sleep command is failing, because there are a lot of error messages. I used coLinux 0.8.0 (20090329) this time with the pcap bridge for networking.

Also, henryn, I monitored my network traffic, and the network traffic does reach the wire. My network is very reliable, but of course we can't rule out that the packet corruption is coming from interference on the wire. But this does not explain sleep aborting when doing something with OpenSSL. This seems like a whole other bug, but I think the networking problem and the corruption of memory are related.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-09 23:12

Message:
Thanks for the quick response.

Above I wrote: I tested the virtual machine on the Windows host system. This means I used the same desktop/computer system to run the virtual machine and to run the client software (PuTTY, WinSCP). Therefore I concluded the traffic does not go over the wire.

In the hours between my bug report and this comment, the same virtual machine running under coLinux 0.7.3 has still been doing great. I did not experience any corrupted downloads inside the VM, but I will test this later on.

I also noticed my sleep command aborting sometimes while running this bash script (to generate traffic):
while true; do echo -n "$RANDOM"; sleep 0.5; done

It blurts out some kind of assertion failure in xnanosleep.c on some line; I will report the actual error message later on. Maybe this is related, because I don't get such aborts with the 0.7.3 version.

I will investigate the problem further tomorrow or the day after that.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-09 19:18

Message:
Hello Keith,

do you use PuTTY on the same desktop where coLinux is running? Or does it go over the wire?

I cannot assume that this is a problem in the network bridge between coLinux and the host. I use PuTTY every day for many hours and never have corrupted data or such errors. I'm logged in automatically with public + private key pairs. PuTTY and coLinux run on the same desktop. I use PuTTY as the connection between host (Windows) and guest (coLinux). I also have no errors from network mounts or from downloads from the internet via this bridging interface.

coLinux ndis-bridge has been heavily tested with "netio" and there are no errors.

Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). Test first from the location where you detected the errors. Then test the connection between your host (Windows) and guest (coLinux).

Changes to pcap-bridge between 0.7.3 and 0.7.4 are very rare. Here is the list of changes:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log
Revision 1057 was the release build for 0.7.3. So there are only 4 changes up to version 0.7.4. The biggest change was revision 1222; it was first included in build 20090227.

To find the regression, you can use snapshots from
http://www.henrynestler.com/colinux/testing/devel-0.8.0/

Henry

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 |
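Regarding Keith's remark in the thread above that netio does not appear to verify the data it receives: one way to check whether bytes arrive unharmed is to send a stream whose contents can be recomputed on the receiving side. The following is only a minimal sketch in C under that assumption. The names pattern_byte, fill_pattern and first_mismatch are hypothetical helpers invented for illustration; they are not part of netio or coLinux, and the socket handling is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern: the byte expected at absolute stream position `pos`.
   A multiplicative hash keeps neighbouring positions from looking alike. */
uint8_t pattern_byte(uint64_t pos)
{
    uint64_t x = pos * 0x9E3779B97F4A7C15ULL; /* unsigned wrap-around is intended */
    return (uint8_t)(x >> 56);
}

/* Sender side: fill `buf` with the pattern for stream positions
   start .. start + len - 1 before handing it to send(). */
void fill_pattern(uint8_t *buf, size_t len, uint64_t start)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(start + i);
}

/* Receiver side: compare a received chunk against the expected pattern.
   Returns -1 if every byte matches, otherwise the index of the first
   corrupted byte within the chunk. */
long long first_mismatch(const uint8_t *buf, size_t len, uint64_t start)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern_byte(start + i))
            return (long long)i;
    return -1;
}
```

In such a test the sender would call fill_pattern() for each chunk and the receiver would call first_mismatch() with the running byte count as `start`; any result other than -1 would point at corruption between the two ends rather than at ssh itself.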
From: SourceForge.net <no...@so...> - 2009-04-13 09:53:00
|
Bugs item #2756909, was opened at 2009-04-12 22:23
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: v0.8.x (devel)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: sleep crashing with "Assertion `0 <= seconds' failed."

Initial Comment:
While researching bug 2748015 (http://sourceforge.net/tracker/?func=detail&aid=2748015&group_id=98788&atid=622063), we came across another problem with the 0.7.4-rc1 and 0.8.0 code base.

When starting the command "while true; do sleep 0.1; done" and starting the command "openssl genrsa -out /dev/null 4096" in another session, the sleep command in the first session aborts occasionally with the error:
sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

This problem can even be reproduced when you start the different commands as different unprivileged users. It looks as if the kernel is changing the memory of random processes.

Tested versions:
colinux 0.8.0: suffers this bug
colinux 0.7.4-rc1: suffers this bug
colinux 0.7.3: does not suffer this bug

Test system:
AMD AthlonXP 3800+
Windows XP SP3 + all updates to date
Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)

While researching this further, I discovered this thread, which describes a bug in the User Mode Linux kernel from almost a year ago:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html

I have not been able to link this to a bug on the UML Sourceforge.net development page.

Keith

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-13 09:52

Message:
To test this problem I simplified the program listed in the fixunix thread above.

dblchange.c:

    #include <stdio.h>
    #include <unistd.h>   /* sleep() */

    #define true 1
    #define false 0

    int main(int argc, char* argv[]){
        double theDouble;
        double theLastDouble;

        theDouble = 1;
        while(true){
            theLastDouble = theDouble;
            theDouble += 1;
            /* Fails only if a value changed outside the program's control. */
            if(theLastDouble + 1 != theDouble){
                printf("Double test fails!\n");
                /* %LX is used to dump the raw 64 bits; passing a double
                   to it is not portable C. */
                printf("- previous double: %f (%LX)\n", theLastDouble, theLastDouble);
                printf("- current double: %f (%LX)\n", theDouble, theDouble);
                break;
            }
            sleep(1);
        }
        return 0;
    }

intchange.c:

    #include <stdio.h>
    #include <unistd.h>   /* sleep() */

    #define true 1
    #define false 0

    int main(int argc, char* argv[]){
        /* Note: both variables are declared double in this listing,
           so %d and %X below receive doubles. */
        double theInteger;
        double theLastInteger;

        theInteger = 1;
        while(true){
            theLastInteger = theInteger;
            theInteger += 1;
            if(theLastInteger + 1 != theInteger){
                printf("Integer test fails!\n");
                printf("- previous int: %d (%X)\n", theLastInteger, theLastInteger);
                printf("- current int: %d (%X)\n", theInteger, theInteger);
                break;
            }
            sleep(1);
        }
        return 0;
    }

Judging from the error thrown by sleep, it seems the double value which specifies how long to sleep gets changed outside of the program's control.

First I adapted the fixunix program to test for doubles being changed. It runs smoothly until I start the openssl key generation operation. Then it fails after several seconds:

Double test fails!
- previous double: nan (FFF8000000000000)
- current double: nan (FFF8000000000000)

By injecting some other printfs I've seen that in the fatal iteration the second read of the previous double goes wrong. But this doesn't matter that much because it gets overwritten by the current double variable. After that both variables are good again, but when increasing the current variable the outcome becomes the NAN value. Output below:

+++
- previous double: 4.000000 (FFF8000000000000)
- current double: 5.000000 (4014000000000000)
theLastDouble = theDouble;
- previous double: 5.000000 (4014000000000000)
- current double: 5.000000 (4014000000000000)
theDouble += 1;
- previous double: 5.000000 (4014000000000000)
- current double: nan (FFF8000000000000)
Double test fails!
- previous double: 5.000000 (4014000000000000)
- current double: nan (FFF8000000000000)

Note: at the "Double test fails!" piece the previous double does not have a NAN value. This only occurs when I add printfs, so I blame this on the printfs doing stuff in between which changes the data flow.

After a little further investigation it turns out the previous double gets corrupted because, in the final iteration, the second read of any double gets turned into the NAN value. This means the current value will be read as NAN and then copied to the previous value.

After the double catastrophe I was curious whether integers would also be affected, so I wrote intchange.c. This showed that even integers are affected by this bug. Output below:

Integer test fails!
- previous int: 0 (FFF80000)
- current int: 0 (FFF80000)

The hex pattern seems to be the same as the corruption which the doubles get.

Keith

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 |
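A note on the dblchange.c / intchange.c tests in the comment above: passing a double to an integer printf conversion is undefined in standard C, and the posted intchange.c declares its variables as double, so its output may still be showing halves of a double rather than a corrupted int. The sketch below, in C, shows one portable way to inspect the bit pattern and a check that uses real int variables. All function names here are hypothetical and are not taken from the tracker. For context, FFF8000000000000 is the quiet NaN that x86 floating-point hardware produces for invalid operations; reading the reported values as evidence of floating-point register corruption is an assumption in line with the linked UML thread, not something the report confirms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the raw bits of a double into a 64-bit integer.
   memcpy avoids handing a double to an integer printf conversion. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d); /* assumes 64-bit doubles */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Inverse of double_bits(), handy for constructing a specific NaN. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* One iteration of the dblchange.c check: 1 if prev + 1 equals cur,
   0 otherwise. Any comparison involving NaN yields 0. */
int double_step_ok(double prev, double cur)
{
    return prev + 1 == cur;
}

/* The same check on real int variables (assumes prev < INT_MAX). */
int int_step_ok(int prev, int cur)
{
    return prev + 1 == cur;
}

/* Print one value in the layout used by the report. */
void report_double(const char *label, double d)
{
    printf("- %s double: %f (%016llX)\n", label, d,
           (unsigned long long)double_bits(d));
}
```

Calling report_double("current", 5.0) prints the hex field 4014000000000000, matching the healthy values in the report. A loop built on int_step_ok() with int variables would be needed before concluding that integers are affected as well.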
From: SourceForge.net <no...@so...> - 2009-04-12 22:25:08
|
Bugs item #2748015, was opened at 2009-04-09 15:54 Message generated for change (Comment added) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Daemons (Windows) Group: v0.8.x (devel) Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: PuTTY failing after a while with "Server key not valid" Initial Comment: Hi, a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (like 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is build on the SSH code of PuTTY so it seems to be a low-level error. I tested a few setups to isolate the problem: - coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while. - coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while. - coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while. - coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while. - coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and alot of traffic. The fact that I get the same error with the RC1 and the 0.8.0 development version is not supprising as it should be based on the same code. >From searching the net I found these possible causes for the problem: - The cached server key signature is not valid anymore (would prevent me from logging in) - The network traffic gets mauled between server and client - The client or server software can be at fault. 
My setup specs: - AMD AthlonXP 3800+, 2gb ram - Windows XP Pro SP3 + all updates to date - Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet - Guest OS is ArchLinux (ver 2009.02) (using only prebuild packages from the ArchLinux repositories) - Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2) - Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413) I tested the virtual machine on the windows host system. There cannot be any signal decay on the wire cause the bridged network shouldn't use it. This, and the fact I cannot reproduce the problem with version 0.7.3, leads me to believe the newer colinux versions are corrupting the data somewhere in between. I would like to know what you make of this problem. Solutions/workarounds and reproducability confirmations are also welcome. Thanks, Keith ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-12 22:24 Message: I cannot get version 0.7.3 connections to crash by increasing the rekey limit. I also tried cygwin's ssh with the RekeyLimit=1k option and it does not crash. PuTTY is still crashing though. And the sleep command is still dying occasionally, but this bug is being tracked here http://sourceforge.net/tracker/?func=detail&aid=2756909&group_id=98788&atid=622063 Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-12 17:37 Message: Hello Keith, > I would like to ask anyone (Henry ;) ) to try and reproduce the crashing > by setting your rekey PuTTY option to 1 minute. Yes. It's crashing with this setting after ~15 Minutes. I was running "watch cat /proc/colinux/stats" inside PuTTY. I have restarted PuTTY, and this is running without this problem longer than 60 minutes now. That to rarely to resolve the problem. 
PuTTY 0.60 Debian 4.0, openssl 0.9.8c-4etch5, openssh-server 4.3p2-9etch3 coLinux 0.7.4-rc1 (20090329) pcap-bridge on Realtek RTL8102E Family PCI-E Fast Ethernet NIC Hardware Checksum disabled. Only with this option I can connect to ssh from host, see Bug #2688891. The same coLinux connected from native Linux with "ssh -o RekeyLimit=1K user@192.168.2.104" the "watch cat /proc/colinux/stats" runs without problems for more as 60 minutes now. I can see the key-re-generation by tcpdump from fltk console for every ~20 seconds. The different here are the ssh vs. PuTTY, and this here goes over the wire. Can you check this with Cygwin's ssh on the host? As workaround you can use tuntap for your PuTTY login. Henry ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-12 14:22 Message: Bug research update. I've done the netio performance benchmark twice from windows to colinux and twice from colinux to windows (four runs total). Here are the results: >From windows to colinux #1: TCP connection established. Packet size 1k bytes: 11246 KByte/s Tx, 17599 KByte/s Rx. Packet size 2k bytes: 11296 KByte/s Tx, 18233 KByte/s Rx. Packet size 4k bytes: 11561 KByte/s Tx, 18841 KByte/s Rx. Packet size 8k bytes: 11566 KByte/s Tx, 19752 KByte/s Rx. Packet size 16k bytes: 11564 KByte/s Tx, 20031 KByte/s Rx. Packet size 32k bytes: 11586 KByte/s Tx, 18700 KByte/s Rx. Done. >From windows to colinux #2: TCP connection established. Packet size 1k bytes: 11331 KByte/s Tx, 17431 KByte/s Rx. Packet size 2k bytes: 11228 KByte/s Tx, 17502 KByte/s Rx. Packet size 4k bytes: 11564 KByte/s Tx, 17893 KByte/s Rx. Packet size 8k bytes: 11542 KByte/s Tx, 18682 KByte/s Rx. Packet size 16k bytes: 11500 KByte/s Tx, 16087 KByte/s Rx. Packet size 32k bytes: 10794 KByte/s Tx, 20235 KByte/s Rx. Done. >From colinux to windows #1: TCP connection established. Packet size 1k bytes: 17529 KByte/s Tx, 11034 KByte/s Rx. 
Packet size 2k bytes: 17824 KByte/s Tx, 11249 KByte/s Rx. Packet size 4k bytes: 17897 KByte/s Tx, 10706 KByte/s Rx. Packet size 8k bytes: 19426 KByte/s Tx, 10716 KByte/s Rx. Packet size 16k bytes: 19306 KByte/s Tx, 11464 KByte/s Rx. Packet size 32k bytes: 19284 KByte/s Tx, 11487 KByte/s Rx. Done. >From colinux to windows #2: TCP connection established. Packet size 1k bytes: 17279 KByte/s Tx, 11073 KByte/s Rx. Packet size 2k bytes: 17536 KByte/s Tx, 11239 KByte/s Rx. Packet size 4k bytes: 17704 KByte/s Tx, 11451 KByte/s Rx. Packet size 8k bytes: 17088 KByte/s Tx, 11338 KByte/s Rx. Packet size 16k bytes: 19456 KByte/s Tx, 11458 KByte/s Rx. Packet size 32k bytes: 19034 KByte/s Tx, 11500 KByte/s Rx. Done. Nothing interesting here. Also I looked at the netio description and source, to me it doesn't seem to do integrity checks on the data being received. The way I understand the error is that the key or packets get corrupted along the way. So it would be nice to know if what should have been sent out also gets received unharmed at the other end. After looking over my last log I found that the PuTTY sessions which crashed were exactly x hours old. Where x is mostly 1, 2 or 3 hours. See below: xxx pts/0 x.x.x.x Fri Apr 10 16:56 - 17:56 (01:00) xxx pts/0 x.x.x.x Fri Apr 10 20:13 - 21:13 (01:00) xxx pts/0 x.x.x.x Fri Apr 10 21:14 - 04:14 (07:00) xxx pts/2 x.x.x.x Fri Apr 10 20:14 - 03:14 (07:00) (These results are older and I'm not 100% sure these sessions have crashed but their entries are suspicious.) xxx pts/1 x.x.x.x Tue Apr 7 13:38 - 18:38 (05:00) xxx pts/0 x.x.x.x Tue Apr 7 19:57 - 20:57 (01:00) xxx pts/1 x.x.x.x Tue Apr 7 22:59 - 01:59 (03:00) xxx pts/0 x.x.x.x Tue Apr 7 22:59 - 23:59 (01:00) xxx pts/1 x.x.x.x Thu Apr 9 11:35 - 12:35 (01:00) xxx pts/0 x.x.x.x Thu Apr 9 11:34 - 15:34 (04:00) Then I remembered there is a 3600 value configuration option in sshd_config so I changed that to a more frequent value (10 seconds). 
The configuration option is the SSH-1 key regeneration interval (KeyRegenerationInterval). But I use SSH protocol v2 so this configuration parameter did not affect my putty connection crashes. Putty also has such an option, located in the session editor screen Connection -> SSH -> Kex option "Max minutes before rekey (0 for no limit)". When I changed this to 1 minute the session crashes became much more frequent. See below: xxx pts/1 x.x.x.x Sat Apr 11 19:42 - 19:44 (00:02) xxx pts/0 x.x.x.x Sat Apr 11 19:27 - 23:58 (04:30) xxx pts/1 x.x.x.x Sat Apr 11 16:25 - 16:27 (00:02) xxx pts/0 x.x.x.x Sat Apr 11 16:25 - 16:31 (00:06) xxx pts/0 x.x.x.x Sat Apr 11 13:46 - 15:46 (02:00) xxx pts/1 x.x.x.x Sat Apr 11 13:45 - 13:47 (00:02) xxx pts/1 x.x.x.x Sat Apr 11 13:43 - 13:44 (00:01) xxx pts/1 x.x.x.x Sat Apr 11 13:13 - 13:14 (00:01) xxx pts/0 x.x.x.x Sat Apr 11 13:12 - 13:45 (00:33) Now there are some sessions which lasted alot longer than the others. This is due to me having two different sessions open most of the time, one for running the 'echo $RANDOM' script and one for calling 'last | head' when the other session had crashed. The 'last | head' session does not generate much output and this seems to affect the probability of the connection crashing. Not sure as of yet how exactly the output and key exchange are related. I am currently trying to reproduce the connection crashing with the increased number of key exchanges with version 0.7.3, but I don't think it will crash. I would like to ask anyone (Henry ;) ) to try and reproduce the crashing by setting your rekey PuTTY option to 1 minute. Thanks. Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-10 17:36 Message: Hello Keith, please open a separate Tracker for the sleep bug. It has nothing to do with putty. 
I can confirm it under Debian 4.0 with coLinux 0.7.4-rc1 on FLTK console, the message is sleep: xnanosleep.c:58: xnanosleep: Assertion `0 <= seconds' failed. For the network, it is normal, that all packets from Windows Host to coLinux Guest goes also over the wire out. Windows does not known, that we are in capture mode on this network interface. My question was more, goes the ssh connection over the wire. You sad no. Then please lets test some netio from Windows to coLinux. Henry ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-10 16:20 Message: After further investigating the problem I found this thread on the net: http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL. My little bash script became very unreliable when starting an openssl key generation process in another putty. Normally I get a neverending string of random decimal numbers, but now the sleep program starts aborting with message: sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed. Aborted Can someone please try to reproduce the error? Start this bash script in one putty session: while true; do echo -n "$RANDOM"; sleep 0.1; done Start the following command in another one: openssl genrsa -out /dev/null 4096 It should be obvious the sleep command is failing, 'cause there are alot of error messages. I used colinux 0.8.0 (20090329) this time with the pcap bridge for networking. Also, henryn, I monitored my network traffic and the network traffic does reach the wire. My network is very reliable, but of course we can't rule out the packet corruption is comming from interferance on the wire. But this does not explain sleep aborting when doing something with OpenSSL. 
This seems like a whole other bug, but I think the networking thing and the corruption of memory are related. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-09 21:12 Message: Thanks for the quick response. Above I wrote: I tested the virtual machine on the windows host system. This means I used the same desktop/computer system to run the virtual machine and to run the client software (PuTTY, WinSCP). Therefore I concluded the traffic does not go over the wire. As for the hours between my bug report and this comment the same virtual machine running under coLinux 0.7.3 is still doing great. I did not experience any corrupted downloads inside the vm, but I will test this later on. Also I noticed my sleep command aborting sometimes while running this bash script (to generate traffic) while true; do echo -n "$RANDOM"; sleep 0.5; done It blurts somekind of assertion not being valid or something in xnanosleep.c on some line, will report the actual error message later on. Maybe this is related, cause I don't get such aborts with the 0.7.3 version. I will investigate the problem further tomorrow or the day after that. Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-09 17:18 Message: Hello Keith, do you use PuTTY on same desktop where coLinux is running? Or goes it over the wire? I can not assume, that this is a problem on the network bridge between coLinux and Host. I use PuTTY every day for many hours and never have corrupted data, or such errors. I'm auto logged in with public + private key pairs. PuTTY and coLinux runs on the same desktop. For me, I use PuTTY as connection between Host (Windows) and Guest (coLInux). I also have no errors from network mounts and getting downloads from internet via this bridging interface. coLinux ndis-bridge have heavy tested with "netio" and there are no errors. 
Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). First test from the location where you detected the errors. Then test the connection between your host (Windows) and guest (coLinux).

Changes to the pcap-bridge between 0.7.3 and 0.7.4 are very few. Here is the list of changes:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log
Revision 1057 was the release build for 0.7.3. So there are only 4 changes up to version 0.7.4. The biggest change was revision 1222, which was first included in build 20090227.

To find the regression, you can use snapshots from
http://www.henrynestler.com/colinux/testing/devel-0.8.0/

Henry

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 |
From: SourceForge.net <no...@so...> - 2009-04-12 22:24:00
|
Bugs item #2756909, was opened at 2009-04-12 22:23
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: v0.8.x (devel)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: sleep crashing with "Assertion `0 <= seconds' failed."

Initial Comment:
While researching bug 2748015 (http://sourceforge.net/tracker/?func=detail&aid=2748015&group_id=98788&atid=622063), we came across another problem with the 0.7.4-rc1 and 0.8.0 code base.

When the command "while true; do sleep 0.1; done" is running in one session and "openssl genrsa -out /dev/null 4096" is started in another session, the sleep command in the first session occasionally aborts with the error:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

The problem can even be reproduced when the two commands are started as different unprivileged users. It looks as if the kernel is changing the memory of random processes.

Tested versions:
colinux 0.8.0: suffers this bug
colinux 0.7.4-rc1: suffers this bug
colinux 0.7.3: does not suffer this bug

Test system:
AMD AthlonXP 3800+
Windows XP SP3 + all updates to date
Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)

While researching this further, I discovered this thread, which describes a bug in the User Mode Linux kernel from almost a year ago.
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html
I have not been able to link this to a bug on the UML Sourceforge.net development page.
Keith ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2756909&group_id=98788 |
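The reproduction steps in this report can be turned into a small counting helper. The following is only a sketch in Python, not part of coLinux or the tracker; the function name count_failures is hypothetical. It runs a command repeatedly and counts the runs that exit non-zero, which includes a sleep that dies from a failed assertion (abort() ends the process with SIGABRT, so the return code is non-zero).

```python
import subprocess
import sys


def count_failures(command, iterations):
    """Run `command` `iterations` times and return how many runs exited non-zero.

    `count_failures` is a hypothetical helper name, not an existing tool.
    A process killed by SIGABRT (failed assertion) has a negative return
    code under subprocess, so it is counted as a failure as well.
    """
    failures = 0
    for _ in range(iterations):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

Inside the guest, one would start "openssl genrsa -out /dev/null 4096" in a second session and then call count_failures(["sleep", "0.1"], 200); on an unaffected system the expected count is zero.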
From: SourceForge.net <no...@so...> - 2009-04-12 19:51:34
|
Bugs item #2747242, was opened at 2009-04-09 12:18
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2747242&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: v0.7.x (release)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: MerseyViking (merseyviking)
Assigned to: Nobody/Anonymous (nobody)
Summary: cofs and cmake

Initial Comment:
I'm using andLinux to do some cross-platform development using cmake (version 2.4-patch 7) as the makefile generator. If I have a cofs mount pointing to my Windows drive, cmake falls over when trying to compile a simple test app. If the same directory is mounted as an SMB share, it works fine. The error I get from cmake is:

$ cmake . -G"Unix Makefiles"
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- broken
CMake Error: The C compiler "/usr/bin/gcc" is not able to compile a simple test program.
It fails with the following output:
make: *** Makefile: No such file or directory. Stop.

CMake will not be able to correctly generate this project.
-- Configuring done

I appreciate this may be an issue with cmake, such as it creating funky filenames that cofs can't deal with, but it makes more sense for a filesystem to support a tool rather than the other way round.

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-12 21:51

Message:
Of course, the same result without ccache:

henry@coLinux:/media/windows$ cmake . -G"Unix Makefiles"
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Check size of void*
-- Check size of void* - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Configuring done
-- Generating done
-- Build files have been written to: /media/windows

Henry

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-12 21:45

Message:
What version of coLinux do you use? ("uname -r")

Please update to version 0.7.4 from http://www.colinux.org/snapshots/
Some of these errors are fixed in the new version, for example see
* Bug #2176188: File sync wrong on cofs, command after mv/rename fails.

But remember: cofs is a FAT filesystem, and not all features of native filesystems work there. Symlinks are not supported. There are also some problems with timestamps: some make utilities think that a file is in the future. This results from a small time difference between host and guest.

Builds on cofs are of limited use, and that generally cannot be changed. I recommend compiling under your home directory and syncing your work with rsync to the cofs drive on the Windows side.

A quick test under Debian 4.0 with coLinux 0.7.4-rc1 works perfectly:

henry@coLinux:/media/windows$ uname -r
2.6.22.18-co-0.7.4
henry@coLinux:/media/windows$ cmake --version
cmake version 2.4-patch 5
henry@coLinux:/media/windows$ mount | grep cofs0
cofs0 on /media/windows type cofs (rw,noexec,nosuid,nodev,uid=1000,gid=1000)
henry@coLinux:/media/windows$ ls
CMakeLists.txt main.c
henry@coLinux:/media/windows$ cat CMakeLists.txt
add_executable(halloworld main.c)
henry@coLinux:/media/windows$ cat main.c
#include <stdio.h>
int main()
{
    printf("Hallo World!\n");
    return 0;
}
henry@coLinux:/media/windows$ cmake . -G"Unix Makefiles"
-- Check for working C compiler: /home/hn/bin/ccache/gcc
-- Check for working C compiler: /home/hn/bin/ccache/gcc -- works
-- Check size of void*
-- Check size of void* - done
-- Check for working CXX compiler: /home/hn/bin/ccache/c++
-- Check for working CXX compiler: /home/hn/bin/ccache/c++ -- works
-- Configuring done
-- Generating done
-- Build files have been written to: /media/windows
henry@coLinux:/media/windows$ make
Scanning dependencies of target halloworld
[100%] Building C object CMakeFiles/halloworld.dir/main.o
Linking C executable halloworld
[100%] Built target halloworld

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2747242&group_id=98788 |
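The timestamp problem mentioned in this thread (make utilities treating files on cofs as being "in the future" because of a small time difference between host and guest) can be checked for before a build. The following Python sketch is an illustration only; find_future_files and its parameters are hypothetical names, and it simply compares each file's modification time against the guest clock.

```python
import os
import time


def find_future_files(root, now=None, tolerance=0.0):
    """Return sorted (path, seconds_ahead) pairs for files under `root`
    whose modification time lies more than `tolerance` seconds after `now`.

    Hypothetical helper: `now` defaults to the current guest time, and
    `tolerance` lets small, harmless clock differences be ignored.
    """
    if now is None:
        now = time.time()
    future = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seconds_ahead = os.stat(path).st_mtime - now
            if seconds_ahead > tolerance:
                future.append((path, seconds_ahead))
    return sorted(future)
```

Running it as find_future_files("/media/windows", tolerance=2.0) on the mount used in the transcript above would list the files that a timestamp-based build tool is likely to complain about; an empty list suggests the clocks are close enough.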
From: SourceForge.net <no...@so...> - 2009-04-12 19:45:10
|
Bugs item #2747242, was opened at 2009-04-09 12:18
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2747242&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: v0.7.x (release)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: MerseyViking (merseyviking)
Assigned to: Nobody/Anonymous (nobody)
Summary: cofs and cmake

Initial Comment:
I'm using andLinux to do some cross-platform development using cmake (version 2.4-patch 7) as the makefile generator. If I have a cofs mount pointing to my Windows drive, cmake falls over when trying to compile a simple test app. If the same directory is mounted as an SMB share, it works fine. The error I get from cmake is:

$ cmake . -G"Unix Makefiles"
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- broken
CMake Error: The C compiler "/usr/bin/gcc" is not able to compile a simple test program.
It fails with the following output:
make: *** Makefile: No such file or directory. Stop.

CMake will not be able to correctly generate this project.
-- Configuring done

I appreciate this may be an issue with cmake, such as it creating funky filenames that cofs can't deal with, but it makes more sense for a filesystem to support a tool rather than the other way round.

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-12 21:45

Message:
What version of coLinux do you use? ("uname -r")

Please update to version 0.7.4 from http://www.colinux.org/snapshots/
Some of these errors are fixed in the new version, for example see
* Bug #2176188: File sync wrong on cofs, command after mv/rename fails.

But remember: cofs is a FAT filesystem, and not all features of native filesystems work there. Symlinks are not supported. There are also some problems with timestamps: some make utilities think that a file is in the future. This results from a small time difference between host and guest.

Builds on cofs are of limited use, and that generally cannot be changed. I recommend compiling under your home directory and syncing your work with rsync to the cofs drive on the Windows side.

A quick test under Debian 4.0 with coLinux 0.7.4-rc1 works perfectly:

henry@coLinux:/media/windows$ uname -r
2.6.22.18-co-0.7.4
henry@coLinux:/media/windows$ cmake --version
cmake version 2.4-patch 5
henry@coLinux:/media/windows$ mount | grep cofs0
cofs0 on /media/windows type cofs (rw,noexec,nosuid,nodev,uid=1000,gid=1000)
henry@coLinux:/media/windows$ ls
CMakeLists.txt main.c
henry@coLinux:/media/windows$ cat CMakeLists.txt
add_executable(halloworld main.c)
henry@coLinux:/media/windows$ cat main.c
#include <stdio.h>
int main()
{
    printf("Hallo World!\n");
    return 0;
}
henry@coLinux:/media/windows$ cmake . -G"Unix Makefiles"
-- Check for working C compiler: /home/hn/bin/ccache/gcc
-- Check for working C compiler: /home/hn/bin/ccache/gcc -- works
-- Check size of void*
-- Check size of void* - done
-- Check for working CXX compiler: /home/hn/bin/ccache/c++
-- Check for working CXX compiler: /home/hn/bin/ccache/c++ -- works
-- Configuring done
-- Generating done
-- Build files have been written to: /media/windows
henry@coLinux:/media/windows$ make
Scanning dependencies of target halloworld
[100%] Building C object CMakeFiles/halloworld.dir/main.o
Linking C executable halloworld
[100%] Built target halloworld

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2747242&group_id=98788 |
From: SourceForge.net <no...@so...> - 2009-04-12 17:37:10
|
Bugs item #2748015, was opened at 2009-04-09 17:54 Message generated for change (Comment added) made by henryn You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Daemons (Windows) Group: v0.8.x (devel) Status: Open >Resolution: Accepted Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: PuTTY failing after a while with "Server key not valid" Initial Comment: Hi, a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (like 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is build on the SSH code of PuTTY so it seems to be a low-level error. I tested a few setups to isolate the problem: - coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while. - coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while. - coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while. - coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while. - coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and alot of traffic. The fact that I get the same error with the RC1 and the 0.8.0 development version is not supprising as it should be based on the same code. >From searching the net I found these possible causes for the problem: - The cached server key signature is not valid anymore (would prevent me from logging in) - The network traffic gets mauled between server and client - The client or server software can be at fault. 
My setup specs:
- AMD AthlonXP 3800+, 2 GB RAM
- Windows XP Pro SP3 + all updates to date
- Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet
- Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)
- Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2)
- Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413)

I tested the virtual machine on the Windows host system. There cannot be any signal decay on the wire, because the bridged network shouldn't use it. This, and the fact that I cannot reproduce the problem with version 0.7.3, leads me to believe the newer coLinux versions are corrupting the data somewhere in between.

I would like to know what you make of this problem. Solutions/workarounds and reproducibility confirmations are also welcome.

Thanks,
Keith

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-12 19:37

Message:
Hello Keith,

> I would like to ask anyone (Henry ;) ) to try and reproduce the crashing
> by setting your rekey PuTTY option to 1 minute.

Yes, it crashed with this setting after ~15 minutes. I was running "watch cat /proc/colinux/stats" inside PuTTY. I have restarted PuTTY, and it has now been running without this problem for more than 60 minutes. That is too rare to track down the problem.

PuTTY 0.60
Debian 4.0, openssl 0.9.8c-4etch5, openssh-server 4.3p2-9etch3
coLinux 0.7.4-rc1 (20090329)
pcap-bridge on Realtek RTL8102E Family PCI-E Fast Ethernet NIC
Hardware Checksum disabled. Only with this option can I connect to ssh from the host, see Bug #2688891.

With the same coLinux connected from native Linux via "ssh -o RekeyLimit=1K user@192.168.2.104", "watch cat /proc/colinux/stats" has been running without problems for more than 60 minutes now. I can see the key regeneration with tcpdump from the FLTK console about every 20 seconds. The differences here are ssh vs. PuTTY, and that this connection goes over the wire.
Can you check this with Cygwin's ssh on the host? As workaround you can use tuntap for your PuTTY login. Henry ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-12 16:22 Message: Bug research update. I've done the netio performance benchmark twice from windows to colinux and twice from colinux to windows (four runs total). Here are the results: >From windows to colinux #1: TCP connection established. Packet size 1k bytes: 11246 KByte/s Tx, 17599 KByte/s Rx. Packet size 2k bytes: 11296 KByte/s Tx, 18233 KByte/s Rx. Packet size 4k bytes: 11561 KByte/s Tx, 18841 KByte/s Rx. Packet size 8k bytes: 11566 KByte/s Tx, 19752 KByte/s Rx. Packet size 16k bytes: 11564 KByte/s Tx, 20031 KByte/s Rx. Packet size 32k bytes: 11586 KByte/s Tx, 18700 KByte/s Rx. Done. >From windows to colinux #2: TCP connection established. Packet size 1k bytes: 11331 KByte/s Tx, 17431 KByte/s Rx. Packet size 2k bytes: 11228 KByte/s Tx, 17502 KByte/s Rx. Packet size 4k bytes: 11564 KByte/s Tx, 17893 KByte/s Rx. Packet size 8k bytes: 11542 KByte/s Tx, 18682 KByte/s Rx. Packet size 16k bytes: 11500 KByte/s Tx, 16087 KByte/s Rx. Packet size 32k bytes: 10794 KByte/s Tx, 20235 KByte/s Rx. Done. >From colinux to windows #1: TCP connection established. Packet size 1k bytes: 17529 KByte/s Tx, 11034 KByte/s Rx. Packet size 2k bytes: 17824 KByte/s Tx, 11249 KByte/s Rx. Packet size 4k bytes: 17897 KByte/s Tx, 10706 KByte/s Rx. Packet size 8k bytes: 19426 KByte/s Tx, 10716 KByte/s Rx. Packet size 16k bytes: 19306 KByte/s Tx, 11464 KByte/s Rx. Packet size 32k bytes: 19284 KByte/s Tx, 11487 KByte/s Rx. Done. >From colinux to windows #2: TCP connection established. Packet size 1k bytes: 17279 KByte/s Tx, 11073 KByte/s Rx. Packet size 2k bytes: 17536 KByte/s Tx, 11239 KByte/s Rx. Packet size 4k bytes: 17704 KByte/s Tx, 11451 KByte/s Rx. Packet size 8k bytes: 17088 KByte/s Tx, 11338 KByte/s Rx. 
Packet size 16k bytes: 19456 KByte/s Tx, 11458 KByte/s Rx. Packet size 32k bytes: 19034 KByte/s Tx, 11500 KByte/s Rx. Done. Nothing interesting here. Also I looked at the netio description and source, to me it doesn't seem to do integrity checks on the data being received. The way I understand the error is that the key or packets get corrupted along the way. So it would be nice to know if what should have been sent out also gets received unharmed at the other end. After looking over my last log I found that the PuTTY sessions which crashed were exactly x hours old. Where x is mostly 1, 2 or 3 hours. See below: xxx pts/0 x.x.x.x Fri Apr 10 16:56 - 17:56 (01:00) xxx pts/0 x.x.x.x Fri Apr 10 20:13 - 21:13 (01:00) xxx pts/0 x.x.x.x Fri Apr 10 21:14 - 04:14 (07:00) xxx pts/2 x.x.x.x Fri Apr 10 20:14 - 03:14 (07:00) (These results are older and I'm not 100% sure these sessions have crashed but their entries are suspicious.) xxx pts/1 x.x.x.x Tue Apr 7 13:38 - 18:38 (05:00) xxx pts/0 x.x.x.x Tue Apr 7 19:57 - 20:57 (01:00) xxx pts/1 x.x.x.x Tue Apr 7 22:59 - 01:59 (03:00) xxx pts/0 x.x.x.x Tue Apr 7 22:59 - 23:59 (01:00) xxx pts/1 x.x.x.x Thu Apr 9 11:35 - 12:35 (01:00) xxx pts/0 x.x.x.x Thu Apr 9 11:34 - 15:34 (04:00) Then I remembered there is a 3600 value configuration option in sshd_config so I changed that to a more frequent value (10 seconds). The configuration option is the SSH-1 key regeneration interval (KeyRegenerationInterval). But I use SSH protocol v2 so this configuration parameter did not affect my putty connection crashes. Putty also has such an option, located in the session editor screen Connection -> SSH -> Kex option "Max minutes before rekey (0 for no limit)". When I changed this to 1 minute the session crashes became much more frequent. 
See below: xxx pts/1 x.x.x.x Sat Apr 11 19:42 - 19:44 (00:02) xxx pts/0 x.x.x.x Sat Apr 11 19:27 - 23:58 (04:30) xxx pts/1 x.x.x.x Sat Apr 11 16:25 - 16:27 (00:02) xxx pts/0 x.x.x.x Sat Apr 11 16:25 - 16:31 (00:06) xxx pts/0 x.x.x.x Sat Apr 11 13:46 - 15:46 (02:00) xxx pts/1 x.x.x.x Sat Apr 11 13:45 - 13:47 (00:02) xxx pts/1 x.x.x.x Sat Apr 11 13:43 - 13:44 (00:01) xxx pts/1 x.x.x.x Sat Apr 11 13:13 - 13:14 (00:01) xxx pts/0 x.x.x.x Sat Apr 11 13:12 - 13:45 (00:33) Now there are some sessions which lasted alot longer than the others. This is due to me having two different sessions open most of the time, one for running the 'echo $RANDOM' script and one for calling 'last | head' when the other session had crashed. The 'last | head' session does not generate much output and this seems to affect the probability of the connection crashing. Not sure as of yet how exactly the output and key exchange are related. I am currently trying to reproduce the connection crashing with the increased number of key exchanges with version 0.7.3, but I don't think it will crash. I would like to ask anyone (Henry ;) ) to try and reproduce the crashing by setting your rekey PuTTY option to 1 minute. Thanks. Keith ---------------------------------------------------------------------- Comment By: Henry N. (henryn) Date: 2009-04-10 19:36 Message: Hello Keith, please open a separate Tracker for the sleep bug. It has nothing to do with putty. I can confirm it under Debian 4.0 with coLinux 0.7.4-rc1 on FLTK console, the message is sleep: xnanosleep.c:58: xnanosleep: Assertion `0 <= seconds' failed. For the network, it is normal, that all packets from Windows Host to coLinux Guest goes also over the wire out. Windows does not known, that we are in capture mode on this network interface. My question was more, goes the ssh connection over the wire. You sad no. Then please lets test some netio from Windows to coLinux. 
Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-10 18:20

Message:
After further investigating the problem I found this thread on the net:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html
It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL.

My little bash script became very unreliable when I started an openssl key generation process in another PuTTY session. Normally I get a never-ending string of random decimal numbers, but now the sleep program starts aborting with the message:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

Can someone please try to reproduce the error?

Start this bash script in one PuTTY session:
while true; do echo -n "$RANDOM"; sleep 0.1; done

Start the following command in another one:
openssl genrsa -out /dev/null 4096

It should be obvious that the sleep command is failing, because there are a lot of error messages. I used coLinux 0.8.0 (20090329) this time, with the pcap bridge for networking.

Also, henryn, I monitored my network traffic, and the traffic does reach the wire. My network is very reliable, but of course we can't rule out that the packet corruption is coming from interference on the wire. But this does not explain sleep aborting when doing something with OpenSSL.

This seems like a whole other bug, but I think the network problem and the memory corruption are related.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-09 23:12

Message:
Thanks for the quick response.

Above I wrote: "I tested the virtual machine on the Windows host system." This means I used the same desktop system to run the virtual machine and the client software (PuTTY, WinSCP). Therefore I concluded that the traffic does not go over the wire.

In the hours between my bug report and this comment, the same virtual machine running under coLinux 0.7.3 is still doing great. I did not experience any corrupted downloads inside the VM, but I will test this later on.

I also noticed my sleep command aborting sometimes while running this bash script (to generate traffic):

while true; do echo -n "$RANDOM"; sleep 0.5; done

It reports some kind of failed assertion in xnanosleep.c; I will report the actual error message later on. Maybe this is related, because I don't get such aborts with version 0.7.3.

I will investigate the problem further tomorrow or the day after.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-09 19:18

Message:
Hello Keith,

do you use PuTTY on the same desktop where coLinux is running? Or does it go over the wire?

I do not think this is a problem in the network bridge between coLinux and the host. I use PuTTY every day for many hours and have never seen corrupted data or errors like this. I log in automatically with public/private key pairs. PuTTY and coLinux run on the same desktop; I use PuTTY as the connection between the host (Windows) and the guest (coLinux). I also get no errors from network mounts or from downloads from the internet over this bridged interface.

The coLinux ndis-bridge has been tested heavily with "netio" without errors.

Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). First test from the location where you detected the errors. Then test the connection between your host (Windows) and guest (coLinux).

Changes to the pcap-bridge between 0.7.3 and 0.7.4 are very few. Here is the list of changes:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log
Revision 1057 was the release build for 0.7.3. So there are only 4 changes up to version 0.7.4. The biggest change was revision 1222, which was first included in build 20090227.

To find the regression, you can use snapshots from
http://www.henrynestler.com/colinux/testing/devel-0.8.0/

Henry

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 |
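Keith's comment in this thread notes that netio measures throughput but does not appear to verify that the received data matches what was sent. An integrity test of that kind could look roughly like the Python sketch below. It is not part of netio or coLinux; all function names are hypothetical, and the framing (a SHA-256 digest, an 8-byte length, then the payload) is an assumption made for the example.

```python
import hashlib
import os
import socket
import threading


def _recv_exact(conn, count):
    """Read exactly `count` bytes from `conn` or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < count:
        chunk = conn.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def serve_once(listener, results):
    """Accept one connection and record whether the payload matches its digest."""
    conn, _addr = listener.accept()
    with conn:
        digest = _recv_exact(conn, 32)  # SHA-256 digest is sent first
        length = int.from_bytes(_recv_exact(conn, 8), "big")
        payload = _recv_exact(conn, length)
    results.append(hashlib.sha256(payload).digest() == digest)


def send_block(host, port, payload):
    """Send digest, length and payload to a listening serve_once()."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(hashlib.sha256(payload).digest())
        conn.sendall(len(payload).to_bytes(8, "big"))
        conn.sendall(payload)


def loopback_check(size=1 << 20):
    """Self-test on 127.0.0.1: returns True if the random payload arrived intact."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    results = []
    worker = threading.Thread(target=serve_once, args=(listener, results))
    worker.start()
    send_block("127.0.0.1", port, os.urandom(size))
    worker.join()
    listener.close()
    return results[0]
```

For the bridge discussed here, serve_once would run in the guest and send_block on the Windows host (or the other way round), repeated many times; a single False result would indicate that data was altered between the two ends. The loopback_check function only exercises the code on one machine and says nothing about the bridge itself.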
From: SourceForge.net <no...@so...> - 2009-04-12 14:22:08
|
Bugs item #2748015, was opened at 2009-04-09 15:54 Message generated for change (Comment added) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Daemons (Windows) Group: v0.8.x (devel) Status: Open Resolution: None Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: PuTTY failing after a while with "Server key not valid" Initial Comment: Hi, a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (like 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is build on the SSH code of PuTTY so it seems to be a low-level error. I tested a few setups to isolate the problem: - coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while. - coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while. - coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while. - coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while. - coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and alot of traffic. The fact that I get the same error with the RC1 and the 0.8.0 development version is not supprising as it should be based on the same code. >From searching the net I found these possible causes for the problem: - The cached server key signature is not valid anymore (would prevent me from logging in) - The network traffic gets mauled between server and client - The client or server software can be at fault. 
My setup specs: - AMD AthlonXP 3800+, 2gb ram - Windows XP Pro SP3 + all updates to date - Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet - Guest OS is ArchLinux (ver 2009.02) (using only prebuild packages from the ArchLinux repositories) - Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2) - Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413) I tested the virtual machine on the windows host system. There cannot be any signal decay on the wire cause the bridged network shouldn't use it. This, and the fact I cannot reproduce the problem with version 0.7.3, leads me to believe the newer colinux versions are corrupting the data somewhere in between. I would like to know what you make of this problem. Solutions/workarounds and reproducability confirmations are also welcome. Thanks, Keith ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-04-12 14:22 Message: Bug research update. I've done the netio performance benchmark twice from windows to colinux and twice from colinux to windows (four runs total). Here are the results: >From windows to colinux #1: TCP connection established. Packet size 1k bytes: 11246 KByte/s Tx, 17599 KByte/s Rx. Packet size 2k bytes: 11296 KByte/s Tx, 18233 KByte/s Rx. Packet size 4k bytes: 11561 KByte/s Tx, 18841 KByte/s Rx. Packet size 8k bytes: 11566 KByte/s Tx, 19752 KByte/s Rx. Packet size 16k bytes: 11564 KByte/s Tx, 20031 KByte/s Rx. Packet size 32k bytes: 11586 KByte/s Tx, 18700 KByte/s Rx. Done. >From windows to colinux #2: TCP connection established. Packet size 1k bytes: 11331 KByte/s Tx, 17431 KByte/s Rx. Packet size 2k bytes: 11228 KByte/s Tx, 17502 KByte/s Rx. Packet size 4k bytes: 11564 KByte/s Tx, 17893 KByte/s Rx. Packet size 8k bytes: 11542 KByte/s Tx, 18682 KByte/s Rx. Packet size 16k bytes: 11500 KByte/s Tx, 16087 KByte/s Rx. Packet size 32k bytes: 10794 KByte/s Tx, 20235 KByte/s Rx. Done. 
>From colinux to windows #1: TCP connection established. Packet size 1k bytes: 17529 KByte/s Tx, 11034 KByte/s Rx. Packet size 2k bytes: 17824 KByte/s Tx, 11249 KByte/s Rx. Packet size 4k bytes: 17897 KByte/s Tx, 10706 KByte/s Rx. Packet size 8k bytes: 19426 KByte/s Tx, 10716 KByte/s Rx. Packet size 16k bytes: 19306 KByte/s Tx, 11464 KByte/s Rx. Packet size 32k bytes: 19284 KByte/s Tx, 11487 KByte/s Rx. Done. >From colinux to windows #2: TCP connection established. Packet size 1k bytes: 17279 KByte/s Tx, 11073 KByte/s Rx. Packet size 2k bytes: 17536 KByte/s Tx, 11239 KByte/s Rx. Packet size 4k bytes: 17704 KByte/s Tx, 11451 KByte/s Rx. Packet size 8k bytes: 17088 KByte/s Tx, 11338 KByte/s Rx. Packet size 16k bytes: 19456 KByte/s Tx, 11458 KByte/s Rx. Packet size 32k bytes: 19034 KByte/s Tx, 11500 KByte/s Rx. Done. Nothing interesting here. Also I looked at the netio description and source, to me it doesn't seem to do integrity checks on the data being received. The way I understand the error is that the key or packets get corrupted along the way. So it would be nice to know if what should have been sent out also gets received unharmed at the other end. After looking over my last log I found that the PuTTY sessions which crashed were exactly x hours old. Where x is mostly 1, 2 or 3 hours. See below: xxx pts/0 x.x.x.x Fri Apr 10 16:56 - 17:56 (01:00) xxx pts/0 x.x.x.x Fri Apr 10 20:13 - 21:13 (01:00) xxx pts/0 x.x.x.x Fri Apr 10 21:14 - 04:14 (07:00) xxx pts/2 x.x.x.x Fri Apr 10 20:14 - 03:14 (07:00) (These results are older and I'm not 100% sure these sessions have crashed but their entries are suspicious.) 
xxx pts/1 x.x.x.x Tue Apr 7 13:38 - 18:38 (05:00) xxx pts/0 x.x.x.x Tue Apr 7 19:57 - 20:57 (01:00) xxx pts/1 x.x.x.x Tue Apr 7 22:59 - 01:59 (03:00) xxx pts/0 x.x.x.x Tue Apr 7 22:59 - 23:59 (01:00) xxx pts/1 x.x.x.x Thu Apr 9 11:35 - 12:35 (01:00) xxx pts/0 x.x.x.x Thu Apr 9 11:34 - 15:34 (04:00) Then I remembered there is a 3600 value configuration option in sshd_config so I changed that to a more frequent value (10 seconds). The configuration option is the SSH-1 key regeneration interval (KeyRegenerationInterval). But I use SSH protocol v2 so this configuration parameter did not affect my putty connection crashes. Putty also has such an option, located in the session editor screen Connection -> SSH -> Kex option "Max minutes before rekey (0 for no limit)". When I changed this to 1 minute the session crashes became much more frequent. See below: xxx pts/1 x.x.x.x Sat Apr 11 19:42 - 19:44 (00:02) xxx pts/0 x.x.x.x Sat Apr 11 19:27 - 23:58 (04:30) xxx pts/1 x.x.x.x Sat Apr 11 16:25 - 16:27 (00:02) xxx pts/0 x.x.x.x Sat Apr 11 16:25 - 16:31 (00:06) xxx pts/0 x.x.x.x Sat Apr 11 13:46 - 15:46 (02:00) xxx pts/1 x.x.x.x Sat Apr 11 13:45 - 13:47 (00:02) xxx pts/1 x.x.x.x Sat Apr 11 13:43 - 13:44 (00:01) xxx pts/1 x.x.x.x Sat Apr 11 13:13 - 13:14 (00:01) xxx pts/0 x.x.x.x Sat Apr 11 13:12 - 13:45 (00:33) Now there are some sessions which lasted alot longer than the others. This is due to me having two different sessions open most of the time, one for running the 'echo $RANDOM' script and one for calling 'last | head' when the other session had crashed. The 'last | head' session does not generate much output and this seems to affect the probability of the connection crashing. Not sure as of yet how exactly the output and key exchange are related. I am currently trying to reproduce the connection crashing with the increased number of key exchanges with version 0.7.3, but I don't think it will crash. 
I would like to ask anyone (Henry ;) ) to try to reproduce the crash by setting the PuTTY rekey option to 1 minute.

Thanks.
Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-10 17:36

Message:
Hello Keith,

please open a separate tracker item for the sleep bug. It has nothing to do with PuTTY. I can confirm it under Debian 4.0 with coLinux 0.7.4-rc1 on the FLTK console. The message is:

sleep: xnanosleep.c:58: xnanosleep: Assertion `0 <= seconds' failed.

As for the network: it is normal that all packets from the Windows host to the coLinux guest also go out over the wire. Windows does not know that we are in capture mode on this network interface. My question was rather whether the ssh connection goes over the wire. You said no. Then please let's run some netio tests from Windows to coLinux.

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-10 16:20

Message:
After investigating the problem further I found this thread on the net:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html

It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL.

My little bash script became very unreliable when I started an openssl key generation process in another PuTTY session. Normally I get a never-ending string of random decimal numbers, but now the sleep program starts aborting with the message:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

Can someone please try to reproduce the error? Start this bash script in one PuTTY session:

while true; do echo -n "$RANDOM"; sleep 0.1; done

Start the following command in another one:

openssl genrsa -out /dev/null 4096

It should be obvious that the sleep command is failing, because there are a lot of error messages.
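For anyone who wants a number rather than a screen full of errors, the loop above can be bounded and made to count the failures. This is only a sketch under assumptions: count_sleep_failures is a made-up name, the random numbers go to stderr so that the count is the only thing on stdout, and a fractional interval such as 0.1 needs a sleep that accepts fractions (GNU sleep does).

```shell
# count_sleep_failures ITERATIONS INTERVAL
# Hypothetical helper: runs `sleep INTERVAL` ITERATIONS times while printing
# $RANDOM to stderr as terminal traffic, then prints on stdout how many sleep
# calls exited non-zero (an aborted sleep counts as a failure).
count_sleep_failures() (
    iterations=$1
    interval=$2
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # ${RANDOM:-0} keeps the loop working in shells without $RANDOM
        printf '%s' "${RANDOM:-0}" >&2
        if ! sleep "$interval"; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    printf '\n' >&2
    printf '%s\n' "$failures"
)
```

Run something like "count_sleep_failures 600 0.1" in one session while the openssl genrsa command from the comment above runs in the other. On an unaffected system the printed count should be 0.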
I used coLinux 0.8.0 (20090329) this time, with the pcap bridge for networking.

Also, henryn, I monitored my network traffic and the traffic does reach the wire. My network is very reliable, but of course we can't rule out that the packet corruption comes from interference on the wire. But this does not explain sleep aborting when doing something with OpenSSL. This seems like a whole other bug, but I think the networking problem and the memory corruption are related.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-09 21:12

Message:
Thanks for the quick response.

Above I wrote: I tested the virtual machine on the windows host system. This means I used the same desktop computer to run the virtual machine and to run the client software (PuTTY, WinSCP). Therefore I concluded the traffic does not go over the wire.

In the hours between my bug report and this comment, the same virtual machine running under coLinux 0.7.3 has kept working fine. I did not experience any corrupted downloads inside the VM, but I will test this later on.

I also noticed my sleep command sometimes aborting while running this bash script (to generate traffic):

while true; do echo -n "$RANDOM"; sleep 0.5; done

It prints some kind of failed assertion in xnanosleep.c on some line; I will report the actual error message later. Maybe this is related, because I don't get such aborts with version 0.7.3.

I will investigate the problem further tomorrow or the day after that.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-09 17:18

Message:
Hello Keith,

do you use PuTTY on the same desktop where coLinux is running? Or does it go over the wire?

I cannot assume that this is a problem in the network bridge between coLinux and the host. I use PuTTY every day for many hours and never have corrupted data or errors like these.
I log in automatically with public + private key pairs. PuTTY and coLinux run on the same desktop. I use PuTTY as the connection between host (Windows) and guest (coLinux). I also get no errors from network mounts or from downloads from the internet via this bridging interface.

The coLinux ndis-bridge has been heavily tested with "netio" and there were no errors. Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). Test first from the location where you detected the errors. Then test the connection between your host (Windows) and guest (coLinux).

Changes to the pcap-bridge between 0.7.3 and 0.7.4 are very rare. Here is the list of changes:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log

Revision 1057 was the release build for 0.7.3, so there are only 4 changes up to version 0.7.4. The biggest change was revision 1222, which is first included in build 20090227.

To find the regression, you can use the snapshots from
http://www.henrynestler.com/colinux/testing/devel-0.8.0/

Henry

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788
|
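On the point raised in this thread that netio measures throughput but does not check the received data: a checksum comparison is one simple way to see whether a payload arrives unharmed. The following is a sketch only. The function names are made up, it assumes sha256sum from coreutils is available on the guest, and how the file travels between guest and host (any transport that does not involve ssh) is deliberately left open.

```shell
# checksum FILE: print the SHA-256 digest of FILE (assumes coreutils sha256sum).
checksum() {
    sha256sum "$1" | awk '{print $1}'
}

# make_payload FILE KBYTES: write KBYTES KiB of random data to FILE.
make_payload() {
    head -c "$(( $2 * 1024 ))" /dev/urandom > "$1"
}

# payload_intact SENT RECEIVED: succeed only if both files have the same digest.
payload_intact() {
    [ "$(checksum "$1")" = "$(checksum "$2")" ]
}
```

A possible round trip: create a file with "make_payload sent.bin 32768", copy it to the Windows host and back over the bridged interface as received.bin, then run "payload_intact sent.bin received.bin && echo intact || echo corrupted". Repeating this many times would show whether corruption happens outside of ssh as well.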
From: coLinux a. <col...@he...> - 2009-04-11 04:07:06
|
The autobuild system has detected a new revision in the source repository. Review the latest changes in changelog.txt, also attached to this mail.

Download the compiled version:
http://www.henrynestler.com/colinux/autobuild/devel-20090410/
colinux-0.8.0-20090410.src.tgz (685195 Bytes)
daemons-0.8.0-20090410.dbg.zip (590983 Bytes)
daemons-0.8.0-20090410.zip (477755 Bytes)

Note that the autobuild compilation does not include an installer. Remember to reload the driver with these commands:
colinux-daemon.exe --remove-driver
colinux-daemon.exe --install-driver

The vmlinux and modules are up to date. Please use the latest version from
http://www.henrynestler.com/colinux/autobuild/devel-20090329/

The autobuild compilations are not official releases of Cooperative Linux software. There is no warranty that any autobuild version is stable. If you use this autobuild version, please give us feedback on your experience.

The job runs on a machine with a 64 bit version of gcc 4.1.2. A service from http://gcc.gnu.org/wiki/CompileFarm

--
Lots of fun with the newest version,
Henry Nestler

------------------------------------------------------------------------
r1242 | henryn | 2009-04-10 11:50:05 +0000 (Fri, 10 Apr 2009) | 1 line
Changed paths:
   M /branches/devel/src/colinux/os/winnt/user/install/iDl.ini

* Installer: Update links to release notes for ArchLinux, Debian and Gentoo.
------------------------------------------------------------------------
|
From: SourceForge.net <no...@so...> - 2009-04-10 17:36:08
|
Bugs item #2748015, was opened at 2009-04-09 17:54
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Daemons (Windows)
Group: v0.8.x (devel)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: PuTTY failing after a while with "Server key not valid"

Initial Comment:
Hi,

a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (around 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is built on the SSH code of PuTTY, so it seems to be a low-level error.

I tested a few setups to isolate the problem:
- coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and a lot of traffic.

The fact that I get the same error with the RC1 and the 0.8.0 development version is not surprising, as they should be based on the same code.

From searching the net I found these possible causes for the problem:
- The cached server key signature is not valid anymore (would prevent me from logging in)
- The network traffic gets mauled between server and client
- The client or server software can be at fault.
My setup specs:
- AMD AthlonXP 3800+, 2 GB RAM
- Windows XP Pro SP3 + all updates to date
- Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet
- Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)
- Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2)
- Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413)

I tested the virtual machine on the Windows host system. There cannot be any signal decay on the wire, because the bridged network shouldn't use it. This, and the fact that I cannot reproduce the problem with version 0.7.3, leads me to believe the newer coLinux versions are corrupting the data somewhere in between.

I would like to know what you make of this problem. Solutions/workarounds and reproducibility confirmations are also welcome.

Thanks,
Keith

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2009-04-10 19:36

Message:
Hello Keith,

please open a separate tracker item for the sleep bug. It has nothing to do with PuTTY. I can confirm it under Debian 4.0 with coLinux 0.7.4-rc1 on the FLTK console. The message is:

sleep: xnanosleep.c:58: xnanosleep: Assertion `0 <= seconds' failed.

As for the network: it is normal that all packets from the Windows host to the coLinux guest also go out over the wire. Windows does not know that we are in capture mode on this network interface. My question was rather whether the ssh connection goes over the wire. You said no. Then please let's run some netio tests from Windows to coLinux.

Henry

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-10 18:20

Message:
After investigating the problem further I found this thread on the net:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html

It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL.

My little bash script became very unreliable when I started an openssl key generation process in another PuTTY session. Normally I get a never-ending string of random decimal numbers, but now the sleep program starts aborting with the message:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

Can someone please try to reproduce the error? Start this bash script in one PuTTY session:

while true; do echo -n "$RANDOM"; sleep 0.1; done

Start the following command in another one:

openssl genrsa -out /dev/null 4096

It should be obvious that the sleep command is failing, because there are a lot of error messages.

I used coLinux 0.8.0 (20090329) this time, with the pcap bridge for networking.

Also, henryn, I monitored my network traffic and the traffic does reach the wire. My network is very reliable, but of course we can't rule out that the packet corruption comes from interference on the wire. But this does not explain sleep aborting when doing something with OpenSSL. This seems like a whole other bug, but I think the networking problem and the memory corruption are related.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-09 23:12

Message:
Thanks for the quick response.

Above I wrote: I tested the virtual machine on the windows host system. This means I used the same desktop computer to run the virtual machine and to run the client software (PuTTY, WinSCP).
Therefore I concluded the traffic does not go over the wire.

In the hours between my bug report and this comment, the same virtual machine running under coLinux 0.7.3 has kept working fine. I did not experience any corrupted downloads inside the VM, but I will test this later on.

I also noticed my sleep command sometimes aborting while running this bash script (to generate traffic):

while true; do echo -n "$RANDOM"; sleep 0.5; done

It prints some kind of failed assertion in xnanosleep.c on some line; I will report the actual error message later. Maybe this is related, because I don't get such aborts with version 0.7.3.

I will investigate the problem further tomorrow or the day after that.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-09 19:18

Message:
Hello Keith,

do you use PuTTY on the same desktop where coLinux is running? Or does it go over the wire?

I cannot assume that this is a problem in the network bridge between coLinux and the host. I use PuTTY every day for many hours and never have corrupted data or errors like these. I log in automatically with public + private key pairs. PuTTY and coLinux run on the same desktop. I use PuTTY as the connection between host (Windows) and guest (coLinux). I also get no errors from network mounts or from downloads from the internet via this bridging interface.

The coLinux ndis-bridge has been heavily tested with "netio" and there were no errors. Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). Test first from the location where you detected the errors. Then test the connection between your host (Windows) and guest (coLinux).

Changes to the pcap-bridge between 0.7.3 and 0.7.4 are very rare.
Here is the list of changes:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log

Revision 1057 was the release build for 0.7.3, so there are only 4 changes up to version 0.7.4. The biggest change was revision 1222, which is first included in build 20090227.

To find the regression, you can use the snapshots from
http://www.henrynestler.com/colinux/testing/devel-0.8.0/

Henry

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788
|
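The suggestion to find the regression with the snapshots amounts to a bisection over the builds. A minimal sketch of that search follows, with several assumptions: the function names are made up, the builds are passed oldest first, the oldest one is known good, the newest one is known bad, and the failure stays once it has been introduced. The check itself (installing a snapshot and watching whether PuTTY sessions drop) is manual and is represented here only by a command name that returns 0 for a good build.

```shell
# nth INDEX ITEM...: print the item at the 0-based INDEX.
nth() (
    n=$1
    shift
    shift "$n"
    printf '%s\n' "$1"
)

# first_bad_build IS_GOOD BUILD...
# IS_GOOD is the name of a command that returns 0 when the build passed to it
# works. Builds must be ordered oldest to newest, first good, last bad.
# Invariant during the search: build[lo] is good and build[hi] is bad.
first_bad_build() (
    is_good=$1
    shift
    lo=0
    hi=$(( $# - 1 ))
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$is_good" "$(nth "$mid" "$@")"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    nth "$hi" "$@"
)
```

With a hypothetical check function ask_user that prompts "did build X stay connected?", a call could look like "first_bad_build ask_user 20080608 ... 20090329", where the labels in between stand for whichever snapshots are actually available in the testing directory.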
From: SourceForge.net <no...@so...> - 2009-04-10 16:20:23
|
Bugs item #2748015, was opened at 2009-04-09 15:54
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Daemons (Windows)
Group: v0.8.x (devel)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: PuTTY failing after a while with "Server key not valid"

Initial Comment:
Hi,

a few days ago I upgraded my coLinux install to version 0.8.0 snapshot 20090329. While using the ndis-bridge feature I noticed my SSH connections crashing after a while (around 20 minutes, sometimes less, sometimes after hours). PuTTY and WinSCP report the error as "Server's host key did not match the signature supplied". WinSCP is built on the SSH code of PuTTY, so it seems to be a low-level error.

I tested a few setups to isolate the problem:
- coLinux 0.8.0 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.8.0 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with ndis-bridge: fails after a while.
- coLinux 0.7.4 rc1 (20090329) with pcap-bridge: fails after a while.
- coLinux 0.7.3 (20080608) with pcap-bridge: still going strong after 2 hours and a lot of traffic.

The fact that I get the same error with the RC1 and the 0.8.0 development version is not surprising, as they should be based on the same code.

From searching the net I found these possible causes for the problem:
- The cached server key signature is not valid anymore (would prevent me from logging in)
- The network traffic gets mauled between server and client
- The client or server software can be at fault.
My setup specs:
- AMD AthlonXP 3800+, 2 GB RAM
- Windows XP Pro SP3 + all updates to date
- Network card: NVIDIA nForce 10/100/1000 Mbps Ethernet
- Guest OS is ArchLinux (ver 2009.02) (using only prebuilt packages from the ArchLinux repositories)
- Server software: OpenSSL (0.9.8j pacman package #1) / OpenSSH (5.1p1 pacman package #2)
- Client software: PuTTY (0.60) / WinSCP (4.1.7 build 413)

I tested the virtual machine on the Windows host system. There cannot be any signal decay on the wire, because the bridged network shouldn't use it. This, and the fact that I cannot reproduce the problem with version 0.7.3, leads me to believe the newer coLinux versions are corrupting the data somewhere in between.

I would like to know what you make of this problem. Solutions/workarounds and reproducibility confirmations are also welcome.

Thanks,
Keith

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-10 16:20

Message:
After investigating the problem further I found this thread on the net:
http://fixunix.com/openssl/518688-re-uml-devel-dev-random-problems-fp-registers-corruption.html

It mentions something strange happening in the User Mode Linux kernel which causes random variables in random processes to get corrupted when doing something with OpenSSL.

My little bash script became very unreliable when I started an openssl key generation process in another PuTTY session. Normally I get a never-ending string of random decimal numbers, but now the sleep program starts aborting with the message:

sleep: xnanosleep.c:67: xnanosleep: Assertion `0 <= seconds' failed.
Aborted

Can someone please try to reproduce the error? Start this bash script in one PuTTY session:

while true; do echo -n "$RANDOM"; sleep 0.1; done

Start the following command in another one:

openssl genrsa -out /dev/null 4096

It should be obvious that the sleep command is failing, because there are a lot of error messages.
I used coLinux 0.8.0 (20090329) this time, with the pcap bridge for networking.

Also, henryn, I monitored my network traffic and the traffic does reach the wire. My network is very reliable, but of course we can't rule out that the packet corruption comes from interference on the wire. But this does not explain sleep aborting when doing something with OpenSSL. This seems like a whole other bug, but I think the networking problem and the memory corruption are related.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-09 21:12

Message:
Thanks for the quick response.

Above I wrote: I tested the virtual machine on the windows host system. This means I used the same desktop computer to run the virtual machine and to run the client software (PuTTY, WinSCP). Therefore I concluded the traffic does not go over the wire.

In the hours between my bug report and this comment, the same virtual machine running under coLinux 0.7.3 has kept working fine. I did not experience any corrupted downloads inside the VM, but I will test this later on.

I also noticed my sleep command sometimes aborting while running this bash script (to generate traffic):

while true; do echo -n "$RANDOM"; sleep 0.5; done

It prints some kind of failed assertion in xnanosleep.c on some line; I will report the actual error message later. Maybe this is related, because I don't get such aborts with version 0.7.3.

I will investigate the problem further tomorrow or the day after that.

Keith

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2009-04-09 17:18

Message:
Hello Keith,

do you use PuTTY on the same desktop where coLinux is running? Or does it go over the wire?

I cannot assume that this is a problem in the network bridge between coLinux and the host. I use PuTTY every day for many hours and never have corrupted data or errors like these.
I log in automatically with public + private key pairs. PuTTY and coLinux run on the same desktop. I use PuTTY as the connection between host (Windows) and guest (coLinux). I also get no errors from network mounts or from downloads from the internet via this bridging interface.

The coLinux ndis-bridge has been heavily tested with "netio" and there were no errors. Please test your network connection without ssh, for example with netio ( http://www.nwlab.net/art/netio/netio.html ). Use the -t option to test only TCP (ssh uses TCP only). Test first from the location where you detected the errors. Then test the connection between your host (Windows) and guest (coLinux).

Changes to the pcap-bridge between 0.7.3 and 0.7.4 are very rare. Here is the list of changes:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/user/conet-bridged-daemon/?view=log

Revision 1057 was the release build for 0.7.3, so there are only 4 changes up to version 0.7.4. The biggest change was revision 1222, which is first included in build 20090227.

To find the regression, you can use the snapshots from
http://www.henrynestler.com/colinux/testing/devel-0.8.0/

Henry

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=2748015&group_id=98788
|