From: Dan A. <da...@gm...> - 2004-02-01 22:14:23
|
Hello,

Currently, there are two issues that stand between us and a more stable and easier to develop version of coLinux.

BSOD
----

The BSODs that many people are getting are caused by a bug that manifests itself only on some machines. Unfortunately, I can't reproduce it here. Because of that, you should send me results of crash dumps, using tools freely available from Microsoft.

Thanks ePAc for enlightening us about kernel dumps. Okajima and I think it is important for the road ahead.

I'd like you to know something about me, which should encourage everybody on this developers list to participate in development and debugging. When I started coLinux in November, I had almost no knowledge of Windows kernel programming. A few years ago, I had no knowledge of Linux kernel programming at all. Don't be afraid to try something that you don't know, especially in the kernel field.

I ask for anyone who experiences a BSOD with coLinux, please extract useful information out from the tools Microsoft has provided us:

Debug tools at:
  http://www.microsoft.com/whdc/ddk/debugging/installx86.mspx
Debugging symbols at:
  http://www.microsoft.com/whdc/ddk/debugging/symbolpkg.mspx

To enable dumps (as ePAc noted):

  Right click on "My Computer"
  Select "Properties"
  Select "Advanced" tab
  Under "Startup And Recovery" there is an option to set up different types of memory dump (minidump, kernel dump, full memory dump).

Plus, I am planning to add automatic traces to coLinux's build so it would be even easier to make something out of these dumps.

DDK dependency
--------------

Making it possible to compile valid .sys files on Linux is something that can be developed quite separately from coLinux development. ReactOS would also benefit from it (binary compatibility with Windows), although they don't depend on it at the moment.

I know there must be some very talented developers on this list, so I seek volunteers for this mini-development.
coLinux's source has a CREDITS file, needless to remind ;) -- Dan Aloni da...@gm... |
From: Dan A. <da...@gm...> - 2004-02-01 23:07:37
|
On Mon, Feb 02, 2004 at 12:14:17AM +0200, Dan Aloni wrote:
> I ask for anyone who experiences a BSOD with coLinux, please
> extract useful information out from the tools Microsoft has
> provided us:

I tested these tools by deliberately causing an exception, and it appears they are very useful. Using linux.pdb, which is also generated in the compilation, I am able to produce a stack trace such as:

nt!KiDispatchException+0x30e
nt!CommonDispatchException+0x4d
nt!KiUnexpectedInterruptTail+0x1f4
linux!co_os_file_block_read+0x3c
linux!co_monitor_file_block_service+0xc5
linux!co_monitor_block_request+0x41
linux!co_monitor_device_request+0x70
linux!co_monitor_iteration+0x107
linux!co_monitor_run+0x99
linux!co_monitor_ioctl+0x11e
linux!co_manager_ioctl+0xd5
linux!co_manager_dispatch+0xa4
nt!IopfCallDriver+0x35
nt!IopSynchronousServiceTail+0x60
nt!IopXxxControlFile+0x5e4
nt!NtDeviceIoControlFile+0x28
nt!KiSystemService+0xc4
ntdll!ZwDeviceIoControlFile+0xb
KERNEL32!DeviceIoControl+0x100
WARNING: Stack unwind information not available. Following frames may be wrong.
colinux_daemon+0x2f27
colinux_daemon+0x1b7e
colinux_daemon+0x1bb0
colinux_daemon+0x158d
colinux_daemon+0x1767
colinux_daemon+0x401a
colinux_daemon+0x7400
cygwin1!forkpty+0x3688
cygwin1!dll_crt0+0x1ad
colinux_daemon+0x7453
colinux_daemon+0x103c
KERNEL32!BaseProcessStart+0x3d

I'd like to see stack traces such as these, even before I send you a matched linux.pdb to the previously distributed linux.sys.

--
Dan Aloni
da...@gm... |
From: Dan A. <da...@gm...> - 2004-02-02 22:48:13
|
On Mon, Feb 02, 2004 at 04:35:36PM -0600, Richard Goodwin wrote:
> I think it works! :) So far no crash!! What did you change?

Cool.

This might take care of those BSODs for now.

--- colinux-20040131/src/colinux/kernel/monitor.c 2004-01-31 18:57:06.000000000 +0200
+++ colinux-20040131-patch/src/colinux/kernel/monitor.c 2004-02-03 00:19:25.000000000 +0200
@@ -114,9 +114,9 @@
     co_rc_t rc;
 
     cmon->page_tables_size = cmon->physical_frames * sizeof(unsigned long *);
-    cmon->page_tables_pages = (cmon->page_tables_size + PAGE_SHIFT-1) >> PAGE_SHIFT;
+    cmon->page_tables_pages = (cmon->page_tables_size + PAGE_SIZE-1) >> PAGE_SHIFT;
     cmon->pa_maps_size = cmon->manager->host_memory_pages * sizeof(unsigned long);
-    cmon->pa_maps_pages = (cmon->pa_maps_size + PAGE_SHIFT-1) >> PAGE_SHIFT;
+    cmon->pa_maps_pages = (cmon->pa_maps_size + PAGE_SIZE-1) >> PAGE_SHIFT;
 
     rc = co_monitor_alloc_pages(cmon, cmon->page_tables_pages, (void **)&cmon->page_tables);
     if (!CO_OK(rc)) {

An obvious bug-o.

But I don't think it's the end of the fix. The problem is that the amount of physical RAM reported from userspace is not divisible by 16MB, which means that something went wrong in its calculation.

Look at this:

unsigned long co_os_get_physical_ram_size()
{
        MEMORYSTATUS memstat;

        GlobalMemoryStatus(&memstat);

        /* Round up in MBs: */

        return (memstat.dwTotalPhys + 0xFFFFF) & 0xFFF00000;
}

It's BAD.

Does anyone know a better way to get this information?

--
Dan Aloni
da...@gm... |
From: ePAc <ep...@ko...> - 2004-02-02 23:02:39
|
I have some output from the kernel debugger, but i don't have the symbols files (gotta hunt those suckers down on the MS website), and no linux.pdb to trace either.

this looks like it can work (i say that, but i have no idea what i'm talking about, other than the fact it seems that the system is trying to write to something it shouldn't be writing to :o)

and one stupid question, but why round it up ? and not down ?

Jok

On Tue, 3 Feb 2004, Dan Aloni wrote:

> Date: Tue, 3 Feb 2004 00:48:03 +0200
> From: Dan Aloni <da...@gm...>
> To: Richard Goodwin <ric...@ri...>
> Cc: Cooperative Linux Development <col...@li...>,
>     Shachar Shemesh <win...@sh...>,
>     Steven Edwards <ste...@ya...>
> Subject: Re: [coLinux-devel] Re: BSOD, DDK dependency
>
> [...]
>
> Look at this:
>
> unsigned long co_os_get_physical_ram_size()
> {
>         MEMORYSTATUS memstat;
>
>         GlobalMemoryStatus(&memstat);
>
>         /* Round up in MBs: */
>
>         return (memstat.dwTotalPhys + 0xFFFFF) & 0xFFF00000;
> }
>
> It's BAD.
>
> Does anyone know a better way to get this information?

---
Nothing is foolproof to a sufficiently talented fool...
oo ,(..)\ ~~ |
From: ePAc <ep...@ko...> - 2004-02-02 23:13:05
|
> I have some output from the kernel debugger, but i don't have the symbols
> files (gotta hunt those suckers down on the MS website), and no linux.pdb
> to trace either.

this is the output (minus the big useless boxes). it loads in the windbg, and gives me some "disassembly". Dan, if you want to have a look at it, i'd be more than happy to give you more info.. :o)

--------------------------------------------------------------------------
ATTEMPTED_WRITE_TO_READONLY_MEMORY (be)
An attempt was made to write to readonly memory. The guilty driver is on the
stack trace (and is typically the current instruction pointer).
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: eea5e000, Virtual address for the attempted write.
Arg2: 81ec1b38, PTE contents.
Arg3: eebe0aa8, (reserved)
Arg4: 0000000e, (reserved)

Debugging Details:
------------------

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xBE

LAST_CONTROL_TRANSFER: from 80511b47 to 804f4103

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
eebe0a44 80511b47 000000be eea5e000 81ec1b38 nt!KeBugCheckEx+0x19
eebe0a90 80530140 00000001 eea5e000 00000000 nt!MmTrimAllSystemPagableMemory+0x50f1
eebe0aa8 00000000 c03ba778 eebe0ad4 eeda621d nt!Kei386EoiHelper+0x2388

FOLLOWUP_IP:
nt!MmTrimAllSystemPagableMemory+50f1
80511b47 833dbc30548000 cmp dword ptr [nt!LpcPortObjectType+0x114 (805430bc)],0x0

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: nt!MmTrimAllSystemPagableMemory+50f1

IMAGE_NAME: Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP: 0

STACK_COMMAND: kb

BUCKET_ID: WRONG_SYMBOLS

MODULE_NAME: Unknown_Module

Followup: MachineOwner
---------

I hope this makes more sense to some of you than to me...

this is the bugcheck i got after a clean boot, running it once, and got a error -15 (as mentioned in other posts), and on the second try, a couple of minutes later, this BSOD.

---
Nothing is foolproof to a sufficiently talented fool...
oo ,(..)\ ~~ |
From: Dan A. <da...@gm...> - 2004-02-02 23:30:21
|
On Mon, Feb 02, 2004 at 03:13:17PM -0800, ePAc wrote:
> I hope this makes more sense to some of you than to me...

I'll try to make the most of it.

> this is the bugcheck i got after a clean boot, running it once, and got a
> error -15 (as mentioned in other posts), and on the second try, a couple
> of minutes later, this BSOD.

Just to make it clear about what happens in the case described above:

When the daemon terminates normally, it uninstalls the linux.sys service. If you get a BSOD, linux.sys is left installed as a service, and when the daemon is run again after reboot, it returns -15 since the service is already installed. It then removes the service and exits. On the second try, it runs normally.

Yeah, it could be made simpler. Please send patches :)

--
Dan Aloni
da...@gm... |
From: Steven E. <ste...@ya...> - 2004-02-02 23:28:58
|
NtQuerySystemInformation may give you what you need. Also you could look at using PSAPI or the toolhelp APIs.

Thanks
Steven

--- Dan Aloni <da...@gm...> wrote:
> It's BAD.
>
> Does anyone know a better way to get this information? |
From: Dan A. <da...@gm...> - 2004-02-03 06:28:28
|
On Mon, Feb 02, 2004 at 03:28:57PM -0800, Steven Edwards wrote:
> NtQuerySystemInformation may give you what you need. Also you could
> look at using PSAPI or the toolhelp APIs.

I looked at its documentation and it doesn't appear to give me what I need. Other solutions?

--
Dan Aloni
da...@gm... |
From: Ballard J. <sac...@ho...> - 2004-02-03 08:34:24
|
Use GlobalMemoryStatusEx() for more consistency across NT platforms.

unsigned long co_os_get_physical_ram_size( void )
{
    MEMORYSTATUSEX m ;

    m.dwLength = sizeof( m ) ;
    GlobalMemoryStatusEx( &m ) ;
    if( m.ullTotalPhys > (DWORDLONG) 0xFFF00000 )
        return 0xFFF00000 ; // avoids 64bit to 32bit overflow conversion
    return 0xFFF00000 & (unsigned long) m.ullTotalPhys ;
}

----- Original Message -----
From: "Dan Aloni" da...@gm...
>[...]
> Look at this:
>
> unsigned long co_os_get_physical_ram_size()
> {
>         MEMORYSTATUS memstat;
>
>         GlobalMemoryStatus(&memstat);
>
>         /* Round up in MBs: */
>
>         return (memstat.dwTotalPhys + 0xFFFFF) & 0xFFF00000;
> }
>
> It's BAD.
>
> Does anyone know a better way to get this information? |
From: Dan A. <da...@gm...> - 2004-02-02 23:03:06
|
On Mon, Feb 02, 2004 at 04:48:32PM -0600, Richard Goodwin wrote:
> Dan, here is a register dump for the original "50" bugcheck.
> kd> r
> eax=ebab5000 ebx=eb3b6000 ecx=00000000 edx=0003fc00 esi=00000000 edi=00000000
                                             ^^^^^^^^

edx=0003fc00. Is there 1GB of RAM in that machine?

It crashed because the pa_to_host_va was allocated too short because of the PAGE_SHIFT bug, that was triggered because 1MB is not enough to round GlobalMemoryStatus's result.

Aiee, Aiee, Aiee!

--
Dan Aloni
da...@gm... |
From: Steven E. <ste...@ya...> - 2004-02-02 00:14:52
|
--- Dan Aloni <da...@gm...> wrote:
> DDK dependency
> --------------
>
> Making it possible to compile valid .sys files on Linux is
> something that can be developed quite separately from coLinux
> development. ReactOS would also benefit from it (binary
> compatibility with Windows), although they don't depend on it
> at the moment.

It could be a bug in the Cygwin cross-compiler. We have no problem using the .sys files created by Mingw under Windows. If you can post some object files I will try and link it on mingw when I get back on the 9th or 10th. In the mean-time try passing the -mno-cygwin flag when linking.

Thanks
Steven

PS Just rip the ReactOS or w32api DDK in to CoLinux. |
From: Richard G. <ric...@ri...> - 2004-02-02 03:08:22
|
I've gotten a STOP 8E and a 50, both on 2k3, both seeming to occur when coLinux is shutting down. I haven't had the chance to analyze at all yet. I was fooling quite a bit with the networking, trying to get things working without using ICS, unsuccessfully I might add.

Richard

----- Original Message -----
From: "Steven Edwards" <ste...@ya...>
To: "Dan Aloni" <da...@gm...>; "Cooperative Linux Development" <col...@li...>
Sent: Sunday, February 01, 2004 6:14 PM
Subject: Re: [coLinux-devel] BSOD, DDK dependency

> [...] |
From: Ian C. B. <ia...@bl...> - 2004-02-02 05:39:58
|
I've managed to get the Debian3.0 root image and the binary release of coLinux running on a WinXP/Pro laptop with only minor issues.

The block device size issue (30448/262144) causes a warning on mount. You can of course override it with:

  $ mount -o remount,rw /
  $ exec /sbin/init 3

Still, it will be nice to hunt this one down. I was able to generate a ~1G file on the filesystem with no errors with dd - so the block limitation reported on boot doesn't appear to be breaking writes past that point.

The win32 Tap interface is working just fine, and works well with both ICS and bridging. Here's a quick guide for some of the unfamiliar users:

For ICS:

1. Give your Tap Local Area Connection interface an IP of 192.168.0.1

2. Go to Network Settings, open your "external" interface (whatever Local Area or Wireless Connection interface you use as your default route), click on the Advanced tab, select Internet Connection Sharing, "Allow other network users to connect through this computer's Internet connection", and select the Local Area Connection interface assigned to your Tap driver.

3. The default Debian3.0 root image has an IP of 192.168.0.40 preconfigured in its /etc/network/interfaces file: simply do the following:

   $ ifup eth0

   You should now be able to ping out.

For bridging:

1. Turn off ICS (follow step 2 above, disabling ICS)

2. Select two interfaces (holding down the control key) and right click for a menu to select "Bridge Connections".

3. Configure /etc/network/interfaces on your colinux image, or ifconfig up your eth0 interface with an IP on your segment and add a default route through your gateway.

(Note: if you do happen to use a wireless interface in your testing, don't attempt bridging - it doesn't work).

Both methods above work just fine for me.

You might also try a direct routed approach, if you're more familiar with IP networking and able to add routes on a machine upstream from your windows box (or configure RIP/OSPF/etc to do the same). It should work with little effort.

On the plus side: no BSODs yet. ;)

- Ian C. Blenke <ia...@bl...>

PS. I hate Windows, Windows' dialog boxes, and the pain that must be taken to describe such trivial procedures. |
From: Ian C. B. <ia...@bl...> - 2004-02-02 21:15:32
|
On Mon, Feb 02, 2004 at 03:27:20PM -0500, Remy Porter wrote:
> ICS isn't working, nor is bridging. Bridging however, completely knocks
> my computer off of the Internet (though not the local network, I fail to
> understand this). With ICS, I give TAP the .0.1 IP, then when I activate
> ICS, I'm told that my external connection will be set to .0.1 (is this
> intentional, or is it some slight change in instructions between 2k and
> XP?) I run ifup eth0, then try and ping and nothing. Any other thoughts?

Win2k doesn't have ICS, that I'm aware of. It's an XP thing.

You enabled ICS on your external interface, right? Not the Tap interface.

You should not be setting your "external" connection to 192.168.0.1. ICS only NATs IPs in the 192.168.0.x netblock, with the ICS gateway itself at 192.168.0.1 on the "internal" connection.

  External Interface (Local Area Connection 1 - Ethernet)
  (24.164.164.210)
  WindowsXP w/ICS NAT
  (192.168.0.1)
  Tap Interface (Local Area Connection 2 - Tap interface)
  (192.168.0.40)
  Linux Image

When you enable bridging, you need to assign the bridge interface the "external" IP of your PC (or use DHCP, as per default).

  Bridged Interface (24.164.154.210)
  Local Area Connection 1 (Ethernet)
  Local Area Connection 2 (Tap interface)
  (24.164.154.211)
  Linux Image

Both your host's Bridge Interface *and* the Linux Image have IP addresses on the same external public segment, with the same default gateway. All packets that your Windows host sees will also be seen by the Linux image, and vice-versa.

A Tap interface has two sides: the host side and the guest side.

- With ICS, the host side needs to be assigned an IP address. The guest must be given an IP address on that same virtual ethernet segment (think of it as a crossover cable). The ICS host acts as the default gateway for the guest to route through.

- With a bridge, you add the host side to the bridge - no IP is needed on the host side for layer-2 bridging. The guest side may take any IP on that external network, just as the host itself would (think of it as a hub that both the host and the guest are plugged into on the outside). The host, however, DOES need to assign an IP address to the bridge interface itself (think of it as a virtual interface on that same segment that the host uses to talk to that network).

I hope that didn't confuse you more than it helped.

You know, a colinux-users list might be in order here at some point. I hate spamming *-devel lists with user support traffic.

- Ian C. Blenke <ia...@bl...> |
From: Tim L. <ti...@ke...> - 2004-02-02 21:25:01
|
On Mon, Feb 02, 2004 at 04:15:17PM -0500, Ian C. Blenke wrote:
> You know, a colinux-users list might be in order here at some point. I
> hate spamming *-devel lists with user support traffic.

Please wait till the project is bigger to split out a users list. I have watched some projects die because they split the community before they reached critical mass. I would not want to risk that on a project this important.

--Tim Larson |
From: Dan A. <da...@gm...> - 2004-02-02 21:35:02
|
On Mon, Feb 02, 2004 at 09:26:01PM +0000, Tim Larson wrote:
> On Mon, Feb 02, 2004 at 04:15:17PM -0500, Ian C. Blenke wrote:
> > You know, a colinux-users list might be in order here at some point. I
> > hate spamming *-devel lists with user support traffic.
>
> Please wait till the project is bigger to split out a users list.
> I have watched some projects die because they split the community before
> they reached critical mass. I would not want to risk that on a project
> this important.

I agree with Tim. The project's maturity is the reason why I didn't open a user mailing list yet.

--
Dan Aloni
da...@gm... |