From: SourceForge.net <no...@so...> - 2008-04-16 22:13:37
Bugs item #1555961, was opened at 2006-09-10 23:07
Message generated for change (Comment added) made by henryn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=1555961&group_id=98788

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
>Group: v0.6.x (release)
>Status: Pending
>Resolution: Out of Date
Priority: 5
Private: No
Submitted By: Andrew Tonner (rakslice)
>Assigned to: Henry N. (henryn)
Summary: Colinux thrashes on boot

Initial Comment:
I'm having a problem with colinux thrashing the disk on launch.

I'm running colinux 0.6.4-linux-2.6.11, and I've got it set up with debian install on a reiserfs image on cobd0 (made by "cp -ax"ing the colinux stock debian image after installing the reiserfs utils to it).

When I start my colinux setup it usually gets as far as:

[... snip ...]
NET: Registered protocol family 1
NET: Registered protocol family 17
ReiserFS: cobd0: found reiserfs format "3.6" with standard journal
ReiserFS: cobd0: using ordered data mode
ReiserFS: cobd0: journal params: device cobd0, size 8192, journal first block 18, max trans len 1024, max batch 900,
ReiserFS: cobd0: checking transaction log (cobd0)

and sits there hitting the disk for several minutes before continuing.

If I force kill the colinux-daemon process while it's doing this (taskkill /im colinux-daemon.exe /f), it doesn't die for several minutes (i.e. the amount of time usually spent thrashing) presumably because it's blocked on a huge IO operation.

But it doesn't always do this... Sometimes it boots without unusual disk activity, especially on subsequent colinux launches before I restart windows again. (That could just be the effects of disk caching in windows, but I'm not sure.)
This behaviour happens on both the systems I've tried colinux on: my dual core athlon 64 X2 nforce 4 box at work, and my athlon XP 2500 nforce 2 box at home. On my work box, the cobd0 image is 20GB (21474836480 bytes); the one on my home box is substantially smaller (~8GB IIRC -- I don't have it handy right now).

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2008-04-17 00:13

Message:
Logged In: YES
user_id=579204
Originator: NO

A bug in the page fault handler for sys_mount (mounting the root filesystem) may be the problem here. Similar bugs are fixed in 0.7.3 RC3 and in the development snapshot 0.8.0-20080415; see http://www.colinux.org/snapshots/

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2007-03-10 00:03

Message:
Logged In: YES
user_id=579204
Originator: NO

I'm sorry. The second line should be:
colinux-debug-daemon.exe -d -p -s prints=31,misc=31,blockdev=31 -f debug2.xml

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2007-03-09 23:49

Message:
Logged In: YES
user_id=579204
Originator: NO

Hello,

could the size be the problem? 21474836480 bytes = 20GB

Please, before you start coLinux, run the debugger:
colinux-debug-daemon.exe -d -p -s prints=31,misc=31 -f debug.xml
or
colinux-debug-daemon.exe -d -p -s prints=31,misc=31,messages=31 -f debug2.xml

You can stop the debugger with CTRL-C once the "several minutes" problem has begun. Then look into the debug output. I'm interested in the drive geometry detection. Please also look for mysterious messages about your drive there.

The debug2.xml can be very big. You can remove all the duplicated block operations from the point where the problem begins to the end. But do look for problems or other abnormal things in the output. The format is XML; the text between the "<strings>" is human readable. Open it with IE.
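The advice above about trimming duplicated block operations out of a large debug2.xml could be approximated with a small filter. This is only a minimal sketch: it assumes the duplicated entries show up as identical consecutive lines, which may not hold if each entry carries a timestamp or counter. The function name collapse_repeats is hypothetical and not part of coLinux, and the result is meant for reading, not as guaranteed well-formed XML.

```python
import sys


def collapse_repeats(lines):
    """Yield each run of identical consecutive lines once.

    When a line occurred more than once in a row, an XML comment with
    the total count is emitted after it so the information is not lost.
    """
    previous, count = None, 0
    for line in lines:
        if line == previous:
            count += 1
            continue
        if previous is not None:
            yield previous
            if count > 1:
                yield f"<!-- previous line repeated {count} times -->\n"
        previous, count = line, 1
    # Flush the final run.
    if previous is not None:
        yield previous
        if count > 1:
            yield f"<!-- previous line repeated {count} times -->\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # The path is supplied by the user, e.g. the debug2.xml from the thread.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        sys.stdout.writelines(collapse_repeats(handle))
```

Saved under a name of your choice (say collapse.py, a hypothetical name), it would be run as "python collapse.py debug2.xml > debug2-short.xml".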
----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2007-01-12 17:56

Message:
Logged In: YES
user_id=39760
Originator: YES

I switched to an 0.8.0 snapshot (20061212), still using my 21474836480 byte ext3 volume, and I get exactly the same behaviour; it stalls at mount time and hits the disk for several minutes before continuing.

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2007-01-06 00:57

Message:
Logged In: YES
user_id=39760
Originator: YES

This doesn't seem to be a filesystem-specific problem.

I mkfsed an identically-sized (21474836480 bytes) ext3 volume, cp -ax'd the contents of my reiserfs volume across to it, modified the fstab, and then put my colinux config back so that only the new ext3 volume is being used.

After a windows restart, when I start colinux, it sits and thrashes for several minutes at roughly the same place.

dmesg:

Linux version 2.6.11-co-0.6.4 (george@CoDebianDevel) (gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)) #1 Mon Jun 19 05:36:13 UTC 2006
520MB LOWMEM available.
On node 0 totalpages: 133120
DMA zone: 0 pages, LIFO batch:1
Normal zone: 133120 pages, LIFO batch:16
HighMem zone: 0 pages, LIFO batch:1
Built 1 zonelists
Kernel command line: root=/dev/cobd0
Initializing CPU#0
Setting proxy interrupt vectors
PID hash table entries: 4096 (order: 12, 65536 bytes)
Using cooperative for high-res timesource
Console: colour CoCON 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 523648k/532480k available (1537k kernel code, 0k reserved, 521k data, 108k init, 0k highmem)
Calibrating delay loop...
734.00 BogoMIPS (lpj=3670016)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: After vendor identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000010 00000001 00000000 00000003
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
NET: Registered protocol family 16
devfs: 2004-01-31 Richard Gooch (rg...@at...)
devfs: boot_options: 0x0
cofuse init 0.1 (API version 2.2)
Initializing Cryptographic API
serio: cokbd at irq 1
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
cobd: loaded (max 32 devices)
loop: loaded (max 8 devices)
conet: loaded (max 16 devices)
conet0: initialized
conet1: initialized
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on cokbd
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
[[Here is where it sits and thrashes for several minutes, then]]
EXT3 FS on cobd0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem).
Freeing unused kernel memory: 108k freed
kjournald starting. Commit interval 5 seconds
Adding 524280k swap on /dev/cobd1.
Priority:-1 extents:1
EXT3 FS on cobd0, internal journal
[... snip ...]

----------------------------------------------------------------------

Comment By: Ben Voigt (bvoigt)
Date: 2007-01-04 04:51

Message:
Logged In: YES
user_id=782364
Originator: NO

reiserfs, being a journalled filesystem, usually checks itself very quickly. However, by default every 20th boot it forces a full check. The frequency of checks can be changed in the reiser metadata... but looking at reiserfstune I can't find the command for it right now.

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2007-01-04 02:06

Message:
Logged In: YES
user_id=39760
Originator: YES

I've gone through this sequence of checks, and fsck never encounters any file system errors, and except for the occasional thrashing for several minutes when I mount a reiserfs volume, nothing unusual happens.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2006-09-13 19:23

Message:
Logged In: YES
user_id=579204

It could be a limit in one of coLinux's block operations.

Could you please boot from another image, for example the small Debian, ArchLinux or Fedora image? Then check your image with the reiser tools, without mounting it. I don't know the tool; it is something like "fsck.ext3 -f /dev/cobd1" for an ext3 system. Then mount this device, unmount it, and check again. Then mount it, write something to it, unmount it, and check again.

A totally different idea: I'm afraid that your shutdown doesn't complete the reiser umount. Please try to go into runlevel S (single user mode without network). Check that no other tasks are running and that no task needs write access to your root filesystem. Then run this command sequence:
"sync; sleep 1; sync; sleep 3; mount -o remount,ro /"
The remount should not give an error. Now check your root file system device with the reiser tools. If it was clean, shut down your system and run it again. Does this help?
----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2006-09-12 19:05

Message:
Logged In: YES
user_id=39760

I had sort of assumed that even kernel space IO happening on the linux side wouldn't cause the colinux-daemon process to block for IO like this. But I don't know the internals so I guess I should stop making assumptions like that. =)

Still, other things suggest to me that it's not a reiserfs journal replay:

- According to the messages, by the time the read is happening the system hasn't yet got to the part where the journal replay should happen, AFAIK.

- I fired up Sysinternals' FileMon, and the disk activity is colinux-daemon doing a series of consecutive (in terms of offsets) 64k IRP_MJ_READs. FileMon doesn't show the target of the reads (it just gives C:) but it must be the volume file, judging by the eventually huge offsets (I don't have any other files that big) and the fact that the last read before colinux continues is right at 20GB (the last read offset & size lines up with the volume file end position)... unless it's reading something other than a file.

- Also, this behaviour happens even when the last run of colinux was one that worked fine and was shut down normally with halt or shutdown.

----------------------------------------------------------------------

Comment By: George P Boutwell (gboutwel)
Date: 2006-09-11 02:40

Message:
Logged In: YES
user_id=30412

Sounds like there is some big disk operation going on in coLinux. I don't know what that operation is (perhaps coLinux didn't get shut down correctly and reiserfs is trying to replay a long journal?), but you should probably leave it to complete instead of trying to kill it.

Make sure that you are shutting down coLinux by logging in and running a proper linux shutdown command (halt, poweroff, shutdown -h now, etc.) and not just 'killing' coLinux processes.
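The FileMon observation in the 2006-09-12 comment (consecutive 64k reads ending at the 20GB mark) can be put into numbers. The sketch below is only back-of-the-envelope arithmetic: the image size and read size come from the thread, while the 60 MB/s sustained throughput is an assumption chosen for illustration, not a measured value, and the function names are hypothetical.

```python
# Figures quoted in the thread.
IMAGE_BYTES = 21474836480   # size of the cobd0 image file
READ_BYTES = 64 * 1024      # size of each IRP_MJ_READ seen in FileMon


def sequential_read_plan(image_bytes, read_bytes):
    """Return (number of reads, offset of the last read) for one
    front-to-back pass over a file of image_bytes bytes."""
    reads = -(-image_bytes // read_bytes)   # ceiling division
    last_offset = (reads - 1) * read_bytes
    return reads, last_offset


def scan_seconds(image_bytes, bytes_per_second):
    """Time needed to read the whole file at a steady throughput."""
    return image_bytes / bytes_per_second


if __name__ == "__main__":
    reads, last_offset = sequential_read_plan(IMAGE_BYTES, READ_BYTES)
    print("reads needed:", reads)
    print("last read offset:", last_offset)
    # ASSUMPTION: 60 MB/s sustained disk throughput, for illustration only.
    minutes = scan_seconds(IMAGE_BYTES, 60_000_000) / 60
    print("minutes at 60 MB/s:", round(minutes, 1))
```

Under that assumed throughput, one full pass over the image takes on the order of minutes, which is at least consistent with the reported stall being a single read of the entire volume file. It does not by itself show why such a pass would happen.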
----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2006-09-10 23:09

Message:
Logged In: YES
user_id=39760

I should mention that I've removed the initrd section from my configuration file in case this bug is somehow related to the known problem with that, but this problem didn't go away.

----------------------------------------------------------------------