[coLinux-devel] [ colinux-Bugs-1555961 ] Colinux thrashes on boot


Bugs item #1555961, was opened at 2006-09-10 23:07
Message generated for change (Comment added) made by henryn
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=622063&aid=1555961&group_id=98788

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
>Group: v0.6.x (release)
>Status: Pending
>Resolution: Out of Date
Priority: 5
Private: No
Submitted By: Andrew Tonner (rakslice)
>Assigned to: Henry N. (henryn)
Summary: Colinux thrashes on boot

Initial Comment:
I'm having a problem with colinux thrashing the disk on
launch.  I'm running colinux 0.6.4-linux-2.6.11, and
I've got it set up with a Debian install on a reiserfs
image on cobd0 (made by "cp -ax"ing the colinux stock
debian image after installing the reiserfs utils to
it). When I start my colinux setup it usually gets as
far as:

[... snip ...]

NET: Registered protocol family 1
NET: Registered protocol family 17
ReiserFS: cobd0: found reiserfs format "3.6" with standard journal
ReiserFS: cobd0: using ordered data mode
ReiserFS: cobd0: journal params: device cobd0, size 8192, journal first block 18, max trans len 1024, max batch 900, ReiserFS: cobd0: checking transaction log (cobd0)

and sits there hitting the disk for several minutes
before continuing.  If I force kill the colinux-daemon
process while it's doing this (taskkill /im
colinux-daemon.exe /f), it doesn't die for several
minutes (i.e. the amount of time usually spent
thrashing) presumably because it's blocked on a huge IO
operation.  

But it doesn't always do this... Sometimes it boots
without unusual disk activity, especially on subsequent
colinux launches before I restart windows again. (That
could just be the effects of disk caching in windows,
but I'm not sure.)

This behaviour happens on both the systems I've tried
colinux on: my dual core athlon 64 X2 nforce 4 box at
work, and my athlon XP 2500 nforce 2 box at home.  On
my work box, the cobd0 image is 20GB (21474836480
bytes); the one on my home box is substantially
smaller, (~8GB IIRC -- I don't have it handy right now.)

----------------------------------------------------------------------

>Comment By: Henry N. (henryn)
Date: 2008-04-17 00:13

Message:
Logged In: YES 
user_id=579204
Originator: NO

A bug in the page fault handler for sys_mount (which mounts the root
filesystem) may be the problem here. Similar bugs are fixed in 0.7.3 RC3
and in the devel snapshot 0.8.0-20080415; see http://www.colinux.org/snapshots/

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2007-03-10 00:03

Message:
Logged In: YES 
user_id=579204
Originator: NO

I'm sorry.  The second line should be:

colinux-debug-daemon.exe -d -p -s prints=31,misc=31,blockdev=31 -f debug2.xml

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2007-03-09 23:49

Message:
Logged In: YES 
user_id=579204
Originator: NO

Hello,

Could the size be the problem? 21474836480 bytes = 20GB

Before you start coLinux, please run the debugger:
 colinux-debug-daemon.exe -d -p -s prints=31,misc=31 -f debug.xml
or
 colinux-debug-daemon.exe -d -p -s prints=31,misc=31,messages=31 -f debug2.xml

You can stop the debugger with CTRL-C once the "several minutes" problem
has begun.
Then look into the debug output.  I'm interested in the drive geometry
detection.  Please also look for mysterious messages about your drive
there.

The debug2.xml can be very big.  You can remove all the duplicated block
operations from the point where the problem begins to the end.  But do
look for problems or any other abnormal things in the output.

The format is XML; the text between the "<strings>" is human readable. Open
it with IE.
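
One way to follow the advice above about removing duplicated block operations is to collapse consecutive repeats automatically. The sketch below is only an illustration. The exact layout of the debug XML is not shown in this thread, so it assumes nothing about element names and simply treats every non-empty piece of element text as a message. The function names are hypothetical. A log that was cut off with CTRL-C may not be well-formed XML, so parse errors are tolerated and whatever was read up to that point is kept.

```python
import itertools
import sys
import xml.etree.ElementTree as ET


def iter_messages(source):
    """Yield the stripped text of every completed element in the XML source.

    `source` is a file name or a binary file object. Element names are
    ignored because the real debug format is not known here (assumption).
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            text = (elem.text or "").strip()
            if text:
                yield text
            elem.clear()  # drop the element's contents once it has been handled
    except ET.ParseError:
        # Truncated or malformed log (for example, stopped with CTRL-C):
        # stop quietly and keep what was already yielded.
        return


def collapse_repeats(messages):
    """Yield (message, count) for each run of identical consecutive messages."""
    for message, run in itertools.groupby(messages):
        yield message, sum(1 for _ in run)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python collapse_debug.py debug2.xml
    for message, count in collapse_repeats(iter_messages(sys.argv[1])):
        suffix = f"  [x{count}]" if count > 1 else ""
        print(message + suffix)
```

Long runs of the same block operation then show up as a single line with a repeat count, which makes unusual messages around the start of the stall easier to spot.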

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2007-01-12 17:56

Message:
Logged In: YES 
user_id=39760
Originator: YES

I switched to an 0.8.0 snapshot (20061212), still using my 21474836480
byte ext3 volume, and I get exactly the same behaviour; it stalls at mount
time and hits the disk for several minutes before continuing.

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2007-01-06 00:57

Message:
Logged In: YES 
user_id=39760
Originator: YES

This doesn't seem to be a filesystem-specific problem. I mkfsed an
identically-sized (21474836480 bytes) ext3 volume, cp -ax'd the contents of
my reiserfs volume across to it, modified the fstab, and then put my
colinux config back so that only the new ext3 volume is being used.  After
a windows restart, when I start colinux, it sits and thrashes for several
minutes at roughly the same place.

dmesg:

Linux version 2.6.11-co-0.6.4 (george@CoDebianDevel) (gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)) #1 Mon Jun 19 05:36:13 UTC 2006
520MB LOWMEM available.
On node 0 totalpages: 133120
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 133120 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
Built 1 zonelists
Kernel command line: root=/dev/cobd0
Initializing CPU#0
Setting proxy interrupt vectors
PID hash table entries: 4096 (order: 12, 65536 bytes)
Using cooperative for high-res timesource
Console: colour CoCON 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 523648k/532480k available (1537k kernel code, 0k reserved, 521k data, 108k init, 0k highmem)
Calibrating delay loop... 734.00 BogoMIPS (lpj=3670016)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: After vendor identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000010 00000001 00000000 00000003
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
NET: Registered protocol family 16
devfs: 2004-01-31 Richard Gooch (rg...@at...)
devfs: boot_options: 0x0
cofuse init 0.1 (API version 2.2)
Initializing Cryptographic API
serio: cokbd at irq 1
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
cobd: loaded (max 32 devices)
loop: loaded (max 8 devices)
conet: loaded (max 16 devices)
conet0: initialized
conet1: initialized
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on cokbd
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
[[Here is where it sits and thrashes for several minutes, then]]
EXT3 FS on cobd0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem).
Freeing unused kernel memory: 108k freed
kjournald starting.  Commit interval 5 seconds
Adding 524280k swap on /dev/cobd1.  Priority:-1 extents:1
EXT3 FS on cobd0, internal journal
[... snip ...]

----------------------------------------------------------------------

Comment By: Ben Voigt (bvoigt)
Date: 2007-01-04 04:51

Message:
Logged In: YES 
user_id=782364
Originator: NO

reiserfs, being a journalled filesystem, usually checks itself very
quickly.  However, by default every 20th boot it forces a full check.  The
frequency of checks can be changed in the reiser metadata... but looking at
reiserfstune I can't find the command for it right now.

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2007-01-04 02:06

Message:
Logged In: YES 
user_id=39760
Originator: YES

I've gone through this sequence of checks, and fsck never encounters any
file system errors, and except for the occasional thrashing for several
minutes when I mount a reiserfs volume nothing unusual happens.

----------------------------------------------------------------------

Comment By: Henry N. (henryn)
Date: 2006-09-13 19:23

Message:
Logged In: YES 
user_id=579204

It could be a limit in one of coLinux's block operations.

Can you please boot from another image, for example the 
small Debian, ArchLinux or Fedora image?

Then check your image, without mounting it, with the reiser 
tools.  I don't know the tool; it should be something like 
"fsck.ext3 -f /dev/cobd1" for an ext3 system.

Then mount this device, unmount it, and check again.

Then mount it, write some data to it, unmount it, and check 
again.

A totally different idea:
I'm afraid that your shutdown doesn't complete the reiser 
umount.  Please try going into runlevel S (single user mode 
without network).  Check that no other tasks are running and 
that no task needs write access to your root filesystem.  
Then run this command sequence:
  "sync; sleep 1; sync; sleep 3; mount -o remount,ro /"
The remount should not give an error.
Now check your root file system device with the reiser tools.
If it is clean, shut down your system and run it again.  
Does this help?

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2006-09-12 19:05

Message:
Logged In: YES 
user_id=39760

I had sort of assumed that even kernel space IO happening on
the linux side wouldn't cause the colinux-daemon process to
block for IO like this. But I don't know the internals so I
guess I should stop making assumptions like that. =) 

Still, other things suggest to me that it's not a reiserfs
journal replay:

- According to the messages, by the time the read is happening
the system hasn't gotten to the part where the journal replay
should happen yet, AFAIK

- I fired up Sysinternals' FileMon, and the disk activity is
colinux-daemon doing a series of consecutive (in terms of
offsets) 64k IRP_MJ_READs.  FileMon doesn't show the target
of the reads (it just gives C:) but it must be the volume
file, judging by the eventually huge offsets (I don't have
any other files that big) and the fact that the last read
before colinux continues is right at 20GB (the last read
offset & size lines up with the volume file end position)...
unless it's reading something other than a file.

- Also this behaviour happens even when the last run of
colinux was one that worked fine and was shutdown normally
with halt or shutdown. 
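
The read pattern described in the FileMon observation can be sanity-checked with a little arithmetic. The sketch below is illustrative only. It assumes that "64k" means 65536 bytes and that the reads run back to back from offset 0, and the helper names are hypothetical. It computes how many such reads cover the 21474836480-byte image and where the last one starts. It also includes a small helper that replays the same sequential pattern against any file, which could be used to time a full pass over an image outside of coLinux.

```python
CHUNK = 64 * 1024          # 64 KiB per read (assumed meaning of "64k")
IMAGE_SIZE = 21474836480   # size of the cobd0 image given in the report


def read_plan(size, chunk=CHUNK):
    """Return (number_of_reads, offset_of_last_read) for a sequential scan."""
    reads = -(-size // chunk)  # ceiling division
    last_offset = (reads - 1) * chunk if reads else 0
    return reads, last_offset


def scan_file(path, chunk=CHUNK):
    """Read a file front to back in fixed-size chunks.

    Returns (number_of_reads, total_bytes_read).
    """
    reads = total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            reads += 1
            total += len(data)
    return reads, total


if __name__ == "__main__":
    reads, last = read_plan(IMAGE_SIZE)
    print(f"{IMAGE_SIZE} bytes = {IMAGE_SIZE / 2**30:g} GiB")
    print(f"{reads} reads of {CHUNK} bytes, last read starts at offset {last}")
```

Under these assumptions the image is exactly 20 GiB and takes 327680 reads, with the last read ending exactly at the end of the file, which matches the observation that the final read lines up with the volume file's end position. If the whole image really is read once, the stall should scale roughly with the image size divided by the disk's sequential read throughput.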

----------------------------------------------------------------------

Comment By: George P Boutwell (gboutwel)
Date: 2006-09-11 02:40

Message:
Logged In: YES 
user_id=30412

Sounds like there is some big disk operation going on in
coLinux, I don't know what that operation is (perhaps
coLinux didn't get shutdown correctly & reiserfs is trying
to replay a long journal?), but you should probably leave it
to complete, instead of trying to kill it.

Make sure that you are shutting down coLinux, by logging in
and running a proper linux shutdown command (halt, poweroff,
shutdown -h now, etc) and not just 'killing' coLinux processes.

----------------------------------------------------------------------

Comment By: Andrew Tonner (rakslice)
Date: 2006-09-10 23:09

Message:
Logged In: YES 
user_id=39760

I should mention that I've removed the initrd section from
my configuration file in case this bug is somehow related to
the known problem with that, but this problem didn't go away.

----------------------------------------------------------------------

