From: David R. <dr...@gm...> - 2009-04-23 22:01:20
|
Hi, I'm trying to use coLinux to access Linux software RAID-5 partitions from Windows. I've got coLinux set up and running fine, and RAID-1 works well, but with RAID-5 there is an odd problem. I can assemble the RAID array, mount it, list files inside it, and even access very small files (not sure of the exact size, but things like 1 or 2 KB max). But as soon as I try to open/copy/md5sum (just to test) a file that's a couple of KB or larger, the command hangs. coLinux is still running fine and there are no errors in the console; it's just the command accessing the file that is hung. I cannot even kill the command (cp, md5sum, or whatever I used).

I saw there were some fixes for RAID-5 in the very latest 0.7.4-rc2, which is what I'm using now, and I have confirmed that it is NOT using MMX or SSE checksumming routines (from dmesg):

> md: raid1 personality registered for level 1
> raid5: measuring checksumming speed
>    8regs           : 4271.200 MB/sec
>    8regs_prefetch  : 4348.400 MB/sec
>    32regs          : 2592.400 MB/sec
>    32regs_prefetch : 2247.600 MB/sec
> raid5: using function: 8regs_prefetch (4348.400 MB/sec)
> raid6: int32x1    865 MB/s
> raid6: int32x2    839 MB/s
> raid6: int32x4    667 MB/s
> raid6: int32x8    622 MB/s
> raid6: using algorithm int32x1 (865 MB/s)

I'm using the Ubuntu-7.10.ext3.2gb root filesystem, and I'm using Xming to access X11 apps. Here's a snippet from my colinux.conf file; I'm using cobd to access drive partitions directly. The coLinux wiki is very confusing on this matter, with some conflicting info depending on which page you view, so there may be a better way to access native partitions than the "cobd2=\Device\Harddisk1\Partition3" form I am using.

> # File contains the root file system.
> # Download and extract preconfigured file from SF "Images for 2.6".
> cobd0="Ubuntu-7.10.ext3.2gb.fs"
>
> # Swap device, should be an empty file with 128..512MB.
> cobd1="swap128.fs"
>
> # large ext3 partition - works fine
> cobd2=\Device\Harddisk1\Partition3
>
> # raid-5 - has mentioned problem
> cobd3=\Device\Harddisk1\Partition1
> cobd4=\Device\Harddisk2\Partition1
> cobd5=\Device\Harddisk3\Partition1
>
> # raid-1 - works fine
> # cobd6=\Device\Harddisk1\Partition2
> cobd7=\Device\Harddisk2\Partition2

Please let me know the best way to go about debugging this, whether you would like me to run the coLinux debugging tool, or if there are other tests I can run to try and solve this issue. Thanks for the help, and thanks for coLinux, it's really a wonderful idea.

-David R
|
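Since reads succeed below some small size and hang above it, the threshold can be probed with a size sweep under a timeout. A minimal sketch, assuming the GNU coreutils `timeout` utility is available; the mount point argument is hypothetical (pass the RAID-5 filesystem, e.g. the directory it is mounted on), and it defaults to a scratch directory so the script dry-runs anywhere:

```shell
#!/bin/sh
# Size sweep: write files of increasing size, read each back under a
# timeout, and report where reads start to hang. TARGET is an
# assumption -- pass the RAID-5 mount point; defaults to a scratch dir.
TARGET="${1:-/tmp/raidprobe}"
REPORT="$TARGET/report.txt"
mkdir -p "$TARGET"
: > "$REPORT"

for kb in 1 2 4 8 16 64 256 1024; do
    f="$TARGET/probe.$kb"
    dd if=/dev/zero of="$f" bs=1024 count="$kb" 2>/dev/null
    # Allow 10 seconds per read; a stuck md5sum marks the threshold.
    if timeout 10 md5sum "$f" >/dev/null 2>&1; then
        echo "read OK at ${kb}KB" | tee -a "$REPORT"
    else
        echo "read HUNG (or failed) at ${kb}KB" | tee -a "$REPORT"
    fi
    rm -f "$f"
done
```

On a healthy filesystem every line reports OK; on the affected array the first HUNG line gives the approximate size at which reads stop completing.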
From: Paolo M. <pao...@gm...> - 2009-04-24 07:34:38
|
Hi David,

Are you sure the problem is not related to FPU or MMX? If you are sure, as you said in your email, it is probably a new problem. I know only a little about RAID; all I know is about the MMX and FPU problem. I use RAID for all my tests.

To simplify our work, can you give us the MINIMAL steps to see the problem?

Bye,
Paolo
|
From: David R. <dr...@gm...> - 2009-04-24 16:47:57
|
I'm having trouble writing a quick way to reproduce it; it seems the same problem doesn't happen if I just use loop devices. That means the minimal way to reproduce might be for you to have three separate actual partitions to use for a RAID-5 test :(

I'll see if I can write the minimal steps for reproducing, assuming you create 3 small empty partitions on your disk first.

Thanks

On Fri, Apr 24, 2009 at 12:34 AM, Paolo Minazzi <pao...@gm...> wrote:
> Hi David,
> are you sure the problem is not related to FPU or MMX ?
> If you are sure as you have said in your email, probably it is a new problem.
> I know a little about raid.
> All I know is about MMX and FPU problem.
> I use raid to do all test.
> To semplify our work, can you give us the MINIMAL steps to see the problem ?
> Bye,
> Paolo
|
From: Henry N. <hen...@ar...> - 2009-04-24 19:41:54
|
Hello David,

David Rorex wrote:
> I'm having trouble writing a quick way to reproduce it, it seems the
> same problem doesn't happen if I just use loop devices. That means the
> minimal way to reproduce might be for you to have three separate actual
> partitions to use for a raid-5 test :(
>
> I'll see if I can write the minimal steps for reproducing, assuming you
> create 3 small empty partitions on your disk first.

Creating partitions on a disk is a problem for most testers. An older bug, #1569947, has some instructions and testing steps. Perhaps you can use these for your test here too?

See: "Data corruption on md/raid5 under 0.6.3 and also 0.7.1-hn14"
https://sourceforge.net/tracker/?func=detail&aid=1569947&group_id=98788&atid=622063

--
Henry N.
|
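The tracker entry's approach sidesteps real partitions by backing cobd with plain files. A minimal sketch of creating three 500 MB zero-filled backing files on the Linux side (file names are hypothetical; on Windows, `fsutil file createnew raid5-1.img 524288000` does the equivalent per file):

```shell
#!/bin/sh
# Create three 500 MB backing files for cobd (hypothetical names).
# A seek-based dd allocates the size instantly as a sparse file; if the
# test needs the zeros physically written, use
# "dd if=/dev/zero of=raid5-N.img bs=1M count=500" instead.
for n in 1 2 3; do
    dd if=/dev/zero of="raid5-$n.img" bs=1 count=0 seek=500M 2>/dev/null
done
ls -l raid5-*.img
```

Each file is then mapped to a cobdN= line in colinux.conf, the same way the root and swap images are.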
From: David R. <dr...@gm...> - 2009-04-24 20:26:53
|
On Fri, Apr 24, 2009 at 12:41 PM, Henry Nestler <hen...@ar...> wrote:
> Creating partitions on a disk is a problem for mostly testers.
> An older Bug #1569947 has some instructions and testing steps. Perhaps you
> can use these also for your test here?
>
> see:
> "Data corruption on md/raid5 under 0.6.3 and also 0.7.1-hn14"
> https://sourceforge.net/tracker/?func=detail&aid=1569947&group_id=98788&atid=622063

Hi Henry,

I tried the test on that page, and it does appear to reproduce the problem. My problem is different from his: in his case he had data corruption, a seg fault, and a kernel panic. In my case, the dd commands write the files with no errors reported, but the md5sum hangs trying to read the first file.

1. Create 3 files of 500MB filled with zeros in Windows.
2. Assign these files to coLinux as block devices using cobd.
3. Create the raid (X, Y, Z are the numbers of the cobd devices set in the coLinux config file):

   modprobe raid5
   mdadm --create /dev/md1 /dev/cobdX /dev/cobdY /dev/cobdZ
   mkfs.ext3 /dev/md1
   mkdir /raidtst
   mount /dev/md1 /raidtst

4. Run the test:

   while true; do
     for i in 1 2 3; do dd_rescue -m 300M /dev/zero /raidtst/f.$i; done
     for i in 1 2 3; do ls -la /raidtst/f.$i; md5sum /raidtst/f.$i; done
     for i in 1 2 3; do rm -f /raidtst/f.$i; done
   done

5. In my case, the last thing I see is the output of "ls -la /raidtst/f.1"; the md5sum command never completes. coLinux is not crashed; if I run 'top', I see 75-99% CPU under 'wa' (aka I/O wait). I can still access /raidtst via ls, and copying a small 100-byte file over there works fine, and I can read it back.

Maybe sometimes it works directly after writing a file and reading it back, due to the data being cached in RAM? It seems that with lots of small test files it works OK, but creating a very large file exposes the problem more quickly. So if you can't reproduce it, maybe try increasing the file size even more?

I will try running the debug daemon and see if I can get it to show anything that looks useful.

Thanks
|
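As a side note, `mdadm --create` normally also wants the array geometry spelled out, e.g. `mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/cobdX /dev/cobdY /dev/cobdZ`. One pass of the write/verify/delete cycle from step 4 can be sketched as a self-contained script: plain dd stands in for dd_rescue (which may not be installed), and the directory and size are parameters so it dry-runs on any filesystem; point it at /raidtst with a size of 300 (and loop it) for the real test:

```shell
#!/bin/sh
# One write/verify/delete cycle of the RAID-5 stress test.
# DIR and SIZE_MB are assumptions: defaults are small and local so the
# sketch runs anywhere; use DIR=/raidtst SIZE_MB=300 on the real array.
DIR="${1:-/tmp/raid5test}"
SIZE_MB="${2:-8}"
mkdir -p "$DIR"

for i in 1 2 3; do
    dd if=/dev/zero of="$DIR/f.$i" bs=1M count="$SIZE_MB" 2>/dev/null
done
for i in 1 2 3; do
    ls -la "$DIR/f.$i"
    md5sum "$DIR/f.$i"   # on the affected array, this read hangs
done
for i in 1 2 3; do
    rm -f "$DIR/f.$i"
done
echo "cycle complete"
```

If the cycle never prints "cycle complete" and top shows the CPU pinned in 'wa', the hang has reproduced.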
From: David R. <dr...@gm...> - 2009-04-25 03:11:14
|
Well, I got the coLinux debug daemon running, and it produced some logs, but I'm not sure it's telling me anything useful. Maybe I just didn't enable the right debug levels? I uploaded the file here, if you want to look at it:

http://davr.org/dbg.xml.bz2

Please let me know if there's anything else I can try. Maybe I should just buy a separate low-specced PC and use that as a standalone Linux server, instead of trying to be clever and run it all in one box ;)

Thanks
|
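When a read is stuck like this it usually sits in uninterruptible sleep (state D), so before reaching for the coLinux debug daemon, the guest side can be inspected with standard Linux tools. A minimal sketch, not coLinux-specific (`/proc/<pid>/stack` only exists on newer kernels):

```shell
#!/bin/sh
# Show tasks in uninterruptible sleep (state D) together with the
# kernel function they are waiting in -- this names the wait point of
# a hung md5sum. NR==1 keeps the header row.
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /D/'
# For more detail on a stuck PID:
#   cat /proc/<PID>/stack          (if the kernel provides it)
#   echo w > /proc/sysrq-trigger   (dumps all blocked tasks to dmesg)
```

On an idle system only the header prints; during the hang, the md5sum process should appear with its wait channel, which is far more telling than the debug daemon's generic log.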