From: David R. <dr...@gm...> - 2009-04-25 03:11:14
Well, I got the colinux debug daemon thing running, and it produced some
logs, but I'm not sure it's telling me anything useful. Maybe I just didn't
enable the right debug levels? I uploaded the file here, if you want to look
at it: http://davr.org/dbg.xml.bz2

Please let me know if there's anything else I can try.

Maybe I should just buy a separate low-specced PC and use that as a
standalone linux server, instead of trying to be clever and run it all in
one box ;)

Thanks

On Fri, Apr 24, 2009 at 1:26 PM, David Rorex <dr...@gm...> wrote:

> On Fri, Apr 24, 2009 at 12:41 PM, Henry Nestler <hen...@ar...> wrote:
>
>> Hello David,
>>
>> David Rorex wrote:
>>
>>> I'm having trouble writing a quick way to reproduce it, it seems the
>>> same problem doesn't happen if I just use loop devices. That means the
>>> minimal way to reproduce might be for you to have three separate actual
>>> partitions to use for a raid-5 test :(
>>>
>>> I'll see if I can write the minimal steps for reproducing, assuming you
>>> create 3 small empty partitions on your disk first.
>>>
>>
>> Creating partitions on a disk is a problem for most testers.
>> An older Bug #1569947 has some instructions and testing steps. Perhaps you
>> can use these also for your test here?
>>
>> see:
>> "Data corruption on md/raid5 under 0.6.3 and also 0.7.1-hn14"
>>
>> https://sourceforge.net/tracker/?func=detail&aid=1569947&group_id=98788&atid=622063
>>
>>
> Hi Henry,
>
> I tried the test in that page, and it does appear to reproduce the problem.
> My problem is different than his...in his case he has data corruption, seg
> fault, kernel panic. In my case, the dd commands write the files with no
> errors reported, but the md5sum hangs on trying to read the first file.
>
> 1. Create 3 files of 500MB filled with zeros in windows
> 2. Assign these files to colinux as block devices using cobd
> 3. Create the raid (x,y,z are the # of the cobd device set in the colinux
>    config file)
>      modprobe raid5
>      mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/cobdX /dev/cobdY /dev/cobdZ
>      mkfs.ext3 /dev/md1
>      mkdir /raidtst
>      mount /dev/md1 /raidtst
> 4. Run test
>      while true; do
>        for i in 1 2 3; do dd_rescue -m 300M /dev/zero /raidtst/f.$i; done
>        for i in 1 2 3; do ls -la /raidtst/f.$i; md5sum /raidtst/f.$i; done
>        for i in 1 2 3; do rm -f /raidtst/f.$i; done
>      done
> 5. In my case, the last thing I see is the output of "ls -la /raidtst/f.1",
> the md5sum command never completes. Colinux is not crashed, if I run 'top',
> I see 75-99% cpu under 'wa' (aka IO Wait). I can still access /raidtst via
> ls, and copying a small 100 byte file over there works fine & I can read it
> back.
>
> Maybe sometimes it works directly after writing the file and reading it
> back, due to the data being cached in RAM? It seems like with creating a lot
> of small test files, it works ok, but creating a very large file exposes the
> problem quicker. So if you can't reproduce, maybe try increasing the file
> size even more?
>
> I will try running the debug daemon and see if I can get it to show
> anything that looks useful.
>
> Thanks
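
For anyone repeating step 4 of the quoted test: since the md5sum read can
block indefinitely, it may help to wrap it in a time limit so the loop reports
the stuck file instead of just sitting there. Below is a rough sketch of that
idea, not something from the bug report. The helper name check_read and the
60 second default are my own assumptions, and it relies on the coreutils
"timeout" command being available in the guest. One caveat: if the read is
stuck in uninterruptible IO wait, the signal sent by timeout may not take
effect until the IO returns, so this may not always break out of the hang.

```shell
#!/bin/sh
# check_read FILE [SECONDS]
# Hypothetical helper: checksum FILE with md5sum, but give up after SECONDS
# (default 60, an assumed value) so a hanging read is reported, not waited on.
check_read() {
    file=$1
    limit=${2:-60}

    # timeout exits with 124 when the time limit expires; otherwise it
    # passes through md5sum's own exit status.
    sum=$(timeout "$limit" md5sum "$file" 2>/dev/null) && status=0 || status=$?

    if [ "$status" -eq 0 ]; then
        # md5sum prints "<hash>  <name>"; keep only the hash.
        echo "ok ${sum%% *}"
    elif [ "$status" -eq 124 ]; then
        echo "hang $file"
    else
        echo "error $file"
    fi
}
```

In the test loop this would replace the bare md5sum call, for example
"check_read /raidtst/f.$i 120", so a line starting with "hang" marks the
first file whose read did not come back in time.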