#95 strange cobd problems with large devices

closed-fixed
nobody
None
5
2006-05-26
2006-05-22
Anonymous
No

I had a problem mounting my 2.1 TB RAID 5 array, which runs
under native Linux without any problems.

First I looked up the device IDs with WinObj. I entered all 8
disks, 320 GB per drive, into the config as cobd1-8.
Then I created the missing devices in /etc, cobd4-8. I
ended up with the following mdadm.conf:

ARRAY /dev/md0 level=raid5 num-devices=8 spares=1 UUID=fdfe13dd:8727f3c6:959155de:2f32c88f
DEVICE /dev/cobd1
DEVICE /dev/cobd2
DEVICE /dev/cobd3
DEVICE /dev/cobd4
DEVICE /dev/cobd5
DEVICE /dev/cobd6
DEVICE /dev/cobd7
DEVICE /dev/cobd8

Then I was able to start the RAID 5 with "mdadm -A
/dev/md0". Great:

colinux:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty]
md0 : active raid5 cobd7[0] cobd1[7] cobd2[6] cobd6[5] cobd5[4] cobd4[3] cobd8[2] cobd3[1]
      2187980032 blocks level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]

unused devices: <none>
colinux:~#

Next I set up the cryptoloop. This also worked, so
I tried to mount it. This is where the error occurs: in dmesg I
see the following from the kernel:

May 20 14:44:24 colinux kernel: ReiserFS: dm-0: found reiserfs format "3.6" with standard journal
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=81002504, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=81264648, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=81526792, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=81788936, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=82051080, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=82313224, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=82575368, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=82837512, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=83099656, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=83361800, limit=80992768
May 20 14:44:24 colinux kernel: attempt to access beyond end of device
May 20 14:44:24 colinux kernel: dm-0: rw=0, want=83623944, limit=80992768
...
May 20 14:46:25 colinux kernel: ReiserFS: dm-0: warning: sh-2029: reiserfs read_bitmaps: bitmap block (#10125312) reading failed
May 20 14:46:25 colinux kernel: ReiserFS: dm-0: warning: jmacd-8: reiserfs_fill_super: unable to read bitmap

These error messages repeat several thousand times.
After the messages have been generated for some minutes,
mount reports that it was unable to mount because the
bitmap could not be loaded.
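
The limit in the kernel messages can be checked against the array size
shown in /proc/mdstat. The following is only a sketch of that arithmetic.
It assumes that mdstat counts 1 KiB blocks, that the kernel counts
512-byte sectors, and that the sector counter is only 32 bits wide, which
would be consistent with the CONFIG_LBD explanation given later in the
discussion.

```python
# Sketch: is the dm-0 "limit" the md0 size truncated to 32 bits?
# Assumptions (not stated in the report): 1 KiB mdstat blocks,
# 512-byte sectors, 32-bit sector counter.

MDSTAT_BLOCKS = 2187980032    # "blocks" value from /proc/mdstat
REPORTED_LIMIT = 80992768     # "limit=" value from the kernel log

sectors = MDSTAT_BLOCKS * 2   # one 1 KiB block is two 512-byte sectors
truncated = sectors % 2**32   # what survives in a 32-bit counter

print("true size in sectors:", sectors)
print("truncated to 32 bits:", truncated)
print("matches kernel limit:", truncated == REPORTED_LIMIT)
```

Under these assumptions the truncated value equals the reported limit,
which would explain why only a small part of the array is reachable.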

I used dd to check whether this might be a size problem.
This is what I discovered:

In the following, "WORKS" means that dd performs one read
and one write and then exits.

read(0, "\322\251\2>\36&\325\325c\225,_\24KJ\366:Niu\t\33L>\240"..., 1048576) = 852992
write(1, "\322\251\2>\36&\325\325c\225,_\24KJ\366:Niu\t\33L>\240"..., 852992) = 852992

In contrast, "DOESN'T WORK" means that dd keeps reading
without stopping and never writes anything. The
system is under load, and coLinux reacts only slowly
until I stop this dd.

read(0, "\21\30\'\302$C\254q\1\210\371\334\316-\334^\276-\332\375"..., 1048576) = 1048576
read(0, "\375\231\366\252\v\1\254\307\254!\226\234\26\3117mwt\26"..., 1048576) = 1048576
...

WORKS: strace dd if=/dev/cobd1 of=/dev/null bs=1048576 count=1 skip=305242

DOESN'T WORK: strace dd if=/dev/cobd1 of=/dev/null bs=1048576 count=1 skip=305243

WORKS: strace dd if=/dev/cobd1 of=/dev/null bs=1048576 count=1 skip=305242

DOESN'T WORK: strace dd if=/dev/cobd1 of=/dev/null bs=1048576 count=1 skip=305243

The interesting thing is that on the drive itself the
problem appears at about 305 GB, the size of the drive. On
the RAID, however, it appears at only about 30 GB... very strange!

I also checked whether dd simply starts again at the beginning
of the disk. This is not the case: the data it reads differs
from what dd returns without any skip.
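
The dd boundaries can be turned into byte sizes. This is a rough sketch,
not a diagnosis: it takes the short read of 852992 bytes at skip=305242
as the end of /dev/cobd1, and it assumes that the limit from the kernel
log is counted in 512-byte sectors. The /dev/md0 skip values it refers to
are the corrected ones from the discussion below (39547 and 39548).

```python
# Sketch: convert the dd results into device sizes in bytes.
MIB = 1048576                        # bs=1048576 from the dd commands

# /dev/cobd1: skip=305242 returned a short read of 852992 bytes,
# so the device appears to end inside that 1 MiB block.
cobd1_bytes = 305242 * MIB + 852992

# /dev/md0 as seen through the kernel limit (assumed 512-byte sectors).
md0_visible_bytes = 80992768 * 512
full_mib, remainder = divmod(md0_visible_bytes, MIB)

print("cobd1 size in bytes      :", cobd1_bytes)
print("md0 visible size in bytes:", md0_visible_bytes)
print("last readable md0 skip   :", full_mib)
print("bytes in that last block :", remainder)
```

If these assumptions hold, the cobd1 boundary is just the real end of a
320 GB drive, while the md0 boundary falls where the limit from the
kernel log ends rather than at the true size of the array.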

If you want to access this system, I have no problem
with that. I use coLinux 0.6.3 with a self-compiled
2.6.11 kernel, because the stock kernel includes neither
the LVM/dm driver nor RAID 5 support.

Discussion

  • Logged In: NO

    My E-Mail is mmcol@priv.de.

     
  • Logged In: NO

    Argh. The second WORKS/DOESN'T WORK pair should be:

    WORKS: strace dd if=/dev/md0 of=/dev/null bs=1048576 count=1 skip=39547
    DOESN'T WORK: dd if=/dev/md0 of=/dev/null bs=1048576 count=1 skip=39548

    Here is some more info:

    colinux:/var/log# uname -a
    Linux colinux 2.6.11 #4 Sat May 20 14:38:50 CEST 2006 i686 GNU/Linux
    colinux:/var/log#

     
  • Henry N.
    2006-05-23

    Logged In: YES
    user_id=579204

    I think nobody can help you with such a big array.

    Please create a small array from 8 file images of
    roughly 1 MB each (or 10 MB, 100 MB, the smallest you can) and
    check whether the same problem shows up at the 1 MB limit.

    You can also use a special kernel:
    http://www.henrynestler.com/colinux/testing/devel-2.6.12-hn/20060507/
    There I have enabled MD for another user.

    Henry

     
  • Logged In: NO

    You can close this bug; it doesn't exist. I just hadn't
    enabled support for large block devices, CONFIG_LBD.

     
    • status: open --> closed-fixed
     
  • Logged In: YES
    user_id=30412

    Thanks for reporting back... As you requested I'm closing
    the bug. I wonder if we should enable large block devices
    support by default...
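
For anyone hitting similar symptoms, a quick check of the kernel
configuration may save time. The helper below is only an illustrative
sketch: the function name and the sample config text are made up for this
example, and where the real config file lives depends on how the kernel
was built.

```python
# Sketch: test whether an option is enabled in kernel .config text.
# option_enabled and the sample text are hypothetical illustrations.

def option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y or m in the given config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_X is not set"
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name == option:
            return value in ("y", "m")
    return False

sample = """\
CONFIG_BLK_DEV_MD=y
# CONFIG_LBD is not set
"""

print("CONFIG_LBD enabled:", option_enabled(sample, "CONFIG_LBD"))
```

To use it on a real system, read the text of the config file that was
used for the kernel build and pass it in place of the sample.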