#140 Silly values on web status interface during 3TB raid1 resync (overflow?)

Status: Accepted
Owner: nobody
Labels: None
Priority: Medium
Type: Defect
Updated: 2014-07-09
Created: 2013-05-01
Creator: Anonymous
Private: No

Originally created by: brian.br...@gmail.com

What steps will reproduce the problem?
1. Create a RAID1 array from two 3TB disks.
2. The array starts resyncing.
3. Look at the RAID section of the web interface.

What is the expected output? What do you see instead?

I would expect Done to stay at or below 100% with a positive ETA. Instead the web interface shows:

RAID
Dev.    Capacity     Level    State     Status    Action    Done    ETA
md0     2794.0 GB    raid1    active    OK        resync    107%    -10.7min

Compare with /proc/mdstat, captured at roughly the same time:

$ cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda2[1] sdb2[0]
      2929740112 blocks super 1.2 [2/2] [UU]
      [======>..............]  resync = 30.2% (885531648/2929740112) finish=300.7min speed=113284K/sec
      bitmap: 16/22 pages [64KB], 65536KB chunk

unused devices: <none>
$

What Alt-F version are you using? Have you flashed it?

Alt-F 0.1RC3, flashed.

What is the box hardware revision level? A1, B1 or C1? (look at the label
at the box bottom)
N/A

What is your disk configuration? Standard, RAID (what level)...

RAID1

What operating system are you using on your computer? Using what browser?

Chrome on Linux.

Please provide any additional information below.

Discussion

  • Anonymous

    Anonymous - 2013-05-01

    Originally posted by: whoami.j...@gmail.com

    It is an arithmetic issue (big numbers with 3TB disks); the awk %d should probably be replaced with %f.

    The issue must be in /usr/www/cgi-bin/status.cgi, at around line 295 ($mdev is md0 in your case):

    compl=$(drawbargraph $(awk '{printf "%d", $1 * 100 / $3}' /sys/block/$mdev/md/sync_completed))
    speed=$(cat /sys/block/$mdev/md/sync_speed)
    exp=$(awk '{printf "%.1fmin", ($3 - $1) * 512 / 1000 / '$speed' / 60}' /sys/block/$mdev/md/sync_completed 2> /dev/null)
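
    For reference, sync_completed reads as "<done> / <total>" in 512-byte sectors, which is why the awk programs above use fields $1 and $3. A minimal way to inspect what awk sees on a resyncing array:

    awk '{ printf "done=%s  total=%s  pct=%d%%\n", $1, $3, $1 * 100 / $3 }' \
        /sys/block/md0/md/sync_completed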

    If it is still resyncing, can you please post the output of:

    cat /sys/block/md0/md/sync_completed
    cat /sys/block/md0/md/sync_speed

    Thanks

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-05-01

    Originally posted by: brian.br...@gmail.com

    Still going. The web interface currently says: md0 2794.0 GB raid1 active OK resync 20% 142.2min
    which I would think was fine, except that /proc/mdstat sadly disagrees :-(

    $ cat /proc/mdstat
    Personalities : [linear] [raid1]
    md0 : active raid1 sda2[1] sdb2[0]
          2929740112 blocks super 1.2 [2/2] [UU]
          [===============>.....]  resync = 78.6% (2303036672/2929740112) finish=108.9min speed=95859K/sec
          bitmap: 6/22 pages [24KB], 65536KB chunk

    unused devices: <none>
    $ cat /sys/block/md0/md/sync_completed
    310542592 / 1564512928
    $ cat /sys/block/md0/md/sync_speed
    89338

    awk saying 20% is about right for those sync_completed numbers.

    And I just manually tried some big numbers in awk, and it doesn't seem to overflow, so I think awk must already be using floating point or 64-bit integers for those calculations.
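
    For example, the true (un-wrapped) counts from the mdstat output above give awk no trouble:

    $ awk 'BEGIN { printf "%d\n", 2303036672 * 100 / 2929740112 }'
    78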

    So it looks like we have a kernel overflow issue here...
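
    The wraparound even checks out numerically: the array is 2929740112 1K-blocks, i.e. 5859480224 sectors, which exceeds 2^32 = 4294967296, and truncating it to 32 bits gives exactly the bogus total that sync_completed printed:

    $ awk 'BEGIN { printf "%d\n", (2929740112 * 2) % 4294967296 }'
    1564512928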

    Yeah, I just had a look at the kernel source, md.c, sync_completed_show function.
    It uses unsigned long in 2.6.25, and has been fixed to use long long sometime since.

    Might be wise to change the web script to parse it out of /proc/mdstat instead!
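
    For what it's worth, a rough, untested sketch of such a parser (field positions as in the mdstat output above):

    # print done/eta/speed for one array straight from /proc/mdstat;
    # on the progress line the percentage, finish and speed are fields 4, 6 and 7
    mdev=md0
    awk -v dev="$mdev" '
        $1 == dev { indev = 1; next }              # found our array stanza
        indev && $2 ~ /^(resync|recovery)$/ {      # the progress line
            sub(/finish=/, "", $6); sub(/speed=/, "", $7)
            printf "done=%s eta=%s speed=%s\n", $4, $6, $7
            exit
        }
        indev && /^md/ { indev = 0 }               # next stanza, no progress line
    ' /proc/mdstat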

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-05-24

    Originally posted by: whoami.j...@gmail.com

    /proc/mdstat contains very differently typed/formatted information; it is difficult to parse.

    I'm trying to port Alt-F to a more recent kernel, 3.8.11, and perhaps that will solve the issue.

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-05-25

    Originally posted by: brian.br...@gmail.com

    From what I saw in the kernel source, it was definitely fixed by that
    version.

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-06-28

    Originally posted by: crazymac...@gmail.com

    I can confirm this on my recently flashed D-Link DNS-323 running Alt-F 0.1RC3. I built a 2x3TB RAID1 array and am seeing the same here. Currently:

    RAID
    Dev.     Capacity     Level    State     Status    Action     Done    ETA
    md0     2794.0 GB     raid1     active     OK     resync     210%    -152.4min

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-06-29

    Originally posted by: whoami.j...@gmail.com

    (No comment was entered for this change.)

    Status: Accepted

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-09-02

    Originally posted by: stephane...@gmail.com

    Same here, see my ticket on sourceforge for details:  https://sourceforge.net/p/alt-f/tickets/10/

    RAID
    Dev.     Capacity     Level    State     Status    Action     Done    ETA
    md0     2794.0 GB     raid1     active     OK     resync     158%    -6517.8min

    How can I make sure this is a false positive and that the resync has actually finished?

    Stephane

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-09-02

    Originally posted by: stephane...@gmail.com

    I tried the same commands on my box; our problems look similar:

    $ cat /sys/block/md0/md/sync_completed
    2564345088 / 1564512928

    $ cat /sys/block/md0/md/sync_speed
    150809

    $ cat /proc/mdstat
    Personalities : [linear] [raid1]
    md0 : active raid1 sda2[1] sdb2[0]
          2929740112 blocks super 1.2 [2/2] [UU]
          [========>............]  resync = 43.8% (1285270528/2929740112) finish=189.7min speed=144439K/sec
          bitmap: 13/22 pages [52KB], 65536KB chunk

    unused devices: <none>

     

    Last edit: Anonymous 2017-11-04
  • Anonymous

    Anonymous - 2013-09-02

    Originally posted by: brian.br...@gmail.com

    Don't worry, cat /proc/mdstat is telling the truth about what's happening; it's only the sysfs numbers used by the web interface that are overflowing. (We found it was a kernel bug that has since been fixed.)
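
    If you want an explicit check once the progress bar disappears from /proc/mdstat, the kernel also reports the current action directly, and it reads "idle" when nothing is running:

    $ cat /sys/block/md0/md/sync_action
    idle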

     

    Last edit: Anonymous 2017-11-04
  • João Cardoso

    João Cardoso - 2014-07-09

    Is this fixed with RC4? No 3/4TB disks here to test with.

     
