#30 Poor Local Variable Performance, gtm_lvscale Shows Potential

Status: open
Owner: nobody
Priority: 6
Updated: 2014-02-25
Created: 2010-08-17
Creator: Bob Isch
Private: No

I was hopeful that V5.4-001, with its gtm_lvscale feature, might solve a long-standing issue we have with the performance of large local arrays. We have observed that performance suffers dramatically when a local array holds a large number of subscripts. We have also observed that, when sequential integer subscripts are used, performance can improve by up to two orders of magnitude if negative subscripts are used instead, so that nodes are inserted into the array in descending order.

The problem is exhibited by the code below:

[code]test3 ; Test Local Variable stuff
 ;
 w "test3> "_$zv,!
 d test(320000,100) ; Very Slow
 d test(-320000,100) ; Very Fast
 ;
 q ; >>> test3

test(n,ln) ; Local Array Insertion Test
 ;
 n (z,n,ln)
 ;
 s n=$g(n,100) ; Negative values cause the use of subscripts -1..-|n|
 s ln=$g(ln,1000)
 ;
 s t0=$h,t0=t0*3600*24+$p(t0,",",2)
 s s=$j("",ln),cn1=0
 f i=1:1:n,-1:-1:n d
 . s lvn(i,"lvn")=s,cn1=cn1+1
 i cn1'=$tr(n,"-") w 0/0
 ;
 s t1=$h,t1=t1*3600*24+$p(t1,",",2)
 w $j($fn(cn1,","),12)_" records in"_$j(t1-t0,4)_"s. "_$s(n<0:"Descending",1:" Ascending")_" subscripts.",!
 ;
 q ; >>> test
[/code]

Note the dramatic difference between the two modes:

$ export gtm_lvscale=1; mumps test3.m; time mumps -run test3
test3> GT.M V5.4-001 Linux x86
320,000 records in 64s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 1m4.704s
user 1m4.472s
sys 0m0.140s

Note: Reversing the order of the calls to test() has no effect. The n=320000 call still takes over a minute.

Raising gtm_lvscale DOES help the ascending subscript version A LOT. However, it levels off at around 12s and never approaches the performance of the descending subscript version, and from gtm_lvscale=7 upward the descending subscript version itself becomes very slow:

$ for i in 1 2 3 6 7 8 9; do echo "lvscale=$i"; export gtm_lvscale=$i; mumps test3.m; time mumps -run test3; done

lvscale=1
test3> GT.M V5.4-001 Linux x86
320,000 records in 60s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 1m0.313s
user 1m0.028s
sys 0m0.140s

lvscale=2
test3> GT.M V5.4-001 Linux x86
320,000 records in 33s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 0m33.528s
user 0m33.250s
sys 0m0.168s

lvscale=3
test3> GT.M V5.4-001 Linux x86
320,000 records in 22s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 0m21.386s
user 0m21.173s
sys 0m0.172s

lvscale=6
test3> GT.M V5.4-001 Linux x86
320,000 records in 12s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 0m12.500s
user 0m12.181s
sys 0m0.284s

lvscale=7
test3> GT.M V5.4-001 Linux x86
320,000 records in 13s. Ascending subscripts.
320,000 records in 20s. Descending subscripts.
real 0m39.521s
user 0m12.205s
sys 0m0.936s

lvscale=8
test3> GT.M V5.4-001 Linux x86
320,000 records in 12s. Ascending subscripts.
320,000 records in 33s. Descending subscripts.
real 1m7.991s
user 0m9.945s
sys 0m1.232s

lvscale=9
test3> GT.M V5.4-001 Linux x86
320,000 records in 13s. Ascending subscripts.
320,000 records in 79s. Descending subscripts.
real 2m7.021s
user 0m9.509s
sys 0m1.612s
$

The key seems to be adding nodes with decreasing (rather than negative) subscript values. "for n=320000:-1:1..." would also be quite fast.
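A minimal sketch of that variant, for anyone who wants to check it: it inserts the same nodes as test3 but walks positive subscripts from 320000 down to 1. The label test4 and the "Positive descending" wording are made up here for illustration, the 100-character payload is assumed to match the test3 calls, and no timing for this routine was measured in this report.

```
test4 ; Hypothetical variant of test3: positive subscripts, descending order
 ;
 n i,s,cn1,t0,t1,lvn
 ;
 s t0=$h,t0=t0*3600*24+$p(t0,",",2) ; start time in seconds
 s s=$j("",100),cn1=0 ; 100-space payload, as in the test3 calls
 f i=320000:-1:1 s lvn(i,"lvn")=s,cn1=cn1+1
 ;
 s t1=$h,t1=t1*3600*24+$p(t1,",",2) ; end time in seconds
 w $j($fn(cn1,","),12)_" records in"_$j(t1-t0,4)_"s. Positive descending subscripts.",!
 ;
 q ; >>> test4
```

It can be compiled and run the same way as test3 ("mumps test4.m; time mumps -run test4"), so the result can be compared directly with the two lines test3 prints.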

This certainly seems like an unreasonable asymmetry in Local Variable performance.

Best Regards,
-bob

Discussion

  • Bob Isch
    2010-09-06

    • priority: 5 --> 6
    • labels: 210456 --> Problems -- other bugs
     
  • Bob Isch
    2010-09-06

    I forgot to mention that the asymmetry is inverted at gtm_lvscale=7 and above. Is this probably garbage-collection related? Is there a counter for that yet?

     
  • K.S. Bhaskar
    2014-02-25

    Is this still an issue? There were major improvements to local variable performance in V5.4-002.

     
  • Bob Isch
    2014-02-25

    You are correct; the test case performs much better, in V6 at least:

    $ mumps test3.m; time mumps -run test3
    test3> GT.M V6.1-000 Linux x86_64
         320,000 records in   1s.  Ascending subscripts.
         320,000 records in   1s. Descending subscripts.
    
    real    0m2.213s
    user    0m2.062s
    sys     0m0.150s
    

    Similar improvements in the application have not been reported (probably because of existing workarounds), but we will investigate that separately if necessary.

    Thank you very much for following up on this.
    -bob