From: SourceForge.net <no...@so...> - 2011-07-28 18:21:34
|
Bugs item #3380073, was opened at 2011-07-27 18:32 Message generated for change (Comment added) made by cyan You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3380073&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 47. Bytecode Compiler Group: current: 8.5.10 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Don Porter (dgp) Assigned to: Don Porter (dgp) Summary: declining [expr] performance Initial Comment: A fairly typical calculation proc... set ::RADIAN_CONV 57.2957795131 proc distance { lat1 lon1 lat2 lon2 } { set lat1 [expr {$lat1 / $::RADIAN_CONV}] set lat2 [expr {$lat2 / $::RADIAN_CONV}] set dlon [expr {($lon2 - $lon1) / $::RADIAN_CONV}] return [expr {acos(sin($lat1) * sin($lat2) + \ cos($lat1) * cos($lat2) * cos($dlon)) * \ 6371.0}] } Measured performance trend: % time {distance 41.0 -81.0 42.0 -82.0} 100000 ;# 8.4 1.98998 microseconds per iteration % time {distance 41.0 -81.0 42.0 -82.0} 100000 ;# 8.5 2.17931 microseconds per iteration % time {distance 41.0 -81.0 42.0 -82.0} 100000 ;# 8.6 3.03864 microseconds per iteration ---------------------------------------------------------------------- Comment By: Cyan Ogilvie (cyan) Date: 2011-07-28 20:21 Message: Some interesting results: testing on my machine 8.5 and 8.6 were quite close (closer than the original bug report), but also the times reported by the "time" command disagreed quite a lot with a benchmarker I built to investigate performance in some of my code (and showed 8.6 to be faster than 8.5). Trying a few different versions of the distance proc on 8.5 and 8.6 (8.4 doesn't work on the benchmarker) (apologies for the formatting, I used vimdiff to compare the runs): namespace path {::tcl::mathop} set ::RADIAN_CONV 57.2957795131 proc distance { lat1 lon1 lat2 lon2 } { set lat1 [expr {$lat1 / $::RADIAN_CONV}] set lat2 [expr {$lat2 / $::RADIAN_CONV}] set dlon [expr {($lon2 - $lon1) / $::RADIAN_CONV}] return [expr {acos(sin($lat1) * sin($lat2) + \ cos($lat1) * cos($lat2) * cos($dlon)) * \ 6371.0}] } proc distance2 {lat1 lon1 lat2 lon2} { global RADIAN_CONV set lat1 [/ $lat1 $RADIAN_CONV] set lat2 [/ $lat2 $RADIAN_CONV] set dlon [expr {($lon2 - $lon1) / $RADIAN_CONV}] expr {acos(sin($lat1) * sin($lat2) + \ cos($lat1) * cos($lat2) * cos($dlon)) * \ 6371.0} } proc distance3 {lat1 lon1 lat2 lon2} { set lat1 [/ $lat1 57.2957795131] set lat2 [/ $lat2 57.2957795131] expr { acos(sin($lat1) * sin($lat2) + cos($lat1) * cos($lat2) * cos(($lon2 - $lon1) / 57.2957795131) ) * 6371.0 } } puts "Tcl version: [info patchlevel]" # Initial calls to ensure proc bodies are bytecode compiled, and verify correctness puts "distance: [distance 41.0 -81.0 42.0 -82.0]" puts "distance2: [distance2 41.0 -81.0 42.0 -82.0]" puts "distance3: [distance3 41.0 -81.0 42.0 -82.0]" puts "\n=========================================================" puts "distance (using time): [apply {{} {time {distance 41.0 -81.0 42.0 -82.0} 1000000}}]" puts "distance (using benchmark):" benchmark::test 1000 1000 usec distance 41.0 -81.0 42.0 -82.0 puts "\n=========================================================" puts "distance2 (using time): [apply {{} {time {distance 41.0 -81.0 42.0 -82.0} 1000000}}]" puts "distance2 (using benchmark):" benchmark::test 1000 1000 usec distance2 41.0 -81.0 42.0 -82.0 puts "\n=========================================================" puts "distance3 (using time): [apply {{} {time {distance 41.0 -81.0 42.0 -82.0} 1000000}}]" puts "distance3 (using benchmark):" benchmark::test 1000 1000 usec distance3 41.0 -81.0 42.0 -82.0 Tcl version: 8.5.9 distance: 138.92161836383994 distance2: 138.92161836383994 distance3: 138.92161836383994 ========================================================= distance (using time): 4.02733 microseconds per iteration distance (using benchmark): == testing proc (distance 41.0 -81.0 42.0 -82.0) ===== munge times: 3.432 ms llength times: 1000 descriptive parameters: min: 3.343 usec median: 3.457 usec harmonic_mean: 3.468 usec arithmetic_mean: 3.469 usec max: 4.102 usec sample_stddev: 0.069 usec 3.457 usec per iteration / 288088.464 iterations per second (1000000 iterations in 3.471 seconds) <= 3.344370 1 <= 3.370397 11 ## <= 3.384095 22 #### <= 3.397793 39 ######## <= 3.411491 86 ################# <= 3.425190 94 ################## <= 3.438888 98 ################### <= 3.452586 116 ####################### <= 3.466284 103 #################### <= 3.479983 94 ################## <= 3.493681 79 ############### 257 ################################################## ========================================================= distance2 (using time): 4.05635 microseconds per iteration distance2 (using benchmark): == testing proc (distance2 41.0 -81.0 42.0 -82.0) ===== munge times: 3.415 ms llength times: 1000 descriptive parameters: min: 2.793 usec median: 2.847 usec harmonic_mean: 2.851 usec arithmetic_mean: 2.851 usec max: 3.263 usec sample_stddev: 0.033 usec 2.847 usec per iteration / 350468.313 iterations per second (1000000 iterations in 2.853 seconds) <= 2.793661 1 <= 2.806227 15 ### <= 2.812840 40 ####### <= 2.819454 81 ############### <= 2.826067 79 ############### <= 2.832681 78 ############## <= 2.839294 95 ################# <= 2.845908 83 ############### <= 2.852521 106 ################### <= 2.859135 85 ################ <= 2.865748 65 ############ 272 ################################################## ========================================================= distance3 (using time): 4.123645 microseconds per iteration distance3 (using benchmark): == testing proc (distance3 41.0 -81.0 42.0 -82.0) ===== munge times: 3.505 ms llength times: 1000 descriptive parameters: min: 2.570 usec median: 2.620 usec harmonic_mean: 2.623 usec arithmetic_mean: 2.623 usec max: 2.894 usec sample_stddev: 0.028 usec 2.620 usec per iteration / 380933.661 iterations per second (1000000 iterations in 2.625 seconds) <= 2.570557 1 <= 2.581149 18 ### <= 2.586724 27 #### <= 2.592299 38 ###### <= 2.597873 63 ######### <= 2.603448 108 ################ <= 2.609023 93 ############## <= 2.614597 75 ########### <= 2.620172 90 ############## <= 2.625747 80 ############ <= 2.631321 74 ########### 333 ################################################## Tcl version: 8.6b1.2 distance: 138.92161836383994 distance2: 138.92161836383994 distance3: 138.92161836383994 ========================================================= distance (using time): 3.869864 microseconds per iteration distance (using benchmark): == testing proc (distance 41.0 -81.0 42.0 -82.0) ===== munge times: 2.652 ms llength times: 1000 descriptive parameters: min: 3.679 usec median: 3.812 usec harmonic_mean: 3.816 usec arithmetic_mean: 3.817 usec max: 4.391 usec sample_stddev: 0.061 usec 3.812 usec per iteration / 261862.502 iterations per second (1000000 iterations in 3.819 seconds) <= 3.680225 1 <= 3.703496 8 # <= 3.715744 3 <= 3.727992 17 ## <= 3.740240 33 ### <= 3.752488 43 #### <= 3.764736 55 ###### <= 3.776984 65 ####### <= 3.789232 92 ########## <= 3.801480 86 ######### <= 3.813728 117 ############ 480 ################################################## ========================================================= distance2 (using time): 3.821812 microseconds per iteration distance2 (using benchmark): == testing proc (distance2 41.0 -81.0 42.0 -82.0) ===== munge times: 2.623 ms llength times: 1000 descriptive parameters: min: 3.008 usec median: 3.114 usec harmonic_mean: 3.116 usec arithmetic_mean: 3.117 usec max: 3.862 usec sample_stddev: 0.048 usec 3.114 usec per iteration / 320614.862 iterations per second (1000000 iterations in 3.119 seconds) <= 3.008962 1 <= 3.027237 4 <= 3.036856 6 # <= 3.046474 8 # <= 3.056093 27 ### <= 3.065712 32 ### <= 3.075330 61 ###### <= 3.084949 57 ###### <= 3.094567 109 ########### <= 3.104186 91 ######### <= 3.113804 88 ######### 516 ################################################## ========================================================= distance3 (using time): 3.894228 microseconds per iteration distance3 (using benchmark): == testing proc (distance3 41.0 -81.0 42.0 -82.0) ===== munge times: 2.598 ms llength times: 1000 descriptive parameters: min: 2.654 usec median: 2.741 usec harmonic_mean: 2.746 usec arithmetic_mean: 2.747 usec max: 3.174 usec sample_stddev: 0.043 usec 2.741 usec per iteration / 363791.405 iterations per second (1000000 iterations in 2.749 seconds) <= 2.654868 1 <= 2.671353 3 <= 2.680029 6 # <= 2.688706 25 ### <= 2.697382 44 ##### <= 2.706058 69 ######## <= 2.714735 59 ####### <= 2.723411 81 ########## <= 2.732087 108 ############# <= 2.740764 95 ########### <= 2.749440 83 ########## 426 ################################################## ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2011-07-28 11:33 Message: On my machine (older macbook pro), I get extremely similar performance for 8.5(.2; from AS) and 8.6 (trunk tip, configured for release build) but both are significantly slower than 8.4 (the system installation, 8.4.7). This is particularly the case once using [global] to bring in the constant (a recommended practice). I guess there may be a problem with function calls? 8.5 switched from specialized bytecodes to using general command dispatch, which is lovely and flexible but means that it makes a hot spot (that was hit in 8.6's NRE work) even hotter. ---------------------------------------------------------------------- Comment By: Alexandre Ferrieux (ferrieux) Date: 2011-07-27 21:22 Message: Yet another data point: the overhead of going into and out of the proc seems to play a role: 8.5 flat : 5.24437 microseconds per iteration 8.5 proc: 5.12985 microseconds per iteration 8.6 flat : 6.1804 microseconds per iteration 8.6 proc : 6.81622 microseconds per iteration Done with: proc flatloop { lat1 lon1 lat2 lon2 } { time { set lat1 [expr {$lat1 / $::RADIAN_CONV}] set lat2 [expr {$lat2 / $::RADIAN_CONV}] set dlon [expr {($lon2 - $lon1) / $::RADIAN_CONV}] expr {acos(sin($lat1) * sin($lat2) + \ cos($lat1) * cos($lat2) * cos($dlon)) * \ 6371.0} } 100000 } puts [flatloop 41.0 -81.0 42.0 -82.0] puts [time {distance 41.0 -81.0 42.0 -82.0} 100000] ---------------------------------------------------------------------- Comment By: Jeffrey Hobbs (hobbs) Date: 2011-07-27 21:21 Message: Both TIPs 174 and 232 have an impact in 8.5. I believe the NRE changes are part of the further 8.6 degradation. You will find a thread on tcl-core about the suspected 8.4->8.5->8.6 issues, but you have to compare the latest, as we made some fixes in 8.5 and 8.6 (notably TIP378) that changed some values. You can also see a further analysis of normalized tclbench data between 8.4, 8.5.9 and 8.5.9-new with some of these fixes in place at: https://spreadsheets.google.com/spreadsheet/ccc?key=0AiqUOwK_jbW-dEtlcFo0b09TUTVsRzFIaWU5bWFXSFE&hl=en_US ---------------------------------------------------------------------- Comment By: miguel sofer (msofer) Date: 2011-07-27 20:49 Message: Agreed with my own observations: it is a case of "barely measurable" effect of [times] overhead. ---------------------------------------------------------------------- Comment By: Alexandre Ferrieux (ferrieux) Date: 2011-07-27 20:38 Message: Also, the :: vs [global] slowdown mentioned in 3302370 also hits orthogonally (roughly 20% in both cases). ---------------------------------------------------------------------- Comment By: Alexandre Ferrieux (ferrieux) Date: 2011-07-27 20:31 Message: OK, a bit of hard facts proving that this indeed orthogonal to the [times] overhead: Using proc time2 {s n} { proc _tmp {} [format { set n %d while {[incr n -1]>=0} {%s} } $n $s] return [time _tmp] } puts [time {distance 41.0 -81.0 42.0 -82.0} 100000] puts [time2 {distance 41.0 -81.0 42.0 -82.0} 100000] I get the following pairs: ===== 8.5: 4.97726 microseconds per iteration 491821 microseconds per iteration ===== 8.6: 6.82854 microseconds per iteration 663014 microseconds per iteration So, though the T_EOE effect is indeed a bit stronger on 8.6, the bulk of the slowdown is intrinsic. Second data point: disassembling shows exactly the same bytecodes. ---------------------------------------------------------------------- Comment By: miguel sofer (msofer) Date: 2011-07-27 19:46 Message: Nothing so odd about it: 1. [time] iterates over Tcl_EvalObjEx 2. Tcl_EvalObjEx IS much slower on 8.6 3. normal Tcl scripts do not iterate over Tcl_EvalObjEx So many benchmarks overemphasize 8.6's slowdowns, some by huge margins. If this is one of those or not, I do not know. My guess is that it will be between 'measurably' and 'quite a bit' ---------------------------------------------------------------------- Comment By: Alexandre Ferrieux (ferrieux) Date: 2011-07-27 19:38 Message: Also, I'd check whether the literals shimmer at each turn of the [times] loop. I remember a chat session with miguel, where we discovered that odd things happened to multi-iteration-[times], and didn't with a regular [for] loop in a proc. ---------------------------------------------------------------------- Comment By: Andreas Kupries (andreas_kupries) Date: 2011-07-27 18:42 Message: Chat excerpts... aku: Might be interesting to see how the bytecode of 8.4/5/6 differ for the expressions. dgp: Some bisecting to isolate things might be interesting, if the slowdown happened in discrete steps. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3380073&group_id=10894 |