From: Trevor D. (Twylite) <tw...@cr...> - 2012-08-26 20:06:35
On 2012/08/25 09:21 AM, Donal K. Fellows wrote:
>> Actual problem remains, and has the following characteristics:
>>
>> (1) try/trap doesn't bytecode compile, so don't be fooled into trusting
>> its performance figures.
>> (2) [catch] is a _lot_ faster than try/on or try/finally.
>
> It depends on the case you're dealing with. First off, you need to start
> using [tcl::unsupported::disassemble] to see what is actually generated.
> Next, [catch] with only a script argument is a specially optimized case
> in the [catch] compiler. We generate particularly good bytecode when you
> do that. More generally, the [catch] compiler's been thoroughly
> optimized over quite a few iterations; the [try] compiler is rather
> younger and is also a lot more complex.

Thanks - this gives me some good starting points.

> Looking through the generated bytecode for your [perf_try_on] and
> [perf_try_fin], I'd say that the issue is that pushing the return
> options (“pushReturnOpts” in the disassembled code) is always executed,
> even when not strictly needed. In part this is due to re-dispatching
> through “returnStk” when we're not actually going to return.

That means my micro-benchmark is doing the wrong thing. The code that got me started on this performance investigation is very similar to the Tcl implementation of try/on/finally, i.e. I do a 'set code [catch {uplevel 1 body} r opts] ; set code2 [catch {uplevel 1 finally} r2 opts2] ; determine which r/opts to use (or combine them) based on code/code2, then dict incr opts -level 1 and return -options'. All of that was beating an 'uplevel 1 try/finally'.

> The [perf_try_trap] case is an odd side case that I didn't compile, a
> 'trap' with an empty match code-list-pattern; since you're not actually ever
> failing in the body adding a dummy word lets you compare. (All other
> procedure decls are as in the message I'm replying to.)

TIP #329 and the man page for [try] note that the handlers are searched in order to find a match.
That means that 'trap {}' is semantically equivalent to 'on error' (the man page also notes this), and can probably be treated as such by the compiler. I'll look into it.

>
> % time {perf_catch_err 10000} 5
> 28007.542999999998 microseconds per iteration
> % time {perf_catch_opt 10000} 5
> 99395.6432 microseconds per iteration
> % time {perf_try_trap 10000} 5
> 377917.91839999997 microseconds per iteration
> % time {perf_try_trap2 10000} 5
> 169406.5764 microseconds per iteration

Okay, progress:

#1: catch {body} e
    time {perf_catch_err 10000} 100 ;# 1410.00 us/iter, factor 1.00
#2: catch {body} e o
    time {perf_catch_opt 10000} 100 ;# 6570.00 us/iter, factor 4.66
#3: if { 0 == [catch {body} e o] } { catch {finally} }
    time {perf_catch_opt2 10000} 100 ;# 7500.00 us/iter, factor 5.32
#4: try {body} finally {finally}
    time {perf_try_fin 10000} 100 ;# 16410.00 us/iter, factor 11.64

In an ideal world the last two should be pretty close.

#5: catch {body} e o ; catch {finally} ; return -options $o $e
    time {perf_catch_fin 10000} 100 ;# 17180.00 us/iter, factor 12.18
#6: try {body} on error {e o} {finally}
    time {perf_try_on_opt 10000} 100 ;# 16560.00 us/iter, factor 11.74

The difference between #1 and (#2, #3) is 'pushReturnOpts'. The hurt in (#4, #5, #6) comes from 'returnStk'.

Can you give me an idea of whether I'm more or less on the right path with the following:

(1) I think that bytecode compilation for [try] can be changed to avoid 'returnStk' in all cases except finally:
  - If no handlers match then the result in the interp is already correct, so no 'returnStk' is required.
  - If any handler matches then the handler will determine the result, so we don't need a 'returnStk'.
  - Only a 'finally' requires us to save off and restore the result into the [try] construct.

(2) pushReturnOpts and returnStk end up calling Tcl_GetReturnOptions() and Tcl_SetReturnOptions() respectively, which serialise the interp's returnOpts into a dict.
Can I accomplish the same thing using Tcl_SaveInterpState() and Tcl_RestoreInterpState() (assuming that the 'finally' doesn't produce an error)? And part 2: I assume there's no way to call either of these functions without adding a new bytecode opcode?

Regards,
Twylite
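
For reference, a minimal sketch of the catch-based try/finally emulation described above (the 'set code [catch {uplevel 1 body} ...]' pattern). The proc name my_try_finally is hypothetical, and the rule that a non-ok outcome from the finally script overrides the body's outcome is an assumption made for illustration; this is not the actual code under investigation.

```tcl
# Hypothetical helper: emulate try/finally with two [catch] calls.
proc my_try_finally {body finally} {
    # Run the body in the caller's scope; capture its result and return options.
    set code [catch {uplevel 1 $body} r opts]

    # Always run the finally script, capturing its outcome separately.
    set code2 [catch {uplevel 1 $finally} r2 opts2]

    # Assumed policy: if the finally script did not complete normally,
    # its result and options replace those of the body.
    if {$code2 != 0} {
        set r $r2
        set opts $opts2
    }

    # Bump -level so the captured outcome takes effect in our caller
    # rather than inside this proc.
    dict incr opts -level 1
    return -options $opts $r
}

# Usage: the error from the body propagates, and the cleanup still runs.
set log {}
catch {my_try_finally {error "boom"} {lappend log cleanup}} msg
puts "$msg / $log"   ;# prints: boom / cleanup
```

Both [catch] calls here ask for the options dictionary, so this is the shape that pays for 'pushReturnOpts' twice plus the final 'return -options', which is consistent with #5 above landing in the same range as #4 and #6.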