From: miguel s. <mig...@gm...> - 2011-03-25 13:49:33
|
I can now document my misgivings about tclbench: it is too noisy. It may be
suitable for some comparisons, but its value as a guide for optimization work
is at least doubtful.

A - The experiment

I ran tclbench from cvs on core.tcl.tk:
 * against five copies of THE SAME tclsh (trunk --disable-shared)
 * tclbench from CVS
 * option "-repeat 5"

Results attached; in summary:
 - there are differences of up to 16% between the min and max values
 - 44 benchmarks out of 644 show differences above 5% (see below)
 - the worst happened in benchmarks that do not involve system calls
 - 5 repeats are not enough to filter out the noise
 - it took almost an hour to run these 5 repeats

B - Now what?

One advantage of tclbench is that it is portable (we get unreliable results
on all platforms). We would need a good and fast way of running reliable
benchmarks on at least one platform.

I would propose that "we" (who?) develop something to run on linux, and have
it installed on core.tcl.tk. It could share most of the code of tclbench,
just replacing [time] with something based on linux performance counters. It
would possibly be better to use the instruction counter rather than cycles or
task-clock-msecs, as it shows a much smaller run-to-run variation.

What do you guys think?

-------

The top differences were:

 16%  598 STR    string compare uni long
 11%  285 LIST   insert an item at "end"
 10%  294 LIST   lset foreach l
 10%  295 LIST   lset foreach list
  9%  055 DATA   create in a list
  9%  286 LIST   insert an item at middle
  9%  305 LIST   replace first el with multiple
  9%  308 LIST   replace last el with multiple
  9%  311 LIST   replace middle element
  9%  551 STR    repeat, abcdefghij * 1000
  9%  552 STR    replace, equal replacement
  8%  287 LIST   insert an item at start
  8%  301 LIST   remove first element
  8%  302 LIST   remove in mixed list
  8%  303 LIST   remove last element
  8%  348 MAP    string 2 val -nocase
  8%  501 STR    append
  8%  510 STR    append (10KB + 1KB)
  8%  563 STR    reverse iter/append, 100 c
  7%  304 LIST   remove middle element
  7%  307 LIST   replace in mixed list
  7%  309 LIST   replace last element
  7%  310 LIST   replace middle el with multiple
  7%  312 LIST   replace range
  7%  564 STR    reverse iter/append, 100 uc
  7%  565 STR    reverse iter/append, 400 c
  6%  069 EVAL   list cmd and pure lists
  6%  212 KLIST  shuffle0 llength 1
  6%  325 LOOP   for, iterate list
  6%  373 MTHD   inline call
  6%  534 STR    match, exact (success)
  6%  566 STR    reverse iter/append, 400 uc
  5%  112 FILE   exists! tmpfile (str)
  5%  204 IF     if true al
  5%  254 KLIST  shuffle6 llength 10000
  5%  262 LIST   concat CONCAT 2x1000
  5%  306 LIST   replace first element
  5%  313 LIST   reverse core
  5%  327 LOOP   foreach, iterate list
  5%  546 STR    range, index 100..200 of 4010
  5%  591 STR    streq bin long neqS
  5%  596 STR    string compare long (same obj)
  5%  632 VAR    incr global var 1000x
  5%  636 VAR    mset (foreach)
|
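For concreteness, one way to approximate the instruction-counter idea without
writing any new C code is to drive each benchmark run under perf(1) and read
the counts back from its machine-readable output. This is only a rough sketch
under assumptions: a Linux host with perf installed, and a hypothetical driver
script passed in by the caller; it is not part of tclbench.

    # Sketch only: runs a benchmark script under "perf stat" and reports the
    # retired-instruction count instead of wall-clock time.  Assumes perf(1)
    # is available; the script argument is a hypothetical benchmark driver.
    proc count-instructions {script args} {
        set out [file join [pwd] perf-out.txt]
        # -x , gives CSV output; -o sends perf's report to a file.
        exec perf stat -e instructions:u -x , -o $out \
            [info nameofexecutable] $script {*}$args >@ stdout 2>@ stderr
        set f [open $out]
        set data [read $f]
        close $f
        # The first CSV field of the matching line is the raw counter value.
        foreach line [split $data \n] {
            if {[string match *instructions* $line]} {
                return [lindex [split $line ,] 0]
            }
        }
        error "no instruction count found in $out"
    }

Counting the whole child process obviously includes interpreter start-up; an
in-process replacement for [time] would need a small C shim around
perf_event_open(2), but the script side would look much the same.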
From: Larry M. <lm...@bi...> - 2011-03-25 13:54:47
|
On Fri, Mar 25, 2011 at 10:49:16AM -0300, miguel sofer wrote:
> I can now document my misgivings about tclbench: it is too noisy. It may
> be suitable for some comparisons, but its value as a guide for
> optimization work is at least doubtful.

Try modifying it to do N runs and report the lowest number.  The rationale
for this is that you are at the mercy of the OS and the hardware and other
users, all of which can make your numbers take longer.  By definition, the
lowest one is the closest to the real number for your purposes.

I'd also suggest you do this as a post-processing step, so you run 4 copies
in parallel, record the results, and take the min.  That way you use 4/6
cores on core.tcl.tk, leaving 2 for fossil et al.  And it's no slower.

Take the test that varied the most, try this technique on just it, and see
whether you get stable numbers.
--
---
Larry McVoy                lm at bitmover.com          http://www.bitkeeper.com
|
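A post-processing pass of the kind suggested here is only a few lines of Tcl.
A sketch, assuming each parallel run wrote a file of "name msec" lines — that
file format is made up for illustration and is not tclbench's actual output:

    # Sketch: merge several result files (one per parallel run) by keeping
    # the minimum time seen for each benchmark name.
    proc min-of-runs {files} {
        array set best {}
        foreach file $files {
            set f [open $file]
            while {[gets $f line] >= 0} {
                if {$line eq ""} continue
                set msec [lindex $line end]
                set name [lrange $line 0 end-1]
                if {![info exists best($name)] || $msec < $best($name)} {
                    set best($name) $msec
                }
            }
            close $f
        }
        return [array get best]
    }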
From: miguel s. <mig...@gm...> - 2011-03-25 13:57:22
|
On 03/25/2011 10:54 AM, Larry McVoy wrote:
> On Fri, Mar 25, 2011 at 10:49:16AM -0300, miguel sofer wrote:
>> I can now document my misgivings about tclbench: it is too noisy. It may
>> be suitable for some comparisons, but its value as a guide for
>> optimization work is at least doubtful.
>
> Try modifying it to do N runs and report the lowest number. The rationale
> for this is that you are at the mercy of the OS and the hardware and other
> users, all of which can make your numbers take longer. By definition,
> the lowest one is the closest to the real number for your purposes.

That thing took 5 runs and reported the lowest; that's what the "-repeat 5"
option does.
|
From: Twylite <tw...@cr...> - 2011-03-25 14:27:20
|
Hi,

On 2011/03/25 03:49 PM, miguel sofer wrote:
> I can now document my misgivings about tclbench: it is too noisy. It
> may be suitable for some comparisons, but its value as a guide for
> optimization work is at least doubtful.
>
> Results attached; in summary:
> - there are differences of up to 16% between the min and max values
> - 44 benchmarks out of 644 show differences above 5% (see below)

The situation is worse on Windows, where - despite using
QueryPerformanceCounters() - the resolution of [time] appears to be
insufficiently fine, especially on fast systems.  I have a hacked tclbench
with better autoscaling to compensate ... I suppose I should talk to someone
about getting that back into the main sources?

Moreover, while tclbench should be a good guide for keyhole optimisation and
maintaining the performance of individual functions, it does not provide any
reasonable way to compare the overall performance of different builds of Tcl.
How one may compute such a metric is a complex issue for another thread, but
the weighting (or lack thereof) of individual tests means, for example, that
the claimed 30-35% performance hit of the NRE shows up as a 1 second
difference on a 134 second run time on my desktop.

> We would need a good and fast way of running reliable benchmarks on at
> least one platform.
>
> I would propose that "we" (who?) develop something to run on linux,
> and have it installed on core.tcl.tk.

One problem with the "one platform" approach is that it can lead quite
quickly to optimisations (or worse, designs) that favour a single compiler at
the expense of others.

> It could share most of the code of tclbench, just replacing [time]
> with something based on linux performance counters. It would possibly
> be better to use the instruction counter rather than cycles or
> task-clock-msecs, as it shows a much smaller run-to-run variation.

Isn't this more an argument to extend [time] to take a flag or additional
argument indicating the desired resolution of the answer (cycles, usec,
etc.)?  Then portability is maintained.

Regards,
Twylite
|
From: miguel s. <mig...@gm...> - 2011-03-25 14:42:23
|
On 03/25/2011 11:25 AM, Twylite wrote:
> One problem with the "one platform" approach is that it can lead quite
> quickly to optimisations (or worse, designs) that favour a single
> compiler at the expense of others.

It's a tool; it can be misused and we should indeed be careful.

But my wish is for AT LEAST ONE platform where I can get reliable results.
Of course the best would be all platforms, but one is infinitely superior to
none (as today).

Until and unless someone comes up with a revolutionary new way of running
Tcl, perf work is mostly chipping away at 5% here and 3% there.  So we need a
good measuring tool.  If and when a patch produces a 50% improvement, we'll
notice alright.

Needed are quick and good answers to "does this two-line patch have a
measurable impact on proc dispatch speed?":
 - quick as in "I get my answer within 30 seconds"; tclbench or even [time]
   alone give me this much, for a well-crafted test script or two
 - good as in "if I see a 3% speedup, I know it will be no slower than the
   original", or "if I see a 3% slowdown, I know it will not be worse than
   5% in reality".
|
From: Larry M. <lm...@bi...> - 2011-03-25 14:47:35
|
On Fri, Mar 25, 2011 at 11:42:11AM -0300, miguel sofer wrote:
> On 03/25/2011 11:25 AM, Twylite wrote:
> > One problem with the "one platform" approach is that it can lead quite
> > quickly to optimisations (or worse, designs) that favour a single
> > compiler at the expense of others.
>
> It's a tool; it can be misused and we should indeed be careful.
>
> But my wish is for AT LEAST ONE platform where I can get reliable
> results. Of course the best would be all platforms, but one is
> infinitely superior to none (as today).
>
> Until and unless someone comes up with a revolutionary new way of
> running Tcl, perf work is mostly chipping away at 5% here and 3% there.
> So we need a good measuring tool. If and when a patch produces a 50%
> improvement, we'll notice alright.
>
> Needed are quick and good answers to "does this two-line patch have a
> measurable impact on proc dispatch speed?":
> - quick as in "I get my answer within 30 seconds"; tclbench or even
>   [time] alone give me this much, for a well-crafted test script or two
> - good as in "if I see a 3% speedup, I know it will be no slower than
>   the original", or "if I see a 3% slowdown, I know it will not be worse
>   than 5% in reality".

One possible way might be to have some microbenchmarks and use perf counters
to measure cache misses.  It's a strange way to think about it, but in
general, fewer cache misses == better.
--
---
Larry McVoy                lm at bitmover.com          http://www.bitkeeper.com
|
From: miguel s. <mig...@gm...> - 2011-03-25 14:44:59
|
On 03/25/2011 11:25 AM, Twylite wrote:
>> It could share most of the code of tclbench, just replacing [time]
>> with something based on linux performance counters. It would possibly
>> be better to use the instruction counter rather than cycles or
>> task-clock-msecs, as it shows a much smaller run-to-run variation.
> Isn't this more an argument to extend [time] to take a flag or
> additional argument indicating the desired resolution of the answer
> (cycles, usec, etc.)? Then portability is maintained.

As a matter of fact, I would prefer extending [clock] or, better, a
completely new command.  Then a replacement for [time] can be scripted
easily.
|
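To illustrate the "scripted easily" point: given some low-level counter
command, a [time]-alike is only a few lines.  The [counter] command below is
hypothetical — it stands in for whatever new command (or [clock] extension)
would expose the hardware counters; it does not exist in Tcl today.

    # Sketch of a scripted [time] replacement.  "counter instructions" is a
    # HYPOTHETICAL command standing in for a future counter-reading API.
    proc itime {script {iters 1}} {
        set start [counter instructions]
        for {set i 0} {$i < $iters} {incr i} {
            uplevel 1 $script
        }
        set end [counter instructions]
        return "[expr {($end - $start) / double($iters)}] instructions per iteration"
    }

The same wrapper works unchanged whether the underlying counter reports
instructions, cycles or microseconds, which is the portability argument made
above.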
From: Jeff H. <je...@ac...> - 2011-03-25 15:46:19
|
On 25/03/2011 7:25 AM, Twylite wrote:
> The situation is worse on Windows, where - despite using
> QueryPerformanceCounters() - the resolution of [time] appears to be
> insufficiently fine, especially on fast systems. I have a hacked
> tclbench with better autoscaling to compensate ... I suppose I should
> talk to someone about getting that back into the main sources?

Yes, they are in tcllib/tclbench, still in CVS for the moment.  Patches
welcome, or I can give you commit access (note that Andreas plans to move
tcllib, in whole or in pieces, to fossil in the near future).

> Moreover, while tclbench should be a good guide for keyhole optimisation
> and maintaining the performance of individual functions, it does not
> provide any reasonable way to compare the overall performance of
> different builds of Tcl. How one may compute such a metric is a complex
> issue for another thread, but the weighting (or lack thereof) of
> individual tests means for example that the claimed 30-35% performance
> hit of the NRE shows up as a 1 second difference on a 134 second run
> time on my desktop.

You are making sure not to use autoscale when you look at the overall number,
right?  autoscale's entire purpose is to have runs that "level out" to about
the same overall runtime, so only the individual metrics then matter.

Better tools for capturing and analyzing individual metrics would be nice.  I
think the biggest win would be a times(2) (CPU time) based variant of Tcl's
'time'.  The other tools would be nice, but tclbench already employs the best
facilities for smoothing wall-time issues, and that still isn't enough on
some setups (though I do recommend -repeat 10 or better for a larger range of
smoothing).

Jeff
|
From: Twylite <tw...@cr...> - 2011-03-25 16:04:58
|
Hi,

On 2011/03/25 05:44 PM, Jeff Hobbs wrote:
>> individual tests means for example that the claimed 30-35% performance
>> hit of the NRE shows up as a 1 second difference on a 134 second run
>> time on my desktop.
>
> You are making sure not to use autoscale when you look at the overall
> number, right? autoscale's entire purpose is to have runs that "level
> out" to about the same overall runtime, so only the individual metrics
> then matter.

Yeah ;)  But many of the tests that would show up bytecode performance
differences (as opposed to, say, a difference in the implementation of a
particular string or file function) run in under 10ms, which means a single
context switch can make a 2x difference in measured performance; and even
taking all of those tests together, their overall impact is minimal compared
with, say, the time taken by the file or lsearch tests.

Regards,
Twylite
|
From: Jeff H. <je...@ac...> - 2011-03-25 16:11:13
|
On 25/03/2011 9:02 AM, Twylite wrote:
> On 2011/03/25 05:44 PM, Jeff Hobbs wrote:
>>> individual tests means for example that the claimed 30-35% performance
>>> hit of the NRE shows up as a 1 second difference on a 134 second run
>>> time on my desktop.
>>
>> You are making sure not to use autoscale when you look at the overall
>> number, right? autoscale's entire purpose is to have runs that "level
>> out" to about the same overall runtime, so only the individual metrics
>> then matter.
> Yeah ;) But many of the tests that would show up bytecode performance
> differences (as opposed to say a difference in the implementation of a
> particular string or file function) run in under 10ms, which means a
> single context switch can make a 2x difference in measured performance,
> and even taking all of those tests together there is a minimal overall
> impact when compared with say the time taken by the file or lsearch tests.

Yeah, lost in the noise.  Any ideas on making those less "fine"?  I've
considered just repeating the same chunk of code more times in the proc.
You'll see that expr.bench already has a bit of that; I added it when I moved
to faster machines and got lower numbers (the lower the number, the higher
the noise impact), e.g.:

    proc expr-streq {a b} {
        expr {$a == $b}; expr {$a == $b}
        expr {$a == $b}; expr {$a == $b}
    }

I could repeat even more ... but I don't think that's the ideal direction for
writing tests.

Jeff
|
From: Donal K. F. <don...@ma...> - 2011-03-29 09:42:59
Attachments:
donal_k_fellows.vcf
|
On 27/03/2011 17:16, Twylite wrote:
> The best idea I have at the moment is to separate "engine" tests from
> "feature" tests. I am defining "engine" tests as those involving Tcl's
> dispatch mechanism, stack frame, bytecode execution and Tcl_Obj/memory
> management. "Feature" tests by comparison would be individual C-coded
> commands that do a significant amount of work (and are not bytecoded).

There are actually three types of benchmarks.  I'd call them "framework"
(what you identify as "engine"), "feature" and "system" benchmarks.  The
system benchmarks check groups of commands used together in common patterns,
and are important for keeping perspective. :-)  They're also measuring the
speed that users of Tcl are most likely to notice; if I were going to claim a
headline speed increase, I'd want it backed up by a system benchmark or few.
To make this concrete, I'd identify the GCCont group of benchmarks as being
of the system type.

I suspect that the different types of benchmarks need different tuning
parameters.  Framework benchmarks are typically microbenchmarks and need
great care to overcome the noise, whereas feature benchmarks and system
benchmarks run for much longer and so have less low-level variability.

Donal.
|
From: miguel s. <mig...@gm...> - 2011-03-29 12:35:53
|
On 03/29/2011 06:42 AM, Donal K. Fellows wrote:
> On 27/03/2011 17:16, Twylite wrote:
>> The best idea I have at the moment is to separate "engine" tests from
>> "feature" tests. I am defining "engine" tests as those involving Tcl's
>> dispatch mechanism, stack frame, bytecode execution and Tcl_Obj/memory
>> management. "Feature" tests by comparison would be individual C-coded
>> commands that do a significant amount of work (and are not bytecoded).
>
> There are actually three types of benchmarks. I'd call them "framework"
> (what you identify as "engine"), "feature" and "system" benchmarks. The
> system benchmarks check groups of commands used together in common
> patterns, and are important for keeping perspective. :-) They're also
> measuring the speed that users of Tcl are most likely to notice; if I
> were going to claim a headline speed increase, I'd want it backed up by
> a system benchmark or few. To make this concrete, I'd identify the GCCont
> group of benchmarks as being of the system type.
>
> I suspect that the different types of benchmarks need different tuning
> parameters. Framework benchmarks are typically microbenchmarks and need
> great care to overcome the noise, whereas feature benchmarks and system
> benchmarks run for much longer and so have less low-level variability.

Interesting take, thank you.

In my measurements, trunk (column 6, normalized against core-8-5) happens to
be faster than core-8-5 (column 8) at the GCCont group of benchmarks:

http://core.tcl.tk/tcl/artifact/e40a3771cb94430b2cfffd8aecc71d4f889dcf0a
|
From: Twylite <tw...@cr...> - 2011-03-27 16:16:41
|
On 2011/03/25 06:09 PM, Jeff Hobbs wrote:
> On 25/03/2011 9:02 AM, Twylite wrote:
>> Yeah ;) But many of the tests that would show up bytecode performance
>> differences (as opposed to say a difference in the implementation of a
>> particular string or file function) run in under 10ms, which means a
>> single context switch can make a 2x difference in measured performance,
>> and even taking all of those tests together there is a minimal overall
>> impact when compared with say the time taken by the file or lsearch tests.
> Yeah, lost in the noise. Any ideas on making those less "fine"? I've
> considered just repeating the same chunk of code more times in the proc.

I'm not sure that repeating code chunks helps significantly, though, as you
still end up with a situation where code and data are more likely (than in a
less predictable program) to be in L1/L2 cache, leading to an optimistic
microbenchmark.

The best idea I have at the moment is to separate "engine" tests from
"feature" tests.  I am defining "engine" tests as those involving Tcl's
dispatch mechanism, stack frame, bytecode execution and Tcl_Obj/memory
management.  "Feature" tests by comparison would be individual C-coded
commands that do a significant amount of work (and are not bytecoded).

By writing reasonably complex procs (and sub-procedures) that exercise the
"engine" in a broad manner (exercising all bytecodes, in similar amounts) we
can see whether engine-level changes are having a performance impact.  Such
procs would avoid calling non-bytecoded C-coded commands that do any
significant amount of work.  I imagine such a test would involve a
computationally and data intensive task decomposed into several procs, some
of which are called in tight loops.  At the same time the test must not go
out of its way to confound the bytecode compiler, or improvements in that
area - which could have real-world benefits - may not be noticed.

A glance at the list of bytecodes in tclCompile.c suggests that we should do
a lot of simple string, list, dict and math ops -- perhaps something like
inserting the outputs of a simple hashing function into a tree, then
traversing the tree in some order (a rough sketch follows below).

Microbenchmarks could then be used to test "features" as well as individual
bytecoded commands, to check that a change doesn't introduce a
command-specific performance regression.

My 2c - Twylite
|
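A rough illustration of the kind of "engine" proc described above — purely a
sketch, not a proposed tclbench test; the hash is deliberately simple-minded
and the "tree" is just a bucketed dict:

    # Sketch of an "engine" benchmark: lots of proc calls plus string, list,
    # dict and math bytecodes, and no heavyweight C commands.
    proc hash-key {i} {
        # Toy hash: mix the number with its decimal digits.
        set h $i
        foreach d [split $i ""] {
            set h [expr {($h * 31 + $d) % 104729}]
        }
        return $h
    }

    proc build-tree {n} {
        set tree [dict create]
        for {set i 0} {$i < $n} {incr i} {
            set h [hash-key $i]
            set bucket [expr {$h % 64}]
            dict lappend tree $bucket [list $h $i]
        }
        return $tree
    }

    proc walk-tree {tree} {
        set total 0
        foreach bucket [lsort -integer [dict keys $tree]] {
            foreach pair [dict get $tree $bucket] {
                incr total [lindex $pair 0]
            }
        }
        return $total
    }

    # e.g.  time { walk-tree [build-tree 1000] } 100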
From: Kevin K. <ke...@ac...> - 2011-03-25 17:19:28
|
On 03/25/2011 10:25 AM, Twylite wrote:
> Hi,
>
> On 2011/03/25 03:49 PM, miguel sofer wrote:
>> I can now document my misgivings about tclbench: it is too noisy. It
>> may be suitable for some comparisons, but its value as a guide for
>> optimization work is at least doubtful.
>>
>> Results attached; in summary:
>> - there are differences of up to 16% between the min and max values
>> - 44 benchmarks out of 644 show differences above 5% (see below)
>
> The situation is worse on Windows, where - despite using
> QueryPerformanceCounters() - the resolution of [time] appears to be
> insufficiently fine, especially on fast systems. I have a hacked
> tclbench with better autoscaling to compensate ... I suppose I should
> talk to someone about getting that back into the main sources?

For timing studies, I'd be entirely willing to entertain a TIP to provide a
command that asks for accounted user and system CPU time.  It wasn't really
feasible until we brought Win9x to end-of-life, but the whole NT line
provides the necessary instrumentation.

I'm by no means sure that will help, because every time we suffer a context
switch, we also lose some part of the working set from caches and have to pay
the cost of restoring it.  For that reason I suspect that timings even of CPU
consumption will remain quite noisy.

(That's why, when I get into performance optimization, it tends to be on big
stinkers like TclDoubleDigits.  I could measure that one with eyeball and
wristwatch. :) )

--
73 de ke9tv/2, Kevin
|
From: Alexandre F. <ale...@gm...> - 2011-03-25 18:49:12
|
On Fri, Mar 25, 2011 at 6:19 PM, Kevin Kenny <ke...@ac...> wrote:
> I could measure that one with eyeball and wristwatch. :)

May I respectfully observe that, given the proper # of iterations (hence the
autoscale feature in tclbench), _anything_ can be profiled with eyeball and
wristwatch?  Once scaling is properly done, the elapsed time is just as
reliable as CPU time (yes, I know the difference ;-).

Frankly, I don't buy the idea that tclbench is a lost soul.  If its
autoscaling parameters are not perfect, let's improve them.  If [time] has an
overhead worse than a scripted [while], let's fix it.  If another source of
noise remains, let's track it, and kill it.  Let's be tenacious.

-Alex
|
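Checking the "[time] versus a scripted loop" question is itself a one-screen
experiment.  A rough sketch — the iteration count and the bare [incr] body
are arbitrary choices for illustration:

    # Quick comparison of [time] against a hand-rolled loop measured with
    # [clock microseconds].  Both numbers include their own loop overhead,
    # so this only gives a ballpark answer.
    set n 1000000

    set x 0
    puts "\[time\]:        [time { incr x } $n]"

    set y 0
    set i 0
    set start [clock microseconds]
    while {$i < $n} {
        incr y
        incr i
    }
    set elapsed [expr {([clock microseconds] - $start) / double($n)}]
    puts "scripted loop: $elapsed microseconds per iteration"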
From: Karl L. <kar...@gm...> - 2011-03-25 19:30:20
|
My tclbsd package provides access to the getrusage() call on systems equipped
with it, including total amount of time spent executing in user mode and
system mode (in clock ticks, converted to double-precision seconds), resident
set size, page faults, swaps, blocks in and out, messages sent and received,
context switches, etc.

I don't know that it would be useful even as a jumping-off point, but if
anyone would like to take a peek it's at
https://github.com/flightaware/tclbsd

On 3/25/11 12:19 PM, "Kevin Kenny" <ke...@ac...> wrote:

> On 03/25/2011 10:25 AM, Twylite wrote:
>> Hi,
>>
>> On 2011/03/25 03:49 PM, miguel sofer wrote:
>>> I can now document my misgivings about tclbench: it is too noisy. It
>>> may be suitable for some comparisons, but its value as a guide for
>>> optimization work is at least doubtful.
>>>
>>> Results attached; in summary:
>>> - there are differences of up to 16% between the min and max values
>>> - 44 benchmarks out of 644 show differences above 5% (see below)
>>
>> The situation is worse on Windows, where - despite using
>> QueryPerformanceCounters() - the resolution of [time] appears to be
>> insufficiently fine, especially on fast systems. I have a hacked
>> tclbench with better autoscaling to compensate ... I suppose I should
>> talk to someone about getting that back into the main sources?
>
> For timing studies, I'd be entirely willing to entertain a TIP to
> provide a command that asks for accounted user and system CPU time.
> It wasn't really feasible until we brought Win9x to end-of-life,
> but the whole NT line provides the necessary instrumentation.
>
> I'm by no means sure that will help, because every time we suffer
> a context switch, we also lose some part of the working set from
> caches and have to pay the cost of restoring it. For that reason
> I suspect that timings even of CPU consumption will remain quite
> noisy.
>
> (That's why, when I get into performance optimization, it tends to
> be on big stinkers like TclDoubleDigits. I could measure that one
> with eyeball and wristwatch. :) )
>
> --
> 73 de ke9tv/2, Kevin
|
From: Andreas K. <and...@ac...> - 2011-03-25 20:55:10
|
On 3/25/2011 12:30 PM, Karl Lehenbauer wrote:
> My tclbsd package provides access to the getrusage() call on systems
> equipped with it, including total amount of time spent executing in user
> mode and system mode (in clock ticks, converted to double-precision
> seconds), resident set size, page faults, swaps, blocks in and out,
> messages sent and received, context switches, etc.
>
> I don't know that it would be useful even as a jumping-off point, but if
> anyone would like to take a peek it's at
> https://github.com/flightaware/tclbsd

For quick experimentation, without having to compile it yourself, see

    % teacup list --all-platforms bsd
    entity  name version platform
    ------- ---- ------- ----------------------
    package BSD  1.4     linux-glibc2.3-ix86
    package BSD  1.4     linux-glibc2.3-x86_64
    package BSD  1.4     macosx-universal
    package BSD  1.4     macosx10.5-i386-x86_64
    ------- ---- ------- ----------------------
    4 entities found

--
Andreas Kupries
Senior Tcl Developer
ActiveState, The Dynamic Language Experts

P: 778.786.1122
F: 778.786.1133
and...@ac...   http://www.activestate.com
Get insights on Open Source and Dynamic Languages at www.activestate.com/blog
|
From: <tpo...@gm...> - 2011-03-25 16:06:35
|
On Mar 25, 2011 7:49am, miguel sofer <mig...@gm...> wrote:
> I can now document my misgivings about tclbench: it is too noisy. It may
> be suitable for some comparisons, but its value as a guide for
> optimization work is at least doubtful.

I for one would be in favor of adding a set of application benchmarks to
tclbench.  How optimization X or tweak Y really affects *my* application is
always the most important factor for *me*.  Having a set of representative
applications is probably the next best thing, as long as enough problem
domains are covered.  I would probably want benchmark apps that cover:

 - computationally intensive
 - i/o intensive
 - symbol manipulation
 - data parsing

This also might be a better judge of a specific optimization.  I would
suspect that we might see increased performance in some apps, but decreases
in others, for any particular tweak.

Of course developing and verifying a set of representative applications will
take a fair amount of time (which is why I'm proposing this, but don't have
any concrete examples).  I think Tcllib offers much of what we would want in
raw form: various struct packages, aes/des/sha/md5 ciphers, page parser
generators and finite automata, document generation tools, etc.  A few apps
mentioned on the wiki also come to mind: Nagelfar, taccle, sudoku solvers.

Ideally, some apps would be Tcl 8.4 compatible, so that we could see
performance over the last several years of development.  Applications should
probably have enough input to run for a significant amount of time on modern
hardware.

> Results attached; in summary:
> - there are differences of up to 16% between the min and max values
> - 44 benchmarks out of 644 show differences above 5% (see below)
> - the worst happened in benchmarks that do not involve system calls
> - 5 repeats are not enough to filter out the noise
> - it took almost an hour to run these 5 repeats

This is worth viewing:
http://www.infoq.com/presentations/click-crash-course-modern-hardware

Slides only:
http://www.azulsystems.com/events/javaone_2009/session/2009_J1_HardwareCrashCourse.pdf

(Don't let the 'Java' fool you - this is all about ASM-level code and what
hardware does with it.)

-Tom
|
From: Larry M. <lm...@bi...> - 2011-03-25 16:09:35
|
On Fri, Mar 25, 2011 at 04:06:29PM +0000, tpo...@gm... wrote:
> On Mar 25, 2011 7:49am, miguel sofer <mig...@gm...> wrote:
>> I can now document my misgivings about tclbench: it is too noisy. It
>> may be suitable for some comparisons, but its value as a guide for
>> optimization work is at least doubtful.
>
> I for one would be in favor of adding a set of application benchmarks to
> tclbench. How optimization X or tweak Y really affects *my* application
> is always the most important factor for *me*. Having a set of
> representative applications is probably the next best thing, as long as
> enough problem domains are covered. I would probably want benchmark apps
> that cover:
>
> - computationally intensive
> - i/o intensive
> - symbol manipulation
> - data parsing

I think langbench has all of that, has stable numbers, and does it for tcl,
perl, ruby, python, and L.
--
---
Larry McVoy                lm at bitmover.com          http://www.bitkeeper.com
|
From: Tom P. <tpo...@gm...> - 2011-03-25 16:27:15
|
On Fri, Mar 25, 2011 at 10:09 AM, Larry McVoy <lm...@bi...> wrote:
> I think langbench has all of that, has stable numbers, and does it for
> tcl, perl, ruby, python, and L.

I mean no disrespect to you Larry, but I regard langbench as
micro-benchmarks.  Or perhaps the code I write is vastly inefficient, taking
several hundred or thousands of lines of code to do the same thing :-)

-Tom
|
From: Larry M. <lm...@bi...> - 2011-03-25 16:31:43
|
On Fri, Mar 25, 2011 at 10:27:09AM -0600, Tom Poindexter wrote:
> On Fri, Mar 25, 2011 at 10:09 AM, Larry McVoy <lm...@bi...> wrote:
>
>> I think langbench has all of that, has stable numbers, and does it for
>> tcl, perl, ruby, python, and L.
>
> I mean no disrespect to you Larry, but I regard langbench as
> micro-benchmarks.

What's wrong with microbenchmarks?  langbench is indeed microbenchmarking;
that's by design.  I've got a lot of experience with taking very large
programs and turning them into microbenchmarks of the critical performance
path.  It's much easier to understand what is slow if the benchmark is tiny.
|
From: Jeff H. <je...@ac...> - 2011-03-25 16:32:58
|
On 25/03/2011 9:06 AM, tpo...@gm... wrote:
> I for one would be in favor of adding a set of application benchmarks
> to tclbench. How optimization X or tweak Y really affects *my*
> application is always the most important factor for *me*. Having a
> set of representative applications is probably the next best thing,
> as long as enough problem domains are covered. I would probably want
> benchmark apps that cover:

tclbench is a mish-mash of tests that do cover some aspects of the above,
with a lot of micro-tests as well.  I think for the larger tests you get a
truer estimation of impact, but I would guess you are looking for even larger
app runs.  Some examples in tclbench:

> - computationally intensive

base64, gccont*, md5 and sha all count here (all pure Tcl in tclbench).

> - i/o intensive

File I/O?  That would be READ, but maybe not sufficient to represent larger
app usage.  tclbench does no socket tests.

> - symbol manipulation
> - data parsing

Aren't these the same?  gccont may be the best rep for the former; it is an
extract of real code where someone was doing biotech algorithms in pure Tcl,
and it is compute intensive, with lots of variety in how the same result is
achieved.  This is another aspect of tclbench that can be very informative to
Tcl'ers learning how to achieve the best results using a different
algorithmic style in Tcl.

Data parsing ... could cover a large range, but the PARSE html form upload is
a "real world" example (though again, it just prepares the data and doesn't
do any networking).

> Of course developing and verifying a set of representative applications
> will take a fair amount of time (which is why I'm proposing this, but
> don't have any concrete examples.) I think

This might actually be a good GSoC project, if it included creating a better
base of larger test apps.  Overall tclbench is over 8000 loc for just the
benchmark tests, and I'd say it really only scratches the surface of the
operations that are "real world".

That said, while tclbench isn't exact enough for some performance testing, it
is best to look at it like the canary in the coal mine.  Sometimes the
twitches are no big deal, but it will also keel over, and we shouldn't ignore
that.  It's been more useful than not, and it does highlight major
regressions sometimes.

Jeff
|
From: <L....@su...> - 2011-03-25 16:12:43
|
On Mar 25, 2011 7:49am, miguel sofer <mig...@gm...> wrote:
> I can now document my misgivings about tclbench: it is too noisy. It may
> be suitable for some comparisons, but its value as a guide for
> optimization work is at least doubtful.

I'm having trouble buying this argument.  If the improvement is lost in the
noise, rather than an order-of-magnitude-or-better improvement, why waste
time on it?
|
From: Twylite <tw...@cr...> - 2011-03-27 15:40:46
|
On 2011/03/25 06:11 PM, L....@su... wrote:
> On Mar 25, 2011 7:49am, miguel sofer <mig...@gm...> wrote:
>> I can now document my misgivings about tclbench: it is too noisy. It may
>> be suitable for some comparisons, but its value as a guide for
>> optimization work is at least doubtful.
>
> I'm having trouble buying this argument. If the improvement is lost in
> the noise, rather than an order-of-magnitude-or-better improvement, why
> waste time on it?

A benchmark to test the speed of calling a proc may call an empty proc
1,000,000 times in a tight loop.  The real time taken by that benchmark will
still be insignificant compared to reading 1 MB from a file, or looping a
thousand times around a large [lsort] or [lsearch].  A 50% slowdown in proc
calling time would impact significantly on a real-world app, but would hardly
be noticed in the benchmarks.

- Twylite
|
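For reference, the kind of microbenchmark being described here is tiny; a
sketch of the shape (not an actual tclbench test):

    # Sketch of the proc-dispatch microbenchmark described above.
    proc nop {} {}

    # [time] reports the average cost per iteration in microseconds.
    puts [time { nop } 1000000]

Even at a million iterations this finishes in well under a second on modern
hardware, which is exactly why it contributes so little to the total run
time.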
From: Donald G P. <dg...@ni...> - 2011-03-25 16:17:53
|
L....@su... wrote:
> I'm having trouble buying this argument. If the improvement is lost
> in the noise, rather than an order-of-magnitude-or-better improvement,
> why waste time on it?

I'm less interested in a tool for measuring attempts at improvement than I am
in a tool that will tell me my changes not aimed at performance matters did
not harm performance significantly.

Why?  There's an established history that performance losses of 5 to 10% will
raise complaints.  And they will raise them years after the change causing
them was committed.  I'd rather have some chance of knowing at the time of
commit that the changes are going to cause controversy.

--
| Don Porter            Mathematical and Computational Sciences Division |
| don...@ni...          Information Technology Laboratory                |
| http://math.nist.gov/~DPorter/                                    NIST |
|________________________________________________________________________|
|