|
From: Julian S. <js...@ac...> - 2005-12-14 14:42:36
|
One of the Valgrind developers (Nick) has recently started putting together a collection of programs intended to measure Valgrind's performance (in perf/ if you have a recent svn checkout).

This is a good thing. For the first time it allows us to systematically measure Valgrind's performance across a range of different program types and different CPUs. Even at this early stage it has uncovered some performance problems, the fixes for which will be in 3.2.0.

We are looking for a good floating point test program for the suite. There are already two FP programs in it, but neither meets the following set of requirements. If you know of a program which does meet them, and especially if you have the time/expertise to help modify an existing program to meet them, we would be pleased to hear from you.

Essential requirements:

- must be available under an open-source license. Doesn't have to be GPL since it is not being linked into Valgrind.

- must be written in C, and work correctly on both 32- and 64-bit platforms, big- and little-endian

- must be predominantly double-precision floating-point in workload

- must have several innermost loops/hotspots. A program of at least moderate complexity is desired (1000-10000 lines of C). Our existing FP benchmarks are synthetic benchmarks and do not meet this requirement.

Perhaps something like a simple raytracer, fluid dynamics code, or audio codec would be suitable.

J |
|
From: Dirk M. <dm...@gm...> - 2005-12-14 16:20:22
|
On Wednesday 14 December 2005 15:42, Julian Seward wrote:
> Perhaps something like a simple raytracer, fluid dynamics code,
> or audio codec would be suitable.

http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz matches this pretty well, except that it's C++ and is used as a compiler optimisation testcase (it even needs 3-4 minutes to compile).

Dirk |
|
From: Stefan K. <en...@ho...> - 2005-12-14 20:03:35
|
Hi Julian,

wouldn't that be a good candidate: http://www.fftw.org/

Otherwise I could code you a little mandelbrot-generator.

Stefan |
|
From: Nicholas N. <nj...@cs...> - 2005-12-15 20:20:30
|
On Thu, 15 Dec 2005, Thomas Lavergne wrote:

> I do not know what you call a "simple" raytracer, but I would say that
> "rayshade" is quite simple. Written in C (not a very recent C style), the
> core tracer routines have around 6000 lines of instructions. It has a history
> of configuration on linux / aix (does it prove the point on big- and
> little-endian?) and I read somewhere about a 64-bit build. It was part (is
> still part???) of some Linux distros. Like all ray tracers, it involves quite
> a few double precision floating point operations. Moreover, you can design
> your test-case "as heavy as you wish" by using more complicated scenes to be
> rendered.
> The coding style tries to emulate an object-oriented encapsulation with
> structures and function pointers.
> I already ran valgrind on rayshade and it is clear that there is very poor
> memory management: what is allocated is rarely freed. I do not know if you
> want error-free code for your test cases.
> Rayshade development stopped around 1994 and some hacks will certainly be
> needed to have it build smoothly on all platforms. Especially, there is a
> Lex/Yacc grammar file that needs some updates in order to build under Linux.
> There is 1 popen() command, but it is not mandatory.
> Let me know if I can be of any help. I have a 32-bit Linux PC and have built
> rayshade from source (with modifications).

The good things:

- The size is good.
- The quasi-OO style is good, since much Valgrind use is on C++ programs.
- The popen() is not a problem.
- The poor memory management doesn't sound like a problem.
- The use of double precision is good.

The bad things:

- The dependence on Lex/Yacc is bad. Perhaps the generated .c file(s) could be used, hopefully they're not too big.
- Portability is a concern; if it's not portable that would be a problem.

Nick |
|
From: Julian S. <js...@ac...> - 2005-12-14 20:21:41
|
On Wednesday 14 December 2005 20:17, Stefan Kost wrote:
> Hi Julian,
>
> wouldn't that be a good candidate:
> http://www.fftw.org/

Maybe. The problem is we don't really have much time to investigate whether it could be turned into a suitable standalone, portable benchmark. Can you do that?

> Otherwise I could code you a little mandelbrot-generator.

Well, that would probably fail the ...

> > - must have several innermost loops/hotspots. A program of at
> > least moderate complexity is desired (1000-10000 lines of C).
> > Our existing FP benchmarks are synthetic benchmarks and do not
> > meet this requirement.

... requirement.

J |
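To illustrate why a Mandelbrot generator would probably fail the "several hotspots" requirement: nearly all of its run time tends to sit in one escape-time loop like the one below. This is a minimal sketch written for illustration, not code from the thread or from the perf/ suite; the function name `mandel_iters` and its parameters are hypothetical.

```c
/* Escape-time iteration count for the point c = (cr, ci).
   Hypothetical helper: a typical Mandelbrot program calls something
   like this once per pixel, so this single loop is the only hotspot. */
int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;

    /* Iterate z = z*z + c until |z| exceeds 2 or the limit is reached. */
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}
```

A point inside the set, such as (0, 0), runs to `max_iter`, while a point far outside, such as (2, 2), escapes after one step. Either way the work is all in the same few basic blocks.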
|
From: Nicholas N. <nj...@cs...> - 2005-12-14 20:26:23
|
On Wed, 14 Dec 2005, Julian Seward wrote:

>> wouldn't that be a good candidate:
>> http://www.fftw.org/
>
> Maybe. The problem is we don't really have much time
> to investigate whether it could be turned into a suitable
> standalone, portable benchmark. Can you do that?
>
>> Otherwise I could code you a little mandelbrot-generator.
>
> Well, that would probably fail the ...
>
>>> - must have several innermost loops/hotspots. A program of at
>>> least moderate complexity is desired (1000-10000 lines of C).
>>> Our existing FP benchmarks are synthetic benchmarks and do not
>>> meet this requirement.
>
> ... requirement.

We already have an FFT benchmark, would the FFTW program also be dominated by a single loop?

Nick |
|
From: Julian S. <js...@ac...> - 2005-12-14 20:31:43
|
On Wednesday 14 December 2005 20:26, Nicholas Nethercote wrote:
> On Wed, 14 Dec 2005, Julian Seward wrote:
> >> wouldn't that be a good candidate:
> >> http://www.fftw.org/
>
> We already have an FFT benchmark, would the FFTW program also be dominated
> by a single loop?

Quite possibly, yes (at a guess).

J |
>>>>> "JS" == Julian Seward <js...@ac...> writes:

JS> We are looking for a good floating point test program for the
JS> suite.

When I was in need of something similar recently, I spent some time munging the Vorbis audio encoder program oggenc and its associated libraries into a program that compiled as a single file; it's available at

http://pag.csail.mit.edu/~smcc/projects/single-file-programs/

Here's my understanding of how it stacks up:

JS> Essential requirements:

JS> - must be available under an open-source license.

Check. Mixture of GPL and BSDish.

JS> - must be written in C, and work correctly on both 32- and 64-bit
JS> platforms, big- and little-endian

Check. There's a bit of inline x86 assembly for setting the rounding mode, but there's also a pure-C fallback.

JS> - must be predominantly double-precision floating-point in
JS> workload

Mixture of float, double, and some pure-integer parts (e.g., bitpacking). After doing a s/float/double/g, it still seems to run.

JS> - must have several innermost loops/hotspots.

Seems that way from a brief glance with callgrind here. It does an FFT in one place and an FFT-like operation (MDCT) in another, but also a bunch of other signal processing stuff and a qsort() by fabs().

JS> A program of at least moderate complexity is desired (1000-10000
JS> lines of C).

It's somewhat big: 58k lines and 1.7MB of source (though a bunch of that is static tables). A related disadvantage is that the inputs you would want to run it on would be pretty big.

-- Stephen |
|
From: Michael S. <ms...@xi...> - 2005-12-14 23:13:12
|
On 12/14/05, Stephen McCamant <sm...@cs...> wrote:
>
> When I was in need of something similar recently, I spent some time
> munging the Vorbis audio encoder program oggenc and its associated
> libraries into a program that compiled as a single file; it's
> available at
>
> http://pag.csail.mit.edu/~smcc/projects/single-file-programs/

I'd been intending to suggest doing something like this, though I didn't realise someone had already done the munging-into-single-file part. Thanks!

>
> Here's my understanding of how it stacks up:
>
> JS> Essential requirements:
>
> JS> - must be available under an open-source license.
>
> Check. Mixture of GPL and BSDish.

Right. I don't think that's an issue, but I'll happily relicense the oggenc parts of it if there's an issue.

>
> JS> - must be written in C, and work correctly on both 32- and 64-bit
> JS> platforms, big- and little-endian
>
> Check. There's a bit of inline x86 assembly for setting the rounding
> mode, but there's also a pure-C fallback.

Yes. They're also only used for decoding. Encoding is more interesting as a benchmark, probably. Ripping them out for a benchmark version would be fine (does valgrind even obey these properly?).

>
> JS> - must be predominantly double-precision floating-point in
> JS> workload
>
> Mixture of float, double, and some pure-integer parts (e.g.,
> bitpacking). After doing a s/float/double/g, it still seems to
> run.

The computationally expensive parts are almost entirely single precision floats, though as you note, it's safe to convert them to doubles everywhere. That would probably make it a decent match for the requirements Julian wanted.

If you have any more questions about the vorbis codebase, or want some help with getting it in shape for valgrind benchmarking, please let me know (I wrote oggenc, as well as some small bits of libvorbis, and I know the code reasonably well).

Mike |
|
From: Julian S. <js...@ac...> - 2005-12-17 12:52:38
|
Stephen, Michael,

Thanks for the info on this. 1.7M of source is, well, a lot, but not beyond the bounds of possibility. What worries me more is how to generate enough workload to make it run for 2-5 seconds (natively). I don't want to have megabytes of .wav file in the repo to achieve that.

Michael, is it possible to get it to compress the same bunch of samples repeatedly to achieve the necessary run-time? Do you deal in frames of data or some such? Then we could maybe add a C const array holding one frame's worth of input, and iterate over that repeatedly. Is that viable?

J |
|
From: Michael S. <ms...@xi...> - 2005-12-17 13:04:22
|
On 12/17/05, Julian Seward <js...@ac...> wrote:
>
> Stephen, Michael,
>
> Thanks for the info on this. 1.7M of source is, well, a lot, but
> not beyond the bounds of possibility. What worries me more is how
> to generate enough workload to make it run for 2-5 seconds (natively).
> I don't want to have megabytes of .wav file in the repo to achieve that.

It's actually about 20 thousand lines of source - ~600kB. Then on top of that, there's about a megabyte of static tables; for a single benchmark you could create a version that used only a small subset of those.

>
> Michael, is it possible to get it to compress the same bunch of samples
> repeatedly to achieve the necessary run-time? Do you deal in frames of
> data or some such? then we could maybe add a C const array holding
> one frame's worth of input, and iterate over that repeatedly. Is
> that viable?

I haven't looked at Stephen's single-file version of all of this yet, so I'm not sure exactly what he's kept and what he hasn't. My impression was that it was oggenc AND libvorbis.

libvorbis deals in arbitrarily sized chunks of data (it collects them internally to create blocks of the sizes that it wants) that get passed in directly, so it's very easy to do what you asked here. oggenc (the command-line frontend) is structured to read from actual files, but this wouldn't be a particularly interesting part of the benchmark anyway.

I expect the best option would be to hack away most of oggenc, and just make it iterate over either a file or a static array (it doesn't really matter which, so long as the data isn't just all zeros!) submitting that to libvorbis.

Depending on how this cold progresses (as I sit here sipping my hot honey&lemon drink), I'll either have a look at doing this this afternoon, or I'll be in bed... we'll see!

Mike |
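The driver loop discussed here (submit the same non-zero chunk of samples over and over until the run is long enough) could look roughly like the sketch below. It is not code from oggenc or libvorbis: `CHUNK_SAMPLES`, `fill_input`, `process_chunk` and `run_benchmark` are hypothetical names, and `process_chunk` is only a stand-in for the point where a real driver would hand the samples to libvorbis. The input is generated at start-up rather than stored as a const array, purely to keep the sketch short.

```c
#include <stddef.h>

#define CHUNK_SAMPLES 1024  /* assumed chunk size, chosen arbitrarily */

/* Hypothetical stand-in for the encoder entry point. A real benchmark
   would submit the chunk to libvorbis here; this version just sums the
   signal energy in double precision so the sketch is self-contained. */
static double process_chunk(const double *samples, size_t n)
{
    double energy = 0.0;
    for (size_t i = 0; i < n; i++)
        energy += samples[i] * samples[i];
    return energy;
}

/* Fill the input with a deterministic, non-constant pattern in [-1, 1],
   so the data is not just all zeros. */
static void fill_input(double *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = (double)((i * 37u) % 201u) / 100.0 - 1.0;
}

/* Process the same chunk `iterations` times. The checksum is returned
   so the compiler cannot discard the work. */
double run_benchmark(unsigned iterations)
{
    static double input[CHUNK_SAMPLES];
    double checksum = 0.0;

    fill_input(input, CHUNK_SAMPLES);
    for (unsigned it = 0; it < iterations; it++)
        checksum += process_chunk(input, CHUNK_SAMPLES);
    return checksum;
}
```

The iteration count would then be tuned until the native run lands in the 2-5 second range Julian asks for; no large .wav file needs to live in the repository.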
|
From: Julian S. <js...@ac...> - 2005-12-17 13:45:58
|
I've just done a quick experiment which convinces me that oggenc would be a good FP benchmark -- the FP activity is spread out over a number of blocks, which is what we want.

The profile below is from a run of OggEnc 1.0.1 as supplied with SuSE10, encoding /opt/kde3/share/sounds/KDE_Startup_new.wav at level 3. It ran for 76.7 million basic blocks. The top 100 blocks account for less than 70% of the total blocks run, which is a good thing.

Does the profile roughly concur with your understanding about where the inner loops in oggenc are?

> Depending on how this cold progresses (as I sit here sipping my hot
> honey&lemon drink), I'll either have a look at doing this this
> afternoon, or I'll be in bed... we'll see!

Well, if you're feeling up to it, any such help would be appreciated.

J

-----------------------------------------------------------
--- BEGIN BB Profile (summary of scores)                ---
-----------------------------------------------------------

Total score = 76732436

  0: (  2765033   3.60%)   2765033   3.60%   0x414B282 _vp_tonemask+610
  1: (  5517409   7.19%)   2752376   3.58%   0x414B26B _vp_tonemask+587
  2: (  8050022  10.49%)   2532613   3.30%   0x414B27E _vp_tonemask+606
  3: (  9440332  12.30%)   1390310   1.81%   0x414B25F _vp_tonemask+575
  4: ( 10815055  14.09%)   1374723   1.79%   0x414B296 _vp_tonemask+630
  5: ( 11908465  15.51%)   1093410   1.42%   0x4150B45
  6: ( 13001875  16.94%)   1093410   1.42%   0x4150B82
  7: ( 13972771  18.20%)    970896   1.26%   0x414AE40 _vp_remove_floor+48
  8: ( 14941763  19.47%)    968992   1.26%   0x414A8E4
  9: ( 15807976  20.60%)    866213   1.12%   0x414AA65
 10: ( 16672368  21.72%)    864392   1.12%   0x414AA92
 11: ( 17348984  22.60%)    676616   0.88%   0x414A5E6
 12: ( 18024648  23.49%)    675664   0.88%   0x414A5E0
 13: ( 18697971  24.36%)    673323   0.87%   0x414A870
 14: ( 19369827  25.24%)    671856   0.87%   0x414B363 _vp_tonemask+835
 15: ( 20027222  26.10%)    657395   0.85%   0x414B372 _vp_tonemask+850
 16: ( 20653726  26.91%)    626504   0.81%   0x414B379 _vp_tonemask+857
 17: ( 21265163  27.71%)    611437   0.79%   0x414B354 _vp_tonemask+820
 18: ( 21869585  28.50%)    604422   0.78%   0x4148A50
 19: ( 22457963  29.26%)    588378   0.76%   0x4148A7C
 20: ( 23043394  30.03%)    585431   0.76%   0x414A696
 21: ( 23619480  30.78%)    576086   0.75%   0x4151A0A floor1_fit+954
 22: ( 24195566  31.53%)    576086   0.75%   0x4151A9E floor1_fit+1102
 23: ( 24699981  32.18%)    504415   0.65%   0x414A651
 24: ( 25202525  32.84%)    502544   0.65%   0x41515BB
 25: ( 25705069  33.49%)    502544   0.65%   0x41515B5
 26: ( 26207613  34.15%)    502544   0.65%   0x41515AE
 27: ( 26699517  34.79%)    491904   0.64%   0x4148820
 28: ( 27185917  35.42%)    486400   0.63%   0x414B792 _vp_offset_and_mix+674
 29: ( 27672317  36.06%)    486400   0.63%   0x414B693 _vp_offset_and_mix+419
 30: ( 28158717  36.69%)    486400   0.63%   0x4145B60 drft_forward+368
 31: ( 28645117  37.33%)    486400   0.63%   0x414B7FA _vp_offset_and_mix+778
 32: ( 29131517  37.96%)    486400   0.63%   0x414B636 _vp_offset_and_mix+326
 33: ( 29617914  38.59%)    486397   0.63%   0x414AFC0 _vp_noisemask+288
 34: ( 30104301  39.23%)    486387   0.63%   0x414B838 _vp_offset_and_mix+840
 35: ( 30589749  39.86%)    485448   0.63%   0x414B5DD _vp_offset_and_mix+237
 36: ( 31075197  40.49%)    485448   0.63%   0x414B144 _vp_tonemask+292
 37: ( 31560645  41.13%)    485448   0.63%   0x41565D9
 38: ( 32046093  41.76%)    485448   0.63%   0x414B404 _vp_tonemask+996
 39: ( 32531541  42.39%)    485448   0.63%   0x414AFA0 _vp_noisemask+256
 40: ( 33016917  43.02%)    485376   0.63%   0x414B63D _vp_offset_and_mix+333
 41: ( 33501413  43.66%)    484496   0.63%   0x4156451
 42: ( 33984930  44.29%)    483517   0.63%   0x4148A70
 43: ( 34468186  44.91%)    483256   0.62%   0x414B82A _vp_offset_and_mix+826
 44: ( 34951442  45.54%)    483256   0.62%   0x414B80A _vp_offset_and_mix+794
 45: ( 35431956  46.17%)    480514   0.62%   0x4142BE0 mdct_forward+848
 46: ( 35912433  46.80%)    480477   0.62%   0x415649F
 47: ( 36390403  47.42%)    477970   0.62%   0x4148A60
 48: ( 36858138  48.03%)    467735   0.60%   0x414B3E9 _vp_tonemask+969
 49: ( 37320536  48.63%)    462398   0.60%   0x41525D4 floor1_encode+1476
 50: ( 37779588  49.23%)    459052   0.59%   0x415390E
 51: ( 38225886  49.81%)    446298   0.58%   0x4151A3B floor1_fit+1003
 52: ( 38672184  50.39%)    446298   0.58%   0x4151A3F floor1_fit+1007
 53: ( 39116995  50.97%)    444811   0.57%   0x4151A5B floor1_fit+1035
 54: ( 39553540  51.54%)    436545   0.56%   0x4151A74 floor1_fit+1060
 55: ( 39978692  52.10%)    425152   0.55%   0x414BFC0
 56: ( 40389737  52.63%)    411045   0.53%   0x414B7E9 _vp_offset_and_mix+761
 57: ( 40787608  53.15%)    397871   0.51%   0x4153980
 58: ( 41177571  53.66%)    389963   0.50%   0x414A608
 59: ( 41567534  54.17%)    389963   0.50%   0x414A61F
 60: ( 41957187  54.67%)    389653   0.50%   0x41525E4 floor1_encode+1492
 61: ( 42345557  55.18%)    388370   0.50%   0x4151580
 62: ( 42730875  55.68%)    385318   0.50%   0x414A62B
 63: ( 43115827  56.18%)    384952   0.50%   0x4153900
 64: ( 43499130  56.68%)    383303   0.49%   0x414BF79
 65: ( 43881920  57.18%)    382790   0.49%   0x41519F1 floor1_fit+929
 66: ( 44258790  57.67%)    376870   0.49%   0x414BFC5
 67: ( 44597574  58.12%)    338784   0.44%   0x414B06D _vp_tonemask+77
 68: ( 44922222  58.54%)    324648   0.42%   0x41434C4
 69: ( 45246870  58.96%)    324648   0.42%   0x414354D
 70: ( 45571518  59.39%)    324648   0.42%   0x4143420
 71: ( 45890710  59.80%)    319192   0.41%   0x41432D0
 72: ( 46207164  60.21%)    316454   0.41%   0x414A740
 73: ( 46504737  60.60%)    297573   0.38%   0x414A8FF
 74: ( 46764897  60.94%)    260160   0.33%   0x4153830
 75: ( 47012601  61.26%)    247704   0.32%   0x414B14F _vp_tonemask+303
 76: ( 47260305  61.59%)    247704   0.32%   0x414B3D0 _vp_tonemask+944
 77: ( 47504385  61.90%)    244080   0.31%   0x4148B10
 78: ( 47747841  62.22%)    243456   0.31%   0x4147D16 vorbis_analysis_blockout+918
 79: ( 47991041  62.54%)    243200   0.31%   0x414AF00 _vp_noisemask+96
 80: ( 48234241  62.86%)    243200   0.31%   0x414D16F _vp_couple+1023
 81: ( 48477441  63.17%)    243200   0.31%   0x414B0A3 _vp_tonemask+131
 82: ( 48720641  63.49%)    243200   0.31%   0x4148F70 _vorbis_apply_window+304
 83: ( 48963841  63.81%)    243200   0.31%   0x414AF60 _vp_noisemask+192
 84: ( 49207041  64.12%)    243200   0.31%   0x4154C60 res2_forward+128
 85: ( 49450241  64.44%)    243200   0.31%   0x41547B8 res2_class+408
 86: ( 49693185  64.76%)    242944   0.31%   0x414BF95
 87: ( 49936129  65.07%)    242944   0.31%   0x4148F40 _vorbis_apply_window+256
 88: ( 50178853  65.39%)    242724   0.31%   0x414BDB6 _vp_quantize_couple_memo+262
 89: ( 50421540  65.71%)    242687   0.31%   0x804BF50
 90: ( 50664227  66.02%)    242687   0.31%   0x804BF2E
 91: ( 50906914  66.34%)    242687   0.31%   0x804BF77
 92: ( 51149364  66.65%)    242450   0.31%   0x804BF27
 93: ( 51388060  66.97%)    238696   0.31%   0x414B345 _vp_tonemask+805
 94: ( 51626756  67.28%)    238696   0.31%   0x414B3B4 _vp_tonemask+916
 95: ( 51865452  67.59%)    238696   0.31%   0x414B13E _vp_tonemask+286
 96: ( 52104148  67.90%)    238696   0.31%   0x414B30B _vp_tonemask+747
 97: ( 52342844  68.21%)    238696   0.31%   0x414B39E _vp_tonemask+894
 98: ( 52581540  68.52%)    238696   0.31%   0x414B3DB _vp_tonemask+955
 99: ( 52819806  68.83%)    238266   0.31%   0x4148AD4 |
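For readers checking the profile columns: the figure in parentheses is the running sum of per-block scores, and each percentage is that score divided by the total score of 76732436. The arithmetic can be sketched as below. This is not Valgrind's own code; `percent_of_total` and `cumulative_score` are hypothetical helper names.

```c
#include <stddef.h>

/* Share of the total score contributed by `count`, as a percentage. */
double percent_of_total(unsigned long count, unsigned long total)
{
    return 100.0 * (double)count / (double)total;
}

/* Running sum of the first n per-block scores, i.e. the value shown
   in parentheses on row n-1 of the profile. */
unsigned long cumulative_score(const unsigned long *counts, size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += counts[i];
    return sum;
}
```

Feeding in the first three per-block scores from the profile (2765033, 2752376, 2532613) gives a cumulative score of 8050022, which is about 10.49% of 76732436 and matches row 2.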