From: Albert G. <Dr....@t-...> - 2008-07-07 20:42:41
|
Hi Jiri, >> In return I promise to investigate why the dict and set modules take >> so long to compile. :) > > It would be nice, it is really slow. Ok, I did some profiling now, and it seems that the lion's share (over 70%) of the time is spent within LLVM, only a small fraction (not exceeding 15%) goes to the parser and all my own semantic routines. There goes my theory that it might be related to my own environment code... (Actually I would have preferred that, if it's my own code then I can fix it. But I checked specifically those routines and they don't even make a blip on the radar.) I still have to analyze the profiles more thoroughly, there might be some hidden inefficiencies in my semantic routines which make it spend a lot of time in STL routines (which make up for the remaining 15%). But in any case it looks like LLVM is to blame, which is a major bummer. :( I would have suspected that it's the optimization passes, but disabling these doesn't speed it up much either, so it seems that most of those 70% of the compilation time the darn thing just generates LLVM IR. I guess I'll have to try LLVM 2.3 and see whether it's any faster. There's still hope since they do have a new IR Builder class in 2.3... Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Rooslan S. K. <kh...@gm...> - 2008-07-07 21:37:46
Attachments:
pure-for-llvm23.patch
|
Albert Graef wrote: > I guess I'll have to try LLVM 2.3 and see whether it's any faster. > There's still hope since they do have a new IR Builder class in 2.3... Here is a patch (created by mostly mechanical substitution) to compile Pure with LLVM trunk. I guess it should work with 2.3 as well. It passes all tests on my Linux x86 box. I've never tried Pure with 2.2, so can't really compare these performance wise. Compiling any of supplied library modules or sources in examples/ folder takes less than 1 sec on a not so modern hardware (single core AMD 3200+). -- Best regards, Rooslan |
From: Albert G. <Dr....@t-...> - 2008-07-07 22:10:50
|
Hi Rooslan, welcome to the list! Rooslan S. Khayrov wrote: > Here is a patch (created by mostly mechanical substitution) to compile > Pure with LLVM trunk. I guess it should work with 2.3 as well. It passes > all tests on my Linux x86 box. Thanks a bunch, that's heaven-sent! :) Will try it immediately. > I've never tried Pure with 2.2, so can't > really compare these performance wise. Compiling any of supplied library > modules or sources in examples/ folder takes less than 1 sec on a not so > modern hardware (single core AMD 3200+). Did you run these with -i? Otherwise the interpreter is not really compiling anything (it's all done lazily). To actually force the Pure interpreter to generate the IR for a module when you run it in batch mode and you're not actually computing anything, you can use a command like the following: pure -i set.pure </dev/null This will force it to think it's in interactive mode, in which case it compiles all pending definitions before it enters the read-eval-print loop. Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Rooslan S. K. <kh...@gm...> - 2008-07-07 22:24:31
|
Albert Graef wrote: > Did you run these with -i? Otherwise the interpreter is not really > compiling anything (it's all done lazily). Uh, no. (Unfortunately I don't have time to closely follow Pure development, but Q loveliness made me to try out its successor :-) ) So, real compilation time for sets and dict is 3-4 secs, contrary to 1-1.5 for other modules. -- Best regards, Rooslan |
From: Albert G. <Dr....@t-...> - 2008-07-07 22:57:17
|
Rooslan S. Khayrov wrote: > Uh, no. (Unfortunately I don't have time to closely follow Pure > development, but Q loveliness made me to try out its successor :-) ) Once you tried it there's no going back. ;-) No, seriously, if you give it a whirl, let me know what you think. > So, real compilation time for sets and dict is 3-4 secs, contrary to > 1-1.5 for other modules. So the new builder class isn't any faster, that's bad news. :( Jiri, I think we might have to work around this for a while. I'll have a look at those babies tomorrow and see how I can massage some of the definitions so that the matching automata don't get that big. Good night, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Jiri S. <jir...@bl...> - 2008-07-07 23:19:56
|
Albert Graef wrote: > So the new builder class isn't any faster, that's bad news. :( > > Jiri, I think we might have to work around this for a while. I'll have a > look at those babies tomorrow and see how I can massage some of the > definitions so that the matching automata don't get that big. > Maybe we could try the original Bird/Wadler algo and look at the compilation/execution speed trade-off. The executions speed in Pure vs. Q is often inscrutable. Most operation with dict run some 5 times faster with Pure than with Q but the members operation runs about 10 times slower with Pure than with Q. Maybe the difference between that two implementations of AVLT will not be that big as with Q and the compilation will be surely much faster. Jiri |
From: Albert G. <Dr....@t-...> - 2008-07-07 23:54:11
|
Jiri Spitz wrote: > Maybe we could try the original Bird/Wadler algo and look at the > compilation/execution speed trade-off. Ok, will try that tomorrow. > Most operation with dict run some 5 times faster > with Pure than with Q but the members operation runs about 10 times > slower with Pure than with Q. 10 times *slower*?? I don't have this over here. > let d = dict $ zipwith (=>) (1..100000) (1..100000); > stats > #members d; 100000 0.86s ==> def D = dict $ zip [1..100000] [1..100000] ==> #members D; stats 100000 0.81 secs, 300002 reductions, 265536 cells Still the Pure version seems awfully slow given that Pure is supposed to run a lot *faster* than Q. Hmm, and it gets slower each time I run the members function. Something's wrong there. I will have to see whether the B/W implementation exhibits the same behaviour. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Jiří S. <jir...@bl...> - 2008-07-08 12:12:45
|
Albert Graef wrote: > Added (examples/avltree.pure). It's a fairly simplistic implementation, > but it compiles much faster than set.pure. OTOH, insert and delete take > about twice as long compared to your implementation, which IIRC is > consistent with the Q benchmarks that you did earlier. So I really > wouldn't want to replace your version with it. ;-) > Albert I'll have time to play with it in the evening. For now only one remark. The type annotations improve the execution speed quite a lot, so it is not upright to compare implementations without and with them. Jiri |
From: Albert G. <Dr....@t-...> - 2008-07-08 13:38:49
|
Jir(í Spitz wrote: > Albert I'll have time to play with it in the evening. For now only one > remark. The type annotations improve the execution speed quite a lot, so > it is not upright to compare implementations without and with them. All I can say that I added them to avltree.pure, and they didn't make much of a difference in the examples I tried. The worst example I saw was that deleting all 100000 (int) members from a tree took some 9% longer than with the ::int tags. -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Albert G. <Dr....@t-...> - 2008-07-08 11:49:48
|
Hi Jiri, Albert Graef wrote: > Jiri Spitz wrote: >> Maybe we could try the original Bird/Wadler algo and look at the >> compilation/execution speed trade-off. > > Ok, will try that tomorrow. Added (examples/avltree.pure). It's a fairly simplistic implementation, but it compiles much faster than set.pure. OTOH, insert and delete take about twice as long compared to your implementation, which IIRC is consistent with the Q benchmarks that you did earlier. So I really wouldn't want to replace your version with it. ;-) I found that adding special case rules to avltree.pure only yields marginal improvements (<10%), so you might be able to remove them from your implementation, too, without taking a major performance hit. That alone will make your modules load much faster, and there's some other refactoring that we can do (like checking the balance factor in a guard instead of matching it on the lhs) that will further reduce the number of overlapping rules. > Still the Pure version seems awfully slow given that Pure is supposed to > run a lot *faster* than Q. Rubbish, who said that? Oops, that was me. :) In fact, all operations consistently outperform the corresponding Q definitions by a factor of 5-7, if they're just plain Q definitions. The only cases where Q is able to keep up with Pure is when operations are involved that are actually implemented in C, like creating enumerations and # on lists. Another factor is that Q has a more compact internal list representation which makes some of these operations even faster. For a bytecode interpreter Q indeed performs quite well on this stuff (compared to the recursive benchmark where it sucks big time), but that's because these definitions don't *do* much beyond pattern matching and tree construction. I already mentioned in another thread that there's still room for improvements in Pure there, in particular inlining the runtime calls which construct symbol and application nodes will surely help. > Hmm, and it gets slower each time I run the > members function. Something's wrong there. I will have to see whether > the B/W implementation exhibits the same behaviour. It's the same with my implementation. Will have to investigate. Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Jiri S. <jir...@bl...> - 2008-07-08 17:53:29
|
Albert Graef wrote: > I found that adding special case rules to avltree.pure only yields > marginal improvements (<10%), so you might be able to remove them from > your implementation, too, without taking a major performance hit. That > alone will make your modules load much faster, and there's some other > refactoring that we can do (like checking the balance factor in a guard > instead of matching it on the lhs) that will further reduce the number > of overlapping rules. > OK. Testing set is now in 'examples/set_test.pure'. Removed type annotations don't hurt. I changed pattern matches of constants to guards where possible. The result does not seem to compile any faster (at least on my PC) and the execution is substantially slower :-( . Do you see any other possible rewrites of the code? I looked at other similar implementation of AVL trees (I already mentioned it somewhere in the Q discussion). It doesn't use those 'table' and 'table2' to calculate the new balances, there is some div/mod/&/| magic instead. The rest of the code is almost the same. This variant of the algorithm was only by about 30 % faster than Bird/Wadler in Q. I don't think it would compile much faster than the current one. Jiri |
From: Albert G. <Dr....@t-...> - 2008-07-11 03:04:57
|
Rooslan S. Khayrov wrote: > Here is a patch (created by mostly mechanical substitution) to compile > Pure with LLVM trunk. I guess it should work with 2.3 as well. Ok, I committed this now, thanks a lot for the patch. This means that starting with r434 Pure really needs LLVM 2.3 now. The LLVM 2.3 version of Cyrille Berger's 64 bit -fPIC patch is available under the following URL: http://pure-lang.sf.net/X86JITInfo.cpp.pic.2.3.patch Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Ryan S. <rya...@us...> - 2008-07-18 10:02:44
|
On Jul 10, 2008, at 22:05, Albert Graef wrote: > Rooslan S. Khayrov wrote: > >> Here is a patch (created by mostly mechanical substitution) to >> compile >> Pure with LLVM trunk. I guess it should work with 2.3 as well. > > Ok, I committed this now, thanks a lot for the patch. > > This means that starting with r434 Pure really needs LLVM 2.3 now. Now that llvm in MacPorts is updated to 2.3 I tried updating pure- devel to r438 but half of the tests fail (with Mac OS X 10.4.11 and Xcode 2.5 on Intel Core 2 Duo): Running tests. prelude.pure: FAILED test001.pure: passed test002.pure: FAILED test003.pure: FAILED test004.pure: FAILED test005.pure: passed test006.pure: passed test007.pure: passed test008.pure: passed test009.pure: passed test010.pure: FAILED test011.pure: FAILED test012.pure: passed test013.pure: passed test014.pure: FAILED test015.pure: FAILED > The LLVM 2.3 version of Cyrille Berger's 64 bit -fPIC patch is > available under the following URL: > http://pure-lang.sf.net/X86JITInfo.cpp.pic.2.3.patch Do we need to add this patch to the llvm port? |
From: Albert G. <Dr....@t-...> - 2008-07-18 18:29:43
|
Ryan Schmidt wrote: > Now that llvm in MacPorts is updated to 2.3 I tried updating pure- > devel to r438 but half of the tests fail (with Mac OS X 10.4.11 and > Xcode 2.5 on Intel Core 2 Duo): Latest svn? Can you please do a 'make cleanlogs logs' and send me your logs please (just zip up the test directory)? Afterwards you can run another make cleanlogs and svn up to get the golden logs again. >> The LLVM 2.3 version of Cyrille Berger's 64 bit -fPIC patch is >> available under the following URL: >> http://pure-lang.sf.net/X86JITInfo.cpp.pic.2.3.patch > > Do we need to add this patch to the llvm port? Don't think so, otherwise you'd presumably get a linkage error anyway. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Ryan S. <rya...@us...> - 2008-07-18 22:41:58
|
On Jul 18, 2008, at 15:15, Ryan Schmidt wrote: > On Jul 18, 2008, at 13:29, Albert Graef wrote: > >> Ryan Schmidt wrote: >> >>> Now that llvm in MacPorts is updated to 2.3 I tried updating pure- >>> devel to r438 but half of the tests fail (with Mac OS X 10.4.11 and >>> Xcode 2.5 on Intel Core 2 Duo): >> >> Latest svn? > > I'm using r438 of pure which is the latest at this time. Or did you > mean latest svn of llvm? If so, then no; the llvm port provides the > 2.3 release of llvm. Since yesterday there is also a new llvm-devel > port providing a later development version (53722); I could try that. Ok, pure r438 works with llvm-devel at 53722 so that's fine. The pure- devel port is updated! |
From: Albert G. <Dr....@t-...> - 2008-07-21 03:28:52
|
Ryan Schmidt wrote: > Ok, pure r438 works with llvm-devel at 53722 so that's fine. The pure- > devel port is updated! Thanks. So it seems that there are still some OSX bugs in LLVM 2.3 which are fixed in the latest LLVM from svn? I should probably mention that in the INSTALL file somewhere. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Albert G. <Dr....@t-...> - 2008-07-07 21:41:26
|
Albert Graef wrote: >>> In return I promise to investigate why the dict and set modules take >>> so long to compile. :) >> It would be nice, it is really slow. > > Ok, I did some profiling now, and it seems that the lion's share (over > 70%) of the time is spent within LLVM, only a small fraction (not > exceeding 15%) goes to the parser and all my own semantic routines. I think I pinned that one down now. Noticing that only set.pure and dict.pure are affected, but not the other container data structures whose code is in principle quite similar, I took a closer look at the generated pattern matching code of these modules. Lo and behold, there are in fact some functions in those two modules which have a lot of overlapping rules producing automata with some 260-300 states. So I removed some the special case rules in those definitions, and that alone made the modules load thrice as fast. Mind you, 260-300 states isn't big at all, my construction algorithm can easily deal with much bigger automata, but of course the automaton then also yields a routine in LLVM IR which includes a decision tree (nested switch statements) with a total of 260-300 target labels. My guess is that the LLVM IR builder gets really slow when dealing with "big" assembler routines like this (maybe some badly coded algorithms with quadratic complexity in there, I haven't looked at the code). So now I have to get Pure to compile with LLVM 2.3 to see whether they fixed those horrible inefficiencies with the new builder class. If not then as an intermediate measure some refactoring of those overlapping rules in dict.pure and set.pure will help. Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |