Thread: [pure-lang-users] llvm 2.3 - good news
From: Jiri S. <jir...@bl...> - 2008-07-10 13:50:42
Hello,

I tried to compile Pure with LLVM 2.3 under MinGW, without any problems. I only used the replacements from Rooslan's patch and added another one. And above all, test #15 runs in 12 s instead of the original 55 s (both under MinGW, of course). :-)

Jiri
From: Jiri S. <jir...@bl...> - 2008-07-10 14:21:57
Jiri Spitz wrote:
> And first of all - the test #15 runs 12 s instead of original 55 s (both
> under MinGW, of course).
>

And under Linux:

$ time make check
Running tests.
prelude.pure: passed
test001.pure: passed
test002.pure: passed
test003.pure: passed
test004.pure: passed
test005.pure: passed
test006.pure: passed
test007.pure: passed
test008.pure: passed
test009.pure: passed
test010.pure: passed
test011.pure: passed
test012.pure: passed
test013.pure: passed
test014.pure: passed
test015.pure: passed

real 0m18.424s
user 0m17.970s
sys 0m0.394s
$

:-) :-) :-)

Jiri
From: Libor S. <li...@gm...> - 2008-07-10 19:17:02
Excellent! I look forward to this, or even LLVM 2.4, becoming the default Pure setup.

Libor

On Thu, 10 Jul 2008 15:21:59 +0100, Jiri Spitz <jir...@bl...> wrote:
> Jiri Spitz wrote:
>> And first of all - the test #15 runs 12 s instead of original 55 s (both
>> under MinGW, of course).
>>
> And under Linux:
>
> $ time make check
> Running tests.
> [...]
> real 0m18.424s
> user 0m17.970s
> sys 0m0.394s
From: Albert G. <Dr....@t-...> - 2008-07-10 20:13:33
Libor Spacek wrote:
> Excellent! I look forward to this, or even LLVM 2.4, becoming the default Pure setup.

LLVM 2.3 *is* the current official release, so that's what I'm going to target to make things easier, especially for the package maintainers. But Roostan's patches are for LLVM trunk anyway, so you should be able to use that if you prefer.

Note that for 64 bit systems we still need Cyrille Berger's patch (fortunately, this has already been updated for LLVM 2.3, but I don't think it's in LLVM svn yet).

I'm going to commit the necessary changes tomorrow.

Albert

-- 
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
From: Albert G. <Dr....@t-...> - 2008-07-11 02:37:14
Jiri Spitz wrote:
>> And first of all - the test #15 runs 12 s instead of original 55 s (both
>> under MinGW, of course).

Yep, the 'let' statement at the beginning of your test module compiles in 7.6 secs now, versus 96.4 secs before. (That includes the startup time of the interpreter and loading of the prelude, which takes about half a second on my AMD32.)

But it's still too slow. 7 secs to initialize a constant list of just 1000 elements? That's ridiculous; Q does that in a heartbeat. And almost all that time is still spent in the JIT. What is it doing there?

So I'll still have to optimize for that case. Nevertheless, it does seem that the JIT has improved a lot, so requiring LLVM 2.3 seems sensible.

Albert
From: Libor S. <li...@gm...> - 2008-07-11 11:00:18
Pure 434 + LLVM 2.3 run happily here, thanks!

L.
From: Albert G. <Dr....@t-...> - 2008-07-13 11:29:13
Albert Graef wrote:
> Yep, the 'let' statement at the beginning of your test module compiles
> in 7.6 secs now, versus 96.4 secs before. (That includes the startup
> time of the interpreter and loading of the prelude, which takes about
> half a second on my AMD32.)
>
> But it's still too slow. 7 secs to initialize a constant list of just
> 1000 elements? That's ridiculous, Q does that in a heartbeat. And almost
> all that time is still spent in the JIT. What is it doing there?
>
> So I'll still have to optimize for that case.

Ok, I've done that now, putting the pure_listl and pure_tuplel runtime routines to good use there. Besides the code to generate the element expressions, a list or tuple expression now needs just three additional runtime calls, with a flat call graph. That speeds up the JIT considerably. I'm down to some 0.8 secs for compiling the 'let' statement at the beginning of the test015 module now, and there doesn't seem to be any way to make the code still more "digestible".

This example clearly shows that there are some severe performance bottlenecks in the JIT (even in LLVM 2.3). The JIT doesn't scale well with code size at all. For the example at hand, on my system assigning a 1000-element list to a variable takes 0.82s, of which 0.01s are spent in IR code generation (including all optimization passes), 0.81s in the JIT(!), and 0.00s (zilch, up to rounding) in actually executing the code. I got these figures using clock(), so they should be pretty accurate.

One further avenue for working around LLVM's deficiencies there would be to optimize the case that the expression to be evaluated is a constant (number, string or list/tuple of constants), in which case I could just skip the compilation step and directly convert the compile-time expression to a pure_expr* instead. I'll try that tomorrow.

Have a nice Sunday,
Albert
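To make the "flat call graph" idea concrete, here is a minimal C sketch under stated assumptions: it uses a hypothetical toy cons-cell type and invented helper names (`cons`, `list_from_array`), not Pure's actual `pure_expr` layout or the real signatures of `pure_listl`/`pure_tuplel`. The contrast is between generated code that contains one nested runtime call per element and generated code that just collects the element values and hands them to a single precompiled routine.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy list cell; NOT Pure's pure_expr representation. */
typedef struct cell {
    int head;
    struct cell *tail;
} cell;

/* Allocate one cons cell. */
static cell *cons(int head, cell *tail)
{
    cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->head = head;
    c->tail = tail;
    return c;
}

/* Nested style (not shown as a function): the generated code itself would
   contain cons(x1, cons(x2, ... cons(xn, NULL))), i.e. n runtime calls,
   each consuming the result of the next one. */

/* Flat style: the generated code only stores the element values in an
   array and makes one call; the loop that links the cells lives in
   precompiled C and is never seen by the JIT. */
static cell *list_from_array(size_t n, const int *elems)
{
    cell *lst = NULL;
    for (size_t i = n; i > 0; i--)      /* build back to front */
        lst = cons(elems[i - 1], lst);
    return lst;
}

/* Count the cells of a list. */
static size_t list_length(const cell *lst)
{
    size_t n = 0;
    for (; lst != NULL; lst = lst->tail)
        n++;
    return n;
}

/* Release all cells of a list. */
static void list_free(cell *lst)
{
    while (lst != NULL) {
        cell *next = lst->tail;
        free(lst);
        lst = next;
    }
}
```

For example, `list_from_array(3, (int[]){1, 2, 3})` yields the list 1, 2, 3 with a single call from the caller's side. As the later messages in this thread show, the bulk path also has a cost: for short lists the extra call overhead made execution slower, which is why r462 applies it only above a minimum list size.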
From: Jiri S. <jir...@bl...> - 2008-07-13 20:07:26
Albert Graef wrote:
>
> Ok, I've done that now, putting the pure_listl and pure_tuplel runtime
> routines to good use there. Besides the code to generate the element
> expressions, a list or tuple expression now needs just three additional
> runtime calls, with a flat call graph. That speeds up the JIT
> considerably. I'm down to some 0.8 secs for compiling the 'let'
> statement at the beginning of the test015 module now, and there doesn't
> seem to be any way to make the code still more "digestable".
>

Hello Albert,

The code compiles much faster now. However, your latest changes made execution memory-hungry, and my favourite test 'set (1..1000000)' made my PC swap to death. It seems the code isn't tail recursive anymore.

Jiri
From: Albert G. <Dr....@t-...> - 2008-07-13 22:47:52
Jiri Spitz wrote:
> The code compiles much faster now. However, your latest changes made the
> execution memory eager and my favourite test 'set (1..1000000)' caused
> my PC swap to death. It seems the code isn't tail recursive anymore.

Sorry, I can't test right now, because I just upgraded my system and I'm still in the process of getting up and running again. But it sounds like I introduced a memory leak with the latest change. (If TCO didn't work any more, you'd get stack overflows instead.) I will have a look asap.

Albert
From: Albert G. <Dr....@t-...> - 2008-08-10 08:40:40
Jiri Spitz wrote:
> The code compiles much faster now. However, your latest changes made the
> execution memory eager and my favourite test 'set (1..1000000)' caused
> my PC swap to death. It seems the code isn't tail recursive anymore.

Fixed (r459).
From: Jiri S. <jir...@bl...> - 2008-08-10 14:46:05
Albert Graef wrote:
>> The code compiles much faster now. However, your latest changes made the
>> execution memory eager and my favourite test 'set (1..1000000)' caused
>> my PC swap to death. It seems the code isn't tail recursive anymore.
>
> Fixed (r459).
>

Thanks, but I am still not happy. The memory consumption is OK now, but my 1 M set example runs twice as slow as before :-( .

Regards,
Jiri
From: Albert G. <Dr....@t-...> - 2008-08-10 21:38:46
Jiri Spitz wrote:
> Thanks, but I am still not happy. The memory consumption is OK now, but
> my 1 M set example runs two times slower than before :-( .

Right, the new code is faster for JIT compilation, but slower on execution for small list values. I worked around that now by adding a minimum bound for the size of lists/tuples to which the new list generation code is applied. Please check whether it's ok for you now.

Using #set(1..1000000) as a test example, over here r462 still seems to be a tad slower than r436, but that's probably due to some other, unrelated fixes I did to the environment-handling code, which also incur some (small) runtime cost; I'll have another look at that tomorrow.

Albert
From: Jiri S. <jir...@bl...> - 2008-08-10 22:06:18
Albert Graef wrote:
> Right, the new code is faster for JIT compilation, but slower on
> execution for small list values. I worked around that now by adding a
> minimum bound for the size of lists/tuples to which the new list
> generation code is applied. Please check whether it's ok for you now.
>

The speed is back to what it was before the fixes.

> Using #set(1..1000000) as a test example, over here r462 still seems to
> be a tad slower than r436, but that's probably due to some other,
> unrelated fixes I did to the environment-handling code, which also incur
> some (small) runtime cost; I'll have another look at that tomorrow.

I do not see any measurable slowdown now.

Thanks,
Jiri
From: Albert G. <Dr....@t-...> - 2008-08-11 06:35:29
Jiri Spitz wrote:
> I do not see any measurable slowdown now.

Great, then I consider this fixed. :)
From: Albert G. <Dr....@t-...> - 2008-08-10 22:04:08
Albert Graef wrote:
> One further avenue of working around LLVM's deficiencies there would be
> to optimize the case that the expression to be evaluated is a constant
> (number, string or list/tuple of constants), in which case I could just
> skip the compilation step and directly convert the compile time
> expression to a pure_expr* instead. I'll try that tomorrow.

This is now implemented as well. In most cases, constant expressions at the toplevel aren't compiled any more but are directly converted to the runtime expression data structure. That makes assigning a big constant list to a global variable much faster.

Albert
From: Albert G. <Dr....@t-...> - 2008-07-10 19:41:32
Jiri Spitz wrote:
> I tried to compile Pure with llvm 2.3 under MinGW - without any
> problems. I only used the replacements from the Rooslan's patch and add
> another one.

Can you please post your additional change?

> And first of all - the test #15 runs 12 s instead of original 55 s (both
> under MinGW, of course).

That's good news indeed. :) So I guess it's time to switch to LLVM 2.3 now. I can commit the necessary changes tomorrow. Everybody ready to take the plunge?

Albert
From: Ryan S. <rya...@us...> - 2008-07-10 19:51:32
On Jul 10, 2008, at 14:41, Albert Graef wrote:
> Jiri Spitz wrote:
>> I tried to compile Pure with llvm 2.3 under MinGW - without any
>> problems. I only used the replacements from the Rooslan's patch
>> and add another one.
>
> Can you please post your additional change?
>
>> And first of all - the test #15 runs 12 s instead of original 55 s
>> (both under MinGW, of course).
>
> That's good news indeed. :) So I guess it's time to switch to LLVM 2.3
> now. I can commit the necessary changes tomorrow. Everybody ready to
> take the plunge?

MacPorts has LLVM 2.2 right now. I could ask its maintainer to update it to 2.3. Will Pure still work with LLVM 2.2, or will LLVM 2.3 be required now?
From: Albert G. <Dr....@t-...> - 2008-07-10 20:21:53
Ryan Schmidt wrote:
> MacPorts has llvm 2.2 right now. I could ask its maintainer to update
> it to 2.3.

That would be nice.

> Will pure still work with llvm 2.2 or will llvm 2.3 be required now?

I'd prefer the latter, because of all the quirks we see with the LLVM 2.2 JIT. But maybe you should first test with LLVM 2.3 on OS X after I've committed the patches, before we decide on that.

Albert
From: Jiri S. <jir...@bl...> - 2008-07-10 19:58:08
Attachments:
mypure.patch.gz
Albert Graef wrote:
>
> Can you please post your additional change?
>

My patch against rev. 432 is enclosed.

Jiri
From: Eddie R. <er...@bm...> - 2008-07-10 20:14:39
On Thu, 2008-07-10 at 21:41 +0200, Albert Graef wrote:
>
> That's good news indeed. :) So I guess it's time to switch to LLVM 2.3
> now. I can commit the necessary changes tomorrow. Everybody ready to
> take the plunge?

NO, I'm never going to use 2.3!!!

Just kidding ;=)

I already have LLVM 2.3 installed, but I haven't gotten Pure to compile yet. I need Jiri's changes.

e.r.
From: Libor S. <li...@gm...> - 2008-07-13 11:50:34
On Sun, 13 Jul 2008 12:29:12 +0100, Albert Graef <Dr....@t-...> wrote:
> This example clearly shows that there are some severe performance
> bottlenecks in the JIT (even in LLVM 2.3). The JIT doesn't scale well
> with code size at all. For the example at hand, on my system assigning a
> 1000 element list to a variable needs 0.82s from which 0.01s are spent
> in IR code generation including all optimization passes, 0.81s in the
> JIT(!), and 0.00s (zilch, up to rounding) in actually executing the
> code. I got these figures using clock(), so they should be pretty accurate.

Nice work! Incidentally, I find clock() to be a pretty blunt tool with its resolution of 10 ms, which is far too coarse to time most executions on modern machines, apart from pretty massive tasks. For example, in real-time image processing you would process a whole image in no more than five ticks of clock(). gettimeofday has a much finer resolution, but it measures elapsed (wall-clock) time. I am not even sure whether clock() does any rounding internally; from what I can see it is just a discrete counter running in 10 ms units, meaning that as much as 9 ms of CPU time can register as 0 (zilch).

L.
From: Albert G. <Dr....@t-...> - 2008-07-13 17:17:29
Libor Spacek wrote:
> I am not even sure if clock() internally does any rounding, from what I can see
> it is just a discrete counter running in 10ms units, meaning that CPU time of
> as much as 9ms can register as 0 (zilch).

That's just what Linux's default HZ value of 100 will give you. They changed HZ to 1000 for a while, but then changed it back because it was eating too much energy on laptops. The latest kernels have tickless timers (not always enabled by default, so you might have to build your own kernel to get those), which let you get any resolution you want with the POSIX high-resolution timers (up to what the hardware provides, which is typically ~1 microsec on current systems).

I'm not sure where gettimeofday gets the microsec ticks on Linux; probably it reads some hardware timer directly, but that's not guaranteed by POSIX. And since it's wall-clock time, it's useless for measuring performance anyway.

The proper solution is to use the POSIX high-resolution timers on systems that have them (recent Linux 2.6.x versions do). Have a look at the clock_gettime manual page; it should be easy to wrap this in Pure, or to write a little C module for that purpose. (There's also code in Q's system module which could easily be ported to Pure -- see the nanotime et al. routines in modules/clib/system.c in the Q sources.)

HTH,
Albert
From: Jiri S. <jir...@bl...> - 2008-07-13 20:13:09
Libor Spacek wrote:
> Nice work! Incidentally, I find clock() to be a pretty blunt tool with its
> resolution of 10ms, which is way too long timelapse to measure most executions
> on modern machines, apart from pretty massive tasks. For example, in real-time
> image processing, you would process a whole image in no more than five ticks
> of the clock(). gettimeofday is a lot more accurate but measures the elapsed time.
> I am not even sure if clock() internally does any rounding, from what I can see
> it is just a discrete counter running in 10ms units, meaning that CPU time of
> as much as 9ms can register as 0 (zilch).
>

Hi Libor,

I am not sure, but I think you mentioned somewhere that you are using Ubuntu 8.04. If so, you can try installing the 'RT' (real-time) Linux image. It contains timers with higher resolution.

Jiri