Re: JIT optimization framework (was: Re: [Sablevm-user] bug report...)
From: John L. <je...@pi...> - 2000-07-30 01:58:41
> > Unfortunately I had to stop working on this project for two months
> > due to a sports injury but I've recovered enough to carry on working
> > now.
>
> That's unfortunate. I'm sorry to hear that.

I'm sure you can appreciate (as a programmer) what it's like to not be
able to type properly for two months! But any sports injury is
ultimately self-inflicted; I'm very glad I'm not a professional
sportsman (e.g. rugby, soccer, etc.).

> > If you're interested in the code or just discussing jits or
> > whatever, just let me know.
>
> Both. :)
>
> I've downloaded your 0.07 source and I'll have a look next week when I
> have time.

Ok. I'll prepare some quick docs to help you build it; I've been
swapping build systems lately and things are not yet finalized in that
area. Also make sure you download the JIT (it's a separate module
called cavalry).

> As for my interest in JIT technology, I've been inspired by the
> OpenJIT project (www.openjit.org) to implement a JIT framework where
> the JIT, code generator, and optimizer are all first-class Java
> objects. Normally I am not taken to duplicating the work of others,
> but this is cool technology that really should be available under the
> GPL.

I managed to get through to the page today; it does look very
interesting. I'm a bit sceptical about making the JIT compatible with a
range of VMs (like the classic VM), because there are so many
VM-specific optimisations to be made. But I love the idea of
modularisation and flexibility.

> I have not looked at the OpenJIT source, so this will be a
> "clean-room" effort in that respect. I'm investigating if that status
> also applies to work with Java in general.

I'm not sure I understand the last line?

> I note that you generate assembly text and then compile it.

Let me quickly explain my strategy. When I first started this project
(the translator), I was just playing around with ideas; it was
essentially a prototype which would help me get a feel for what I was
doing and tell me whether I wanted to take this further.

Originally I generated machine code directly for each bytecode
(literally putting the binary bytes into a stream). The reason I took
such an extreme approach is that I wanted an initial translator which
could transform the bytecodes into machine code as quickly as possible;
I didn't want the overhead of any other framework, compiler, assembler,
etc. (If you haven't already, I urge you to read about IBM's Jalapeno
JVM for the PPC:
http://www-4.ibm.com/software/developer/library/jalapeno/index.html )

So after I had played with that a bit, I decided I liked it and would
try some more stuff. I wrote an IA32 assembler, which then allowed me
to compile a far greater range of bytecodes. At the moment I'm busy
finishing this round of prototyping. For me this was very much a way of
getting acquainted with the issues involved (especially wrt assembly
and the interface with the JVM).

I initially wanted to do a little bit of optimisation (such as register
allocation), but was so depressed by the IA32's pitiful register set
and its different instructions for addressing the different groups of
registers (MMX etc) that I decided to leave that for later.

> I was thinking of doing something more along the lines of the
> following:
>
> - Partition bytecodes into basic blocks while building a control
>   flow graph.
>
> - Optionally run the standard Java code safety verification process,
>   if this has been requested, e.g. "jvm -verify ..."

This definitely has to be present in a final version of any JVM. What
is also needed is to gather information for doing precise GC. I don't
do this yet, but Etienne has (or is planning to have) it in SableVM.
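To make the basic-block step concrete, here's a rough sketch in Java of
the classic "leader" algorithm. The Instr/Block/CfgBuilder names and
the pre-decoded instruction list are just assumptions for illustration
-- nothing like this exists in SableVM or cavalry yet:

import java.util.*;

// Assumed input: bytecode already decoded into a flat list, each
// instruction knowing its offset, whether it ends a block (goto,
// if_*, return, athrow...) and its single branch target, if any.
// (Multi-target tableswitch/lookupswitch omitted for brevity.)
class Instr {
    int offset;           // bytecode offset
    boolean branches;     // transfers control away
    boolean conditional;  // may also fall through
    int target;           // branch target offset, or -1

    Instr(int offset, boolean branches, boolean conditional, int target) {
        this.offset = offset;
        this.branches = branches;
        this.conditional = conditional;
        this.target = target;
    }
}

class Block {
    int start, end;       // half-open [start, end) range of list indices
    List<Block> successors = new ArrayList<>();
}

class CfgBuilder {
    // A "leader" is the first instruction, any branch target, or any
    // instruction directly after a branch; blocks run leader to leader.
    static List<Block> partition(List<Instr> code) {
        if (code.isEmpty()) return new ArrayList<>();

        Map<Integer, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < code.size(); i++)
            indexOf.put(code.get(i).offset, i);

        SortedSet<Integer> leaders = new TreeSet<>();
        leaders.add(0);
        for (int i = 0; i < code.size(); i++) {
            Instr in = code.get(i);
            if (!in.branches) continue;
            if (in.target >= 0) leaders.add(indexOf.get(in.target));
            if (i + 1 < code.size()) leaders.add(i + 1);
        }

        // Cut the instruction list at each leader.
        List<Integer> starts = new ArrayList<>(leaders);
        List<Block> blocks = new ArrayList<>();
        Map<Integer, Block> blockAt = new HashMap<>();
        for (int b = 0; b < starts.size(); b++) {
            Block blk = new Block();
            blk.start = starts.get(b);
            blk.end = (b + 1 < starts.size()) ? starts.get(b + 1)
                                              : code.size();
            blocks.add(blk);
            blockAt.put(blk.start, blk);
        }

        // Control-flow edges: fall-through plus explicit branch target.
        for (Block blk : blocks) {
            Instr last = code.get(blk.end - 1);
            if ((!last.branches || last.conditional)
                    && blockAt.containsKey(blk.end))
                blk.successors.add(blockAt.get(blk.end));
            if (last.branches && last.target >= 0)
                blk.successors.add(blockAt.get(indexOf.get(last.target)));
        }
        return blocks;
    }
}

The nice thing is that verification and the RTL conversion you describe
below could both walk the same blocks.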
> - Convert bytecodes into register transfer lists, mapping values on
>   the operand stack to symbolic registers.

And it would be wonderful if this could be easily modified for
different architectures.

> - Hand off the RTL to "plug-in" optimization modules. What
>   optimizations are actually performed depends very much on how
>   quickly I can get them to operate. Some classic optimizations are
>   very time-intensive. Initially I was thinking:
>
>   - Constant folding, constant propagation, copy propagation, and
>     value numbering.

What's interesting is that I haven't looked much at the code that the
Java compilers produce. From many casual glances at bytecode, however,
I think there are quite a few simple optimisations which aren't
performed.

>   - Code straightening and jump threading.

What is this?

BTW, I don't regard myself as a "compiler person"; I'm generally
interested in systems software, and this area was one of many I decided
to play with while deciding where to put most of my research effort.
But the way it's going, it looks like this will dominate my time.

>   - Common subexpression elimination (maybe).
>
>   - Dead code elimination (maybe).

I don't have enough experience to know how costly these are versus the
others. It would be really great to get an idea of those costs and mix
that with profiling data to produce a very flexible solution.

>   - Replacement of certain bytecode patterns with precooked
>     assembly templates, e.g. for x86:
>
>       aload $value; invokevirtual "ntohl" ->
>         asm ("bswap %0" : "=r" (value) : "0" (value))
>
>       aload $value; invokevirtual "ntohs" ->
>         asm ("xchgb %b0,%h0" : "=q" (value) : "0" (value))
>
>     or for Alpha:
>
>       aload $value; invokevirtual "ntohs" ->
>         asm ("insbl %2,1,%1; extbl %2,1,%0; or $2,$1,$2"
>              : "=r"(t1), "=&r"(t2) : "r"(x))

Yes, the whole issue of having the compiler substitute special code for
marked methods is also something I'm very keen on. Specifically, I want
to avoid the overhead of native method calls for doing things like file
IO.

> - Finally, hand over the munged RTL to an emitter which emits native
>   code directly into space allocated on the Java heap. The emitter
>   would assign real registers from symbolic ones using one of the
>   graph coloring algorithms. If that kind of thing is too expensive,
>   there are alternatives.

Yes, again there is a need for an initial translator which is very
fast, but which will be superseded by a more advanced version at a
later stage.

Remember that objects in the Java heap move around. (I suppose you
could pin these down.)
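On the graph-coloring point, here is the simplest greedy flavour of it
as a rough sketch. The interferes matrix (true where two symbolic
registers are live at the same time) is an assumed input, and
GreedyColoring is just an illustrative name; a real Chaitin/Briggs
allocator would simplify, coalesce, and choose spill candidates far
more carefully:

import java.util.*;

// Rough sketch: greedily color symbolic registers given a precomputed
// interference matrix. A register that can't get a machine register
// is simply left marked SPILLED, i.e. destined for a stack slot.
class GreedyColoring {
    static final int SPILLED = -1;

    static int[] allocate(boolean[][] interferes, int machineRegs) {
        int n = interferes.length;
        int[] assigned = new int[n];
        Arrays.fill(assigned, SPILLED);

        for (int v = 0; v < n; v++) {
            // Mark the machine registers already taken by neighbours.
            boolean[] inUse = new boolean[machineRegs];
            for (int u = 0; u < n; u++)
                if (u != v && interferes[v][u] && assigned[u] != SPILLED)
                    inUse[assigned[u]] = true;

            // Take the first free register, if any.
            for (int r = 0; r < machineRegs; r++) {
                if (!inUse[r]) { assigned[v] = r; break; }
            }
            // assigned[v] stays SPILLED when every machine register is
            // held by an interfering neighbour.
        }
        return assigned;
    }
}

Even this cheap version shows where the cost goes: the interference
matrix alone is O(n^2) in the number of symbolic registers, which is
exactly the kind of price a JIT has to weigh against profiling data.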
> - For Alpha, a final peephole optimization pass is required because
>   instruction scheduling is very important for that platform.
>
> All of that may be a bit much for a JIT, but whether or not that is
> true is what I want to find out. :)

Well, you see, that's exactly the issue that hasn't quite been
addressed. When I refer to a JIT I usually mean an
"everything-gets-compiled-before-execution" type of system. That's easy
to implement, but it totally loses out on all the information available
to the compiler at run time. There's no way you're going to compete
with a static C/C++ compiler on CPU-intensive stuff unless you spend a
good deal of time optimising those hot spots.

Jalapeno is similar to OpenJIT in that it has a few compilers to choose
from. But I like the idea of really being able to fine-tune the
process. It's so sad that efforts like IBM's and Sun's compilers have
to be closed source.

> I would also want to add a module for converting variable references
> to immediate constants if the values are known at runtime. Perhaps
> there will also be an API for specifying such values. Code like this:
>
>     if (case_a_is_true) {
>         /* a long block of code follows */
>     } else {
>         /* more code */
>     }
>
> could be processed into:
>
>     /* case_a_is_true is constant and always false */
>     /* more code */

Yip, and if you're willing to bend the rules a bit you can also cache
field accesses. I haven't looked much at what optimisation is possible
from an OO point of view, but there is definitely great potential for
speedup in this area.

> Implementing all of the optimization framework in Java means
> everything is very dynamic and -- best of all -- can be self-
> bootstrapped into native code for maximum performance.

Bootstrapping is a thing of beauty and joy forever.

John Leuner