Re: JIT optimization framework (was: Re: [Sablevm-user] bug report...)
From: John L. <je...@pi...> - 2000-07-30 01:58:41
> > Unfortunately I had to stop working on this project for two months
> > due to a sports injury but I've recovered enough to carry on working
> > now.
>
> That's unfortunate. I'm sorry to hear that.

I'm sure you can appreciate (as a programmer) what it's like to not be
able to type properly for two months! But any sports injury is
ultimately self-inflicted; I'm very glad I'm not a professional
sportsman (e.g. rugby, soccer, etc.).

> > If you're interested in the code or just discussing jits or
> > whatever, just let me know.
>
> Both. :)
>
> I've downloaded your 0.07 source and I'll have a look next week when I
> have time.

Ok. I'll prepare some quick docs to help you build it; I've been
swapping build systems lately and things are not yet finalized in that
area. Also make sure you download the JIT (it's a separate module
called cavalry).

> As for my interest in JIT technology, I've been inspired by the
> OpenJIT project (www.openjit.org) to implement a JIT framework where
> the JIT, code generator, and optimizer are all first-class Java
> objects. Normally I am not taken to duplicating the work of others,
> but this is cool technology that really should be available under the
> GPL.

I managed to get through to the page today; it does look very
interesting. I'm a bit sceptical about making the JIT compatible with a
range of VMs (like the classic VM), because there are so many
VM-specific optimisations to be made. But I love the idea of
modularisation and flexibility.

> I have not looked at the OpenJIT source, so this will be a
> "clean-room" effort in that respect. I'm investigating if that status
> also applies to work with Java in general.

I'm not sure I understand the last line?

> I note that you generate assembly text and then compile it.

Let me quickly explain my strategy. When I first started this project
(the translator), I was just playing around with ideas; it was
essentially a prototype which would help me get a feel for what I was
doing and tell me whether I wanted to take this further.

Originally I generated machine code directly for each bytecode
(literally putting the binary bytes into a stream). The reason I took
such an extreme approach is that I wanted an initial translator which
could transform the bytecodes into machine code as quickly as possible;
I didn't want the overhead of any other framework, compiler, assembler,
etc. (If you haven't already, I urge you to read about IBM's Jalapeno
JVM for the PPC:
http://www-4.ibm.com/software/developer/library/jalapeno/index.html )

So after I had played with that a bit, I decided I liked it and would
try some more stuff. I wrote an IA32 assembler, which then allowed me
to compile a far greater range of bytecodes. At the moment I'm busy
finishing this round of prototyping. For me this was very much a way of
getting acquainted with the issues involved (especially wrt assembly
and the interface with the JVM).

I initially wanted to do a little bit of optimisation (such as register
allocation), but was so depressed by the IA32's pitiful register set
and its different instructions for addressing the different groups of
registers (MMX etc) that I decided to leave that for later.

> I was thinking of doing something more along the lines of the
> following:
>
> - Partition bytecodes into basic blocks while building a control
>   flow graph.
>
> - Optionally run the standard Java code safety verification process,
>   if this has been requested, e.g. "jvm -verify ..."

This definitely has to be present in a final version of any JVM. What
is also needed is to gather information for doing precise GC. I don't
do this yet, but Etienne has (or is planning to have) it in SableVM.
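To make the basic-block step concrete, here's a rough sketch in Java of
the classic "leader" algorithm. The Instr/Block/CfgBuilder names and
the pre-decoded instruction list are just assumptions for illustration
-- nothing like this exists in SableVM or cavalry yet:

import java.util.*;

// Assumed input: bytecode already decoded into a flat list, each
// instruction knowing its offset, whether it ends a block (goto,
// if_*, return, athrow...) and its single branch target, if any.
// (Multi-target tableswitch/lookupswitch omitted for brevity.)
class Instr {
    int offset;           // bytecode offset
    boolean branches;     // transfers control away
    boolean conditional;  // may also fall through
    int target;           // branch target offset, or -1

    Instr(int offset, boolean branches, boolean conditional, int target) {
        this.offset = offset;
        this.branches = branches;
        this.conditional = conditional;
        this.target = target;
    }
}

class Block {
    int start, end;       // half-open [start, end) range of list indices
    List<Block> successors = new ArrayList<>();
}

class CfgBuilder {
    // A "leader" is the first instruction, any branch target, or any
    // instruction directly after a branch; blocks run leader to leader.
    static List<Block> partition(List<Instr> code) {
        if (code.isEmpty()) return new ArrayList<>();

        Map<Integer, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < code.size(); i++)
            indexOf.put(code.get(i).offset, i);

        SortedSet<Integer> leaders = new TreeSet<>();
        leaders.add(0);
        for (int i = 0; i < code.size(); i++) {
            Instr in = code.get(i);
            if (!in.branches) continue;
            if (in.target >= 0) leaders.add(indexOf.get(in.target));
            if (i + 1 < code.size()) leaders.add(i + 1);
        }

        // Cut the instruction list at each leader.
        List<Integer> starts = new ArrayList<>(leaders);
        List<Block> blocks = new ArrayList<>();
        Map<Integer, Block> blockAt = new HashMap<>();
        for (int b = 0; b < starts.size(); b++) {
            Block blk = new Block();
            blk.start = starts.get(b);
            blk.end = (b + 1 < starts.size()) ? starts.get(b + 1)
                                              : code.size();
            blocks.add(blk);
            blockAt.put(blk.start, blk);
        }

        // Control-flow edges: fall-through plus explicit branch target.
        for (Block blk : blocks) {
            Instr last = code.get(blk.end - 1);
            if ((!last.branches || last.conditional)
                    && blockAt.containsKey(blk.end))
                blk.successors.add(blockAt.get(blk.end));
            if (last.branches && last.target >= 0)
                blk.successors.add(blockAt.get(indexOf.get(last.target)));
        }
        return blocks;
    }
}

The nice thing is that verification and the RTL conversion you describe
below could both walk the same blocks.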
> - Convert bytecodes into register transfer lists, mapping values on
>   the operand stack to symbolic registers.

And it would be wonderful if this could be easily modified for
different architectures.

> - Hand off the RTL to "plug-in" optimization modules. What
>   optimizations are actually performed depends very much on how
>   quickly I can get them to operate. Some classic optimizations are
>   very time-intensive. Initially I was thinking:
>
>   - Constant folding, constant propagation, copy propagation, and
>     value numbering.

What's interesting is that I haven't looked much at the code that the
Java compilers produce. From many casual glances at bytecode, however,
I think there are quite a few simple optimisations which aren't
performed.

>   - Code straightening and jump threading.

What is this?

BTW, I don't regard myself as a "compiler person"; I'm generally
interested in systems software, and this area was one of many I decided
to play with while deciding where to put most of my research effort.
But the way it's going, it looks like this will dominate my time.

>   - Common subexpression elimination (maybe).
>
>   - Dead code elimination (maybe).

I don't have enough experience to know how costly these are versus the
others. It would be really great to get an idea of those costs and mix
that with profiling data to produce a very flexible solution.

>   - Replacement of certain bytecode patterns with precooked
>     assembly templates, e.g. for x86:
>
>       aload $value; invokevirtual "ntohl" ->
>         asm ("bswap %0" : "=r" (value) : "0" (value))
>
>       aload $value; invokevirtual "ntohs" ->
>         asm ("xchgb %b0,%h0" : "=q" (value) : "0" (value))
>
>     or for Alpha:
>
>       aload $value; invokevirtual "ntohs" ->
>         asm ("insbl %2,1,%1; extbl %2,1,%0; or $2,$1,$2"
>              : "=r"(t1), "=&r"(t2) : "r"(x))

Yes, the whole issue of having the compiler substitute special code for
marked methods is also something I'm very keen on. Specifically, I want
to avoid the overhead of native method calls for doing things like file
IO.

> - Finally, hand over the munged RTL to an emitter which emits native
>   code directly into space allocated on the Java heap. The emitter
>   would assign real registers from symbolic ones using one of the
>   graph coloring algorithms. If that kind of thing is too expensive,
>   there are alternatives.

Yes, again there is a need for an initial translator which is very
fast, but which will be superseded by a more advanced version at a
later stage.

Remember that objects in the Java heap move around. (I suppose you
could pin these down.)
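On the graph-coloring point, here is the simplest greedy flavour of it
as a rough sketch. The interferes matrix (true where two symbolic
registers are live at the same time) is an assumed input, and
GreedyColoring is just an illustrative name; a real Chaitin/Briggs
allocator would simplify, coalesce, and choose spill candidates far
more carefully:

import java.util.*;

// Rough sketch: greedily color symbolic registers given a precomputed
// interference matrix. A register that can't get a machine register
// is simply left marked SPILLED, i.e. destined for a stack slot.
class GreedyColoring {
    static final int SPILLED = -1;

    static int[] allocate(boolean[][] interferes, int machineRegs) {
        int n = interferes.length;
        int[] assigned = new int[n];
        Arrays.fill(assigned, SPILLED);

        for (int v = 0; v < n; v++) {
            // Mark the machine registers already taken by neighbours.
            boolean[] inUse = new boolean[machineRegs];
            for (int u = 0; u < n; u++)
                if (u != v && interferes[v][u] && assigned[u] != SPILLED)
                    inUse[assigned[u]] = true;

            // Take the first free register, if any.
            for (int r = 0; r < machineRegs; r++) {
                if (!inUse[r]) { assigned[v] = r; break; }
            }
            // assigned[v] stays SPILLED when every machine register is
            // held by an interfering neighbour.
        }
        return assigned;
    }
}

Even this cheap version shows where the cost goes: the interference
matrix alone is O(n^2) in the number of symbolic registers, which is
exactly the kind of price a JIT has to weigh against profiling data.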
> - For Alpha, a final peephole optimization pass is required because
>   instruction scheduling is very important for that platform.
>
> All of that may be a bit much for a JIT, but whether or not that is
> true is what I want to find out. :)

Well, you see, that's exactly the issue that hasn't quite been
addressed. When I refer to a JIT I usually mean an
"everything-gets-compiled-before-execution" type of system. That's easy
to implement, but it totally loses out on all the information available
to the compiler at run time. There's no way you're going to compete
with a static C/C++ compiler on CPU-intensive stuff unless you spend a
good deal of time optimising those hot spots.

Jalapeno is similar to OpenJIT in that it has a few compilers to choose
from. But I like the idea of really being able to fine-tune the
process. It's so sad that efforts like IBM's and Sun's compilers have
to be closed source.

> I would also want to add a module for converting variable references
> to immediate constants if the values are known at runtime. Perhaps
> there will also be an API for specifying such values. Code like this:
>
>     if (case_a_is_true) {
>         /* a long block of code follows */
>     } else {
>         /* more code */
>     }
>
> could be processed into:
>
>     /* case_a_is_true is constant and always false */
>     /* more code */

Yip, and if you're willing to bend the rules a bit you can also cache
field accesses. I haven't looked much at what optimisation is possible
from an OO point of view, but there is definitely great potential for
speedup in this area.

> Implementing all of the optimization framework in Java means
> everything is very dynamic and -- best of all -- can be self-
> bootstrapped into native code for maximum performance.

Bootstrapping is a thing of beauty and joy forever.

John Leuner