Re: JIT optimization framework (was: Re: [Sablevm-user] bug report...)


John Leuner wrote:
> Also make sure you download the jit (it's a separate module called
> cavalry).

Thanks... Didn't have that yet.

> I'm a bit sceptical about making the JIT compatible with a range of
> VMs (like the classic VM), because there are so many VM-specific
> optimisations to be made. But I love the idea of modularisation and
> flexibility.

I think the essential idea is a framework where one can write a
"plug-in" transformation that runs on a number of VMs, provided it can
be performed at a high enough level. Such transformations alone are
not enough to produce the best attainable code for a particular
environment, but that is not a problem, since a particular VM's JIT
(hopefully) knows a few tricks for optimizing for the local
environment.

>> I'm investigating if that status also applies to work with Java in
>> general.
>
> I'm not sure I understand the last line?

My employer (NAI Labs) cannot claim "clean room" status and I need to
determine if there is a legal distinction between my role as an
employee of the company and my personal projects.

> I didn't want the overhead of any other framework, compiler, assembler
> etc. (If you haven't already, I urge you to read about IBM's jalapeno
> JVM, for the PPC
> http://www-4.ibm.com/software/developer/library/jalapeno/index.html
> ).

Thanks.

> I initially wanted to do a little bit of optimisation (such as register
> allocation), but was so depressed with the IA32's pitiful registers and
> different instructions for addressing the different sets of registers (MMX
> etc) that I decided to leave that for later.

Yes, that is why it is much more fun to work with the Alphas, even
though producing code for them is actually harder (w.r.t. instruction
scheduling).

> >   - Convert bytecodes into register transfer lists, mapping values on
> >     the operand stack to symbolic registers.
> 
> And it would be wonderful if this could be easily modified for different
> architectures.

RTL form itself is agnostic about machine targets, because all
registers are symbolic and all are assumed to be capable of holding
the largest (or smallest) integer and real values expressible in 
code. This leaves a lot of work for a register allocator when it
comes time to generate native code, though.
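
To make the bytecode-to-RTL step concrete, here is a minimal sketch in
Python. It is not code from SableVM or OpenJIT. The tuple encoding of
bytecodes, the function name bytecode_to_rtl, and the "r0 := ..."
notation are all assumptions made for illustration, and it only
handles a few straight-line integer opcodes. The point is that each
value pushed on the operand stack gets its own symbolic register.

```python
def bytecode_to_rtl(bytecode):
    """Translate straight-line stack bytecode into register transfers.

    `bytecode` is a list of tuples such as ("iload", 1) or ("iadd",).
    This encoding is hypothetical, chosen only for the sketch.
    """
    stack = []    # symbolic registers currently on the simulated operand stack
    rtl = []      # emitted register transfers, as strings
    counter = 0   # next unused symbolic register number

    def fresh():
        nonlocal counter
        name = f"r{counter}"
        counter += 1
        return name

    binops = {"iadd": "+", "isub": "-", "imul": "*"}
    for op, *args in bytecode:
        if op == "iconst":
            dst = fresh()
            rtl.append(f"{dst} := {args[0]}")
            stack.append(dst)
        elif op == "iload":
            dst = fresh()
            rtl.append(f"{dst} := local{args[0]}")
            stack.append(dst)
        elif op == "istore":
            rtl.append(f"local{args[0]} := {stack.pop()}")
        elif op in binops:
            rhs = stack.pop()   # top of stack is the right operand
            lhs = stack.pop()
            dst = fresh()
            rtl.append(f"{dst} := {lhs} {binops[op]} {rhs}")
            stack.append(dst)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return rtl


# local3 = local1 * 2 + local2
for line in bytecode_to_rtl([("iload", 1), ("iconst", 2), ("imul",),
                             ("iload", 2), ("iadd",), ("istore", 3)]):
    print(line)
```

Every push creates a new register, so nothing here knows or cares how
many registers the target has. That is exactly the work left over for
the register allocator.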

However, practical implementations still need to consider machine-
dependent issues during all phases of compilation -- CSE and constant
propagation, for example, need reasonable limits on the number of
temporaries generated. What is reasonable of course depends on the
target.

An OpenJIT-style optimization framework, then, needs to provide
target-specific information in a generic manner -- e.g.

	if (number_of_temporaries >= platform.max_available_regs()) {
		/* prune temps */
	}
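
A slightly fuller sketch of the same idea, in Python, might look like
the following. Only max_available_regs() comes from the fragment
above. The Platform class, the local_cse function, and the tuple form
(dst, op, a, b) are hypothetical, and the pass assumes each symbolic
register is assigned exactly once within the block. The pass itself
is generic; the only target knowledge it uses is the register budget
it asks the platform object for.

```python
from collections import OrderedDict


class Platform:
    """Hypothetical target description handed to generic passes."""

    def __init__(self, name, regs):
        self.name = name
        self._regs = regs

    def max_available_regs(self):
        return self._regs


def local_cse(block, platform):
    """Common subexpression elimination within one basic block.

    Remembers at most platform.max_available_regs() computed values;
    older ones are pruned, so the number of temporaries kept alive for
    reuse stays bounded by what the target can reasonably hold.
    """
    limit = platform.max_available_regs()
    available = OrderedDict()   # (op, a, b) -> register holding that value
    out = []
    for dst, op, a, b in block:
        key = (op, a, b)
        if key in available:
            # Reuse the earlier result instead of recomputing it.
            out.append((dst, "copy", available[key], None))
            continue
        out.append((dst, op, a, b))
        while available and len(available) >= limit:
            available.popitem(last=False)   # prune the oldest temporary
        if limit > 0:
            available[key] = dst
    return out


block = [("r2", "+", "r0", "r1"),
         ("r3", "*", "r0", "r1"),
         ("r4", "+", "r0", "r1")]
roomy = local_cse(block, Platform("many-regs", 8))
tight = local_cse(block, Platform("few-regs", 1))
```

With the roomy description the third instruction becomes a copy of r2.
With the tight one the first result has already been pruned, so the
addition is simply recomputed.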

> What's interesting is that I haven't look much at the code that the java
> compilers produce. From many casual glances at bytecode however, I think
> there are quite a few simple optimisations which aren't performed.

I think most JITs are invoked indiscriminately and are therefore
limited, by the need for speed, to variations on a simple
bytecode-to-native-code mapping. Etienne would like to see a VM invoke
a JIT for a particular method only if there is a clear benefit. I
think that's the only way to go if the JIT will be spending time
aggressively optimizing the result.
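
One common way to approximate "clear benefit" is to count invocations
and compile a method only once it proves to be hot. The Python sketch
below assumes that heuristic; it is not a description of how SableVM
decides. The class name SelectiveVM, the threshold value, and the
jit_compile stand-in are all hypothetical.

```python
class SelectiveVM:
    """Toy VM that interprets methods until they become hot."""

    def __init__(self, compile_threshold=3):
        self.compile_threshold = compile_threshold
        self.call_counts = {}   # method name -> number of interpreted calls
        self.compiled = {}      # method name -> "compiled" callable

    def jit_compile(self, name, func):
        # Stand-in for an expensive optimizing compiler. A real JIT
        # would return native code; here the callable is returned as is.
        return func

    def invoke(self, name, func, *args):
        if name in self.compiled:
            return self.compiled[name](*args)
        count = self.call_counts.get(name, 0) + 1
        self.call_counts[name] = count
        if count >= self.compile_threshold:
            # Hot enough: pay the compilation cost once, for later calls.
            self.compiled[name] = self.jit_compile(name, func)
        return func(*args)      # this call is still interpreted


vm = SelectiveVM(compile_threshold=3)
for n in range(5):
    vm.invoke("square", lambda x: x * x, n)
```

Methods that run once or twice never reach the compiler, so its time
is only spent where it has a chance of being paid back.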

> >     - Code straightening and jump threading.
> 
> What is this?

Code straightening and jump threading both have to do with improving
how branches are taken through the code.

Code straightening takes the basic blocks along what appears to be the
most likely path of execution through a procedure and arranges them
sequentially -- so the most likely case is the fall-through case. This
means that prefetched instructions are more likely to be executed than
discarded.

Jump threading looks at branch targets. For example, if the target of
an unconditional branch is another unconditional branch, then the
natural thing to do is make the first branch point to the second
branch's target. As another example, branches which point to the very
next instruction (commonly a result of other optimizations) can simply
be removed. Cases involving conditional branches are more interesting.
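
The two jump threading examples can be sketched in a few lines of
Python. The instruction encoding (tuples tagged "label", "goto", "if",
"op") and the function name thread_jumps are assumptions for the
sketch, and it expects every branch target to be a label that exists.
It retargets branches that land on an unconditional goto, then drops
gotos that only jump to the next instruction. Code straightening is
not shown.

```python
def thread_jumps(code):
    """Retarget branches that land on unconditional gotos, then drop
    gotos that only jump to the immediately following label."""
    # Map each label to the index of the first non-label instruction after it.
    first_insn = {}
    for i, insn in enumerate(code):
        if insn[0] == "label":
            j = i
            while j < len(code) and code[j][0] == "label":
                j += 1
            first_insn[insn[1]] = j

    def resolve(label):
        # Follow chains of unconditional gotos; `seen` guards against cycles.
        seen = set()
        while label not in seen:
            seen.add(label)
            j = first_insn[label]
            if j < len(code) and code[j][0] == "goto":
                label = code[j][1]
            else:
                break
        return label

    out = []
    for insn in code:
        if insn[0] == "goto":
            out.append(("goto", resolve(insn[1])))
        elif insn[0] == "if":
            out.append(("if", insn[1], resolve(insn[2])))
        else:
            out.append(insn)

    def jumps_to_next(seq, i):
        # True if only labels separate the goto at i from its target label.
        j = i + 1
        while j < len(seq) and seq[j][0] == "label":
            if seq[j][1] == seq[i][1]:
                return True
            j += 1
        return False

    # Removing one goto can expose another, so repeat until nothing changes.
    changed = True
    while changed:
        kept = [insn for i, insn in enumerate(out)
                if not (insn[0] == "goto" and jumps_to_next(out, i))]
        changed = len(kept) != len(out)
        out = kept
    return out


code = [("if", "x == 0", "L1"),
        ("op", "work()"),
        ("goto", "L2"),
        ("label", "L1"),
        ("goto", "L2"),
        ("label", "L2"),
        ("op", "return")]
threaded = thread_jumps(code)
```

In this input the conditional branch to L1 ends up pointing straight
at L2, after which both gotos are branches to the next instruction and
disappear. The leftover L1 label is unreferenced and would be cleaned
up by a later pass.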

> Yes, the whole issue of having the compiler substitute special code for
> marked methods is also something I'm very keen on. Specifically I want to
> avoid the overhead of native method calls for doing things like file IO
> etc.

You should look at the work the Jaguar project has done in this area:
http://www.cs.berkeley.edu/~mdw/proj/jaguar/

> Yes, again there is a need for an initial translator which is very fast,
> but which will be superseded by a more advanced version at a later stage.

Yes, or a VM with a fast interpreter which invokes the JIT 
selectively.

> Remember that objects in the java heap move around. (I suppose you could
> pin these down).

Hmm. Marking the storage for code objects as immovable (pinned) is one
option. Fixing up, as required, whatever code can't be made position
independent is another. I haven't thought too much about this yet, but
I certainly will. :)

> It's so sad that efforts like IBM & Sun's compilers have to be closed
> source.

I agree. Or in Sun's case, what source is released comes with a
"problematic" license.

> Yip, and if you're willing to bend the rules a bit you can also cache
> field accesses. I haven't looked much at what optimisation is possible
> from an OO point of view, but there is definitely a great potential for
> speedup in this area.

Etienne worked in the past on a Java optimizer that did various OO
optimizations -- e.g. inlining, transforming virtual calls to static
ones, etc. (http://www.sable.mcgill.ca/soot/). I'm looking at this as
a starting point.
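
As an illustration of turning a virtual call into a static one, here
is a small Python sketch based on a class hierarchy check. It is not
Soot's API or its actual algorithm. The `parents` and `declares`
dictionaries and all function names are hypothetical, and it assumes
the whole class hierarchy is known up front, which dynamic class
loading in Java can invalidate.

```python
def subclasses_of(parents, cls):
    """All classes that have `cls` as a direct or indirect superclass."""
    result = set()
    for c in parents:
        p = parents[c]
        while p is not None:
            if p == cls:
                result.add(c)
                break
            p = parents[p]
    return result


def lookup(parents, declares, cls, method):
    """Walk up from `cls` to the class that implements `method`."""
    while cls is not None:
        if method in declares.get(cls, ()):
            return cls
        cls = parents[cls]
    return None


def devirtualize(parents, declares, static_type, method):
    """Return the one class whose implementation every possible
    receiver would use, or None if the call must stay virtual."""
    for sub in subclasses_of(parents, static_type):
        if method in declares.get(sub, ()):
            return None     # some subclass overrides it
    return lookup(parents, declares, static_type, method)


# class -> superclass (None for the root of this small hierarchy)
parents = {"Shape": None, "Circle": "Shape",
           "Square": "Shape", "UnitCircle": "Circle"}
# class -> methods it defines itself
declares = {"Shape": {"area", "name"},
            "Circle": {"area"},
            "Square": {"area"}}

static_target = devirtualize(parents, declares, "Shape", "name")
still_virtual = devirtualize(parents, declares, "Shape", "area")
```

A call to name() on a Shape can be bound directly to Shape's version
because nothing overrides it, while area() on a Shape has to stay
virtual. Once a call is bound like this it also becomes a candidate
for inlining.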

Andrew Purtell
from home -- apu...@ac...