Stripmining is a simpler transformation, so may be less vulnerable to lingering SSA bugs.
The main place where we think stripmining might be useful is in moderately small loop bodies to help
reduce the cost of backedge yieldpoints. (To support m-n threading, all JikesRVM compilers inject
conditional yields on loop backedges). So a basic loop like:
static int x = 0;
for (int i=0; i<100000; i++) {
x++;
}
gets compiled into:
for (int i=0; i<100000; i++) {
x++;
BACKEDGE_YIELDPOINT;
}
The BACKEDGE_YIELDPOINT expands to a load/compare/branch sequence. The branch is very infrequnetly taken, so likely to be correctly predicted. So the cost is 2-3 instructions per loop iteration.
With stripmining, one would get something like:
for (int i=0; i<100; i++) {
for (int j=0; j<100; j++) {
x++;
}
BACKEDGE_YIELDPOINT;
}
The inner loop has a small number of iterations and a small body, so the inner yieldpoint can be dropped.
Overall, it's not clear how much of a benefit this would actually be. In some tight loops it would make a difference and might be a better choice than unrolling (since it results in less code space expansion). But often loop unrolling is better than stripmining.
I'd suggest taking a look at Ian's LoopVersioning code and then talking some to Ian. He probably has a sense of what loop optimizations are most likely to be profitable ontop of the loop versioning that he's already implemented.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi, not sure what prompted the comment but anyway, we have a number of loop optimizations here. Jisheng Zhao is responsible for most of these. Some of the more recent ones are loop blocking, interchange, reversal, the usual gamut of things to improve parallelization. I want to add these phases to the repository, but I want to fix loop versioning first (leave SSA is a problem). The eventual goal would be to release the parallelization code although our current results show its performance isn't good for a dual CPU SMP x86 system. I believe there'd be significant research interest in a parallelizing open source JVM. We also have a number of adaptive optimizations that tune the parallelization and extensions to interprocedural analysis, these should all make it into the repository. I want to improve their behavior and leave SSA is barring this.
Ian
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Logged In: YES
user_id=1215435
Originator: YES
Loop versioning has since been done (OPT_LoopVersioning), but has not been enabled by default due to lingering bugs it exposed in SSA (see https://sourceforge.net/tracker/index.php?func=detail&aid=1488798&group_id=128805&atid=712768).
Stripmining is a simpler transformation, so may be less vulnerable to lingering SSA bugs.
The main place where we think stripmining might be useful is in moderately small loop bodies to help
reduce the cost of backedge yieldpoints. (To support m-n threading, all JikesRVM compilers inject
conditional yields on loop backedges). So a basic loop like:
static int x = 0;
for (int i=0; i<100000; i++) {
x++;
}
gets compiled into:
for (int i=0; i<100000; i++) {
x++;
BACKEDGE_YIELDPOINT;
}
The BACKEDGE_YIELDPOINT expands to a load/compare/branch sequence. The branch is very infrequnetly taken, so likely to be correctly predicted. So the cost is 2-3 instructions per loop iteration.
Loop unrolling can transform the loop into:
for (int i=0; i<25000; i++) {
x++;
x++;
x++;
x++;
BACKEDGE_YIELDPOINT;
}
With stripmining, one would get something like:
for (int i=0; i<100; i++) {
for (int j=0; j<100; j++) {
x++;
}
BACKEDGE_YIELDPOINT;
}
The inner loop has a small number of iterations and a small body, so the inner yieldpoint can be dropped.
Overall, it's not clear how much of a benefit this would actually be. In some tight loops it would make a difference and might be a better choice than unrolling (since it results in less code space expansion). But often loop unrolling is better than stripmining.
I'd suggest taking a look at Ian's LoopVersioning code and then talking some to Ian. He probably has a sense of what loop optimizations are most likely to be profitable ontop of the loop versioning that he's already implemented.
Logged In: YES
user_id=308843
Originator: NO
Hi, not sure what prompted the comment but anyway, we have a number of loop optimizations here. Jisheng Zhao is responsible for most of these. Some of the more recent ones are loop blocking, interchange, reversal, the usual gamut of things to improve parallelization. I want to add these phases to the repository, but I want to fix loop versioning first (leave SSA is a problem). The eventual goal would be to release the parallelization code although our current results show its performance isn't good for a dual CPU SMP x86 system. I believe there'd be significant research interest in a parallelizing open source JVM. We also have a number of adaptive optimizations that tune the parallelization and extensions to interprocedural analysis, these should all make it into the repository. I want to improve their behavior and leave SSA is barring this.
Ian
Logged In: YES
user_id=1215435
Originator: YES
Moved to JIRA: http://jira.codehaus.org/browse/RVM-567