On Fri, Dec 4, 2009 at 5:15 PM, Michael Bond <mikebond@cs.utexas.edu> wrote:
Hi Anthony,

It might not be a significant performance difference.  What's your execution methodology (e.g., adaptive vs. replay; how many trials)?  The bloat benchmark in particular can have lots of run-to-run performance variation (kind of looks bi-modal?) even with replay methodology.  So you might try all the other DaCapo benchmarks, and also try bloat with lots of trials and even look at the distribution of run times across trials.

I agree that the two increment-like operations should add similar (and low) overheads.

Cheers,
Mike



The measurements are done with adaptive compilation. Indeed, the bloat benchmark has lots of run-to-run performance variation. Anyway, my measurements are average values; that's how I concluded that this modulo instruction costs merely 700 ms.
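Mike's advice to look at the distribution of run times, not only the average, can be sketched in Java as below. `TrialStats` and its methods are hypothetical helpers, not part of DaCapo or JikesRVM, and the run times in `main` are made-up values for illustration, not measurements from this thread. When the times form two clusters, the mean (and here even the median) can land in the gap between them, where no actual run fell, while a coarse histogram shows both modes directly.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers for summarizing per-trial run times (milliseconds).
final class TrialStats {
    static double mean(long[] times) {
        double sum = 0;
        for (long t : times) sum += t;
        return sum / times.length;
    }

    static double median(long[] times) {
        long[] s = times.clone();   // don't reorder the caller's array
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Counts trials per bucket of width bucketMs, keyed by the bucket's lower bound.
    static TreeMap<Long, Integer> histogram(long[] times, long bucketMs) {
        TreeMap<Long, Integer> buckets = new TreeMap<>();
        for (long t : times) buckets.merge((t / bucketMs) * bucketMs, 1, Integer::sum);
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up values for illustration only, not data from this thread.
        long[] trials = {10000, 10100, 10050, 13000, 13100, 13050};
        System.out.printf("mean=%.1f median=%.1f%n", mean(trials), median(trials));
        for (Map.Entry<Long, Integer> e : histogram(trials, 1000).entrySet()) {
            System.out.println(e.getKey() + " ms: " + "#".repeat(e.getValue()));
        }
    }
}
```

With many trials per configuration, comparing the histograms of the two variants says more than comparing two averages when the benchmark is bimodal.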

On Fri, Dec 4, 2009 at 5:44 PM, Jose A. Joao <joao@mail.utexas.edu> wrote:
Hi Anthony,

I don't know how JikesRVM implements modulo, but it should be more
expensive than a simple add. In your case, since the divisor is a
power of two, it would be much better to use a bitwise AND with
the proper mask:

   idx = (idx + 1) & 63;

Best regards,


Jose
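Jose's suggestion, applied to a wrap-around index, can be sketched in Java as follows. The class name, method names, and the `SIZE`/`MASK` constants are hypothetical; the thread only shows the single masked line. The two forms agree as long as the size is a power of two and `idx + 1` is nonnegative, which holds here because the index always stays in [0, 63].

```java
// Hypothetical wrap-around counter illustrating the mask trick.
final class RingIndex {
    static final int SIZE = 64;        // must be a power of two for the mask to work
    static final int MASK = SIZE - 1;  // 63, i.e. the low six bits

    // Original form: integer remainder.
    static int nextWithModulo(int idx) {
        return (idx + 1) % SIZE;
    }

    // Suggested form: bitwise AND with SIZE - 1.
    static int nextWithMask(int idx) {
        return (idx + 1) & MASK;
    }

    public static void main(String[] args) {
        int a = 0, b = 0;
        for (int i = 0; i < 1000; i++) {
            a = nextWithModulo(a);
            b = nextWithMask(b);
            if (a != b) throw new AssertionError("mismatch at step " + i);
        }
        System.out.println("indices agree: " + a);
    }
}
```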

I do agree, Jose! I should have thought of this trick. But still, the optimizing compiler could have done this transformation for me :-)

Yup, you're right.

Interestingly, GCC optimizes this case by effectively doing:

if (positive) {
  do bitwise and
} else {
  do bitwise and + other stuff
}
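Filip's pseudocode can be filled in roughly as below. This is a minimal Java sketch of the idea, not the exact instruction sequence GCC emits, and not something JikesRVM is known to do; `SignedMod64` and `mod64` are hypothetical names. The extra work on the negative path exists because Java's `%`, like C99's, gives a result with the sign of the dividend, so `-1 % 64` is `-1` while `-1 & 63` is `63`.

```java
// Hypothetical illustration: signed x % 64 using a mask plus a sign fix-up.
final class SignedMod64 {
    static int mod64(int x) {
        int low = x & 63;              // low six bits, always in [0, 63]
        if (x >= 0) {
            return low;                // nonnegative dividend: the AND alone is exact
        }
        // Negative dividend: the true remainder lies in [-63, 0],
        // so a nonzero low part must be shifted down by 64.
        return (low == 0) ? 0 : low - 64;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 63, 64, 65, -1, -63, -64, -65,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (mod64(x) != x % 64) throw new AssertionError("mismatch for " + x);
        }
        System.out.println("mod64 matches x % 64 on all samples");
    }
}
```

A compiler can drop the negative-path fix-up only when it can prove the dividend is never negative, which is why the plain `& 63` rewrite suits Anthony's index but is not a drop-in replacement for `%` in general.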

-Filip

I'm glad to see that my silly question finally led to a possible optimisation for the optimizing compiler :-)

Thank you very much everybody.
Regards,
--
Anthony Hocquet