On Feb 25, 2013, at 10:23 AM, Eliot Moss <moss@cs.umass.edu> wrote:

On 2/24/2013 12:32 PM, Michael Bond wrote:
Hi Erik,

Right, 64-bit volatile accesses need to be all-at-once instead of
divided into two 32-bit accesses. (This requirement is in addition to
the requirement for all volatiles that a write of a volatile to a read
of a volatile induces a happens-before relationship.) However, my
understanding of IA-32 is that it doesn't support 64-bit accesses --
otherwise the compilers would use 64-bit accesses instead of 32-bit
accesses, right? Assuming that's right, it means that to support 64-bit
volatile accesses correctly, the IA-32 compilers would need to generate
code for some kind of workaround: either create a small
critical section based on spin locking, or use indirection so that each
volatile long/double is actually a (32-bit) pointer to a 64-bit value,
so the pointer can be updated atomically.

True; ugh.

Can we not use CMPXCHG8B for double-word volatiles?
That is available on x86-32.
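
To make the atomicity requirement above concrete, here is a minimal Java
sketch that looks for torn 64-bit volatile reads. The class and method
names are made up for illustration and are not part of Jikes RVM. A writer
alternates between the all-zeros and all-ones bit patterns, so a read that
was split into two 32-bit halves could observe a mix of the two. On a VM
that implements volatile long correctly the count should be zero; a zero
count does not prove correctness, it only fails to find a problem.

```java
public class VolatileLongTearingCheck {
    // Field under test. The Java memory model requires reads and writes of
    // volatile long values to be atomic.
    static volatile long shared = 0L;

    // Returns how many reads observed a value that was never written.
    // (Hypothetical helper, for illustration only.)
    static long countTornReads(int iterations) {
        shared = 0L;
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                // Alternate between all-ones and all-zeros so that a torn
                // access would combine one half of each pattern.
                shared = ((i & 1) == 0) ? -1L : 0L;
            }
        });
        writer.start();

        long torn = 0;
        for (int i = 0; i < iterations; i++) {
            long v = shared;
            if (v != 0L && v != -1L) {
                torn++;
            }
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(10_000_000));
    }
}
```

Removing the volatile modifier turns this into a probe for whether the VM
splits plain long accesses, which the language permits on 32-bit targets.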

On the other hand, if the machine target is actually x86-64, it seems
like the IA-32 compilers could emit a 64-bit load/store for load/store
of long/double volatiles. I believe a 64-bit access is atomic in the
sense of being all-or-nothing -- as long as it's 64-bit aligned? -- so
the 64-bit load/store doesn't need to be an actual atomic operation like
a CAS.

I would agree; the key thing is not crossing cache lines and especially
not crossing page boundaries.  64-bit alignment guarantees that.
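
The alignment argument can be checked with a little arithmetic. The sketch
below assumes 64-byte cache lines and 4096-byte pages, which are typical
for x86 but are assumptions here, and the names are illustrative only. It
tests whether an 8-byte access starting at an 8-byte-aligned address ever
spans two lines or two pages.

```java
public class AlignmentCheck {
    // True if an access of `size` bytes starting at `address` touches two
    // different blocks of `blockSize` bytes (a cache line or a page).
    static boolean crosses(long address, int size, int blockSize) {
        return (address / blockSize) != ((address + size - 1) / blockSize);
    }

    // Checks an 8-byte access at every 8-byte-aligned address below `limit`.
    static boolean alignedAccessNeverCrosses(int blockSize, long limit) {
        for (long address = 0; address < limit; address += 8) {
            if (crosses(address, 8, blockSize)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed sizes: 64-byte cache line, 4096-byte page.
        System.out.println(alignedAccessNeverCrosses(64, 1 << 20));
        System.out.println(alignedAccessNeverCrosses(4096, 1 << 20));
        // A misaligned access, by contrast, can straddle a line boundary.
        System.out.println(crosses(60, 8, 64));
    }
}
```

The property holds for any block size that is a multiple of 8, so the
exact line and page sizes do not matter much.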

The JSR-133 cookbook [2] also mentions the need of additional barriers
for monitorexit/monitorenter and certain VM internal operations. Do we
need to do something in this area (e.g. on PowerPC)?

I don't speak PowerPC, but I think monitorenter/monitorexit behavior is
correct on IA-32. The compiler treats these like lfence/sfence
operations, respectively, which can simply become no-ops when compiled
to machine code, since IA-32's memory model is TSO (except for special
accesses like non-temporal stores).

Right. One would have to look more closely at PPC. I think the JMM
semantics allow something like acquire-release models, but ordering
w.r.t. volatile accesses is probably added on top of that. I think this
basically puts the burden on volatile accesses to do the right thing.
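
As an illustration of what volatile accesses have to guarantee, here is a
small Java sketch of the happens-before edge from a volatile write to a
later volatile read of the same field. The class and field names are
invented for the example. If the compiler omitted a required barrier on a
weakly ordered target such as PPC, the reader could see the flag set but
still read stale data.

```java
public class VolatilePublication {
    static int data;                // plain (non-volatile) field
    static volatile boolean ready;  // volatile flag used to publish `data`

    // Returns the value of `data` observed after seeing `ready == true`.
    // (Hypothetical helper, for illustration only.)
    static int publishAndRead() {
        data = 0;
        ready = false;
        Thread writer = new Thread(() -> {
            data = 42;     // plain write
            ready = true;  // volatile write: publishes the write above
        });
        writer.start();

        // Volatile read: once it returns true, the write to `data` is
        // ordered before this point by happens-before.
        while (!ready) {
            Thread.yield();
        }
        int seen = data;

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead());
    }
}
```

Under the JMM the method must return 42; any other result would point at
a missing ordering constraint in the generated code.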

I think Lei Zhao from Purdue recently submitted a patch, since
incorporated, that fixes monitorenter and monitorexit on PPC.
That patch has been rolled into the latest release.

I'd be willing to spend the necessary time to get at least prototype
builds on IA32 and PPC working correctly.

DaCapo lusearch fails with digest validation errors on the PPC32
platform that I have access to (in all configurations) and I'd like to
exclude errors related to JMM before I start to dig deeper. I've already
checked the output manually and it's definitely incorrect (i.e. a wrong
set of results, as opposed to a different order of lines).

If the lusearch PPC32 validation failures are due to incorrect handling
of 64-bit volatiles, it seems like the same problems would be likely to
show up on IA32?

To see if the PPC32 failure is due to a concurrency bug, the "taskset"
command might be helpful for limiting execution to 1 core, which might
make any concurrency bugs much less likely to manifest.

I wonder if lusearch is non-deterministic ...

Regards -- Eliot


Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N. University Street | West Lafayette | IN 47907 | USA
Mobile +1 765 427 5484