Re: [Sablevm-developer] Threading support in SableVM

Archie Cobbs wrote:
> Chris Pickett wrote:
>
>>I started looking at the POSIX 1003.1c (pthreads) spec (it's
>>available online) and also at the comp.programming.threads FAQ (1 Mb
>>html file kills Mozilla on my machine, better to download) ... and
>>discovered a few interesting things:
>>
>>1) The only way to ensure cache coherency in a portable manner is to use
>>the pthreads synchronization functions (e.g. lock and unlock).  So I
>>think that means there is no need for us to consider the Linux kernel
>>cache flush architecture, nor any processor-specific cache flush
>>instructions.
>
>
> On a related note: Java semantics imply a read barrier at MONITORENTER
> and a write barrier with MONITOREXIT. With fat locks, you get this
> automatically because they are implemented using pthread mutexes.
> But with thin locks where there is no contention, technically SableVM
> is at fault because it doesn't explicitly impose the read/write barriers
> (does it?).

That's what I think :(

Etienne wrote about it here:

http://lists.debian.org/debian-ia64/2003/debian-ia64-200302/msg00035.html

(first hit if you google for "thin locks smp"!)

and the description of locks in SableVM is here:

http://www.usenix.org/publications/library/proceedings/jvm01/gagnon/gagnon_html/node14.html

The comp.programming.threads FAQ (just search the document for "cache")
says that although workable hacks exist, if you want any portability or
guarantees you need to use POSIX only, and you should only use the hacks
if you know /exactly/ what you're doing. But at the same time, it sounds
like strictly-POSIX thin locks don't exist ... so it might be easier to
try to introduce a cache flush instruction or system cache flush call in
places.

There are two solutions I can see:

1) Make the current thin locks optional OR
2) Introduce explicit cache flushing where necessary
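
To make the two options concrete, here is a rough sketch in C. It is not
SableVM code: every name in it (demo_fat_lock, demo_thin_lock, and so on)
is made up for illustration, and it assumes a C11 compiler with
<stdatomic.h>, which is a swapped-in way of expressing the barriers rather
than an explicit cache flush. Option (1) goes through a pthread mutex every
time; option (2) keeps a one-word lock but asks for acquire ordering on
entry and release ordering on exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Option 1 (hypothetical names): always use a pthread mutex, so POSIX
 * itself guarantees memory visibility across lock and unlock. */
typedef struct { pthread_mutex_t mutex; } demo_fat_lock;

void demo_fat_init(demo_fat_lock *l)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    assert(rc == 0);
    (void)rc;
}

void demo_fat_enter(demo_fat_lock *l)
{
    int rc = pthread_mutex_lock(&l->mutex);
    assert(rc == 0);
    (void)rc;
}

void demo_fat_exit(demo_fat_lock *l)
{
    int rc = pthread_mutex_unlock(&l->mutex);
    assert(rc == 0);
    (void)rc;
}

/* Option 2 (hypothetical names): a one-word thin lock whose uncontended
 * path carries explicit ordering. 0 means free, any other value is the
 * owner's id. */
typedef struct { atomic_int owner; } demo_thin_lock;

void demo_thin_init(demo_thin_lock *l)
{
    atomic_init(&l->owner, 0);
}

/* Returns true if the lock was free and is now owned by `self`.
 * A real VM would inflate to a fat lock on failure; that part is omitted. */
bool demo_thin_try_enter(demo_thin_lock *l, int self)
{
    int expected = 0;
    /* Acquire on success: reads inside the critical section cannot be
     * moved ahead of the compare-and-swap. */
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, self,
        memory_order_acquire, memory_order_relaxed);
}

void demo_thin_exit(demo_thin_lock *l)
{
    /* Release: writes made inside the critical section become visible
     * before the lock word is seen as free. */
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

The thin path never calls into pthreads, so whatever ordering it gets has
to be requested explicitly, which is exactly the part in question above.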

Personally, I would be happy enough with (1), since my speculative
multithreading work only needs to show relative speedup (and indeed, the
faster an "unmodified" SableVM is, the less that relative speedup will
be ...), but I'm actually just eager to take the path of least resistance :)

> On i386 it works out anyway because I think the compare-and-swap
> sequence enforces a memory barrier. But in general that's not true.

Well, SableVM doesn't work on an Athlon MP 2000+, which is i686.  But
I'm not sure if it's because of a broken C&S or not.  If it IS because
of a broken C&S, that's a good thing; however, if the C&S is /already/
imposing an MB, then that's bad because it means the problem is
elsewhere.  I think.

(more reading ensues)

I looked up the IA-32 instruction set reference (split in 2 parts):

http://developer.intel.com/design/pentium4/manuals/253666.htm
http://developer.intel.com/design/pentium4/manuals/253667.htm

CMPXCHG doesn't mention flushing the processor's cache.
INVD ignores cache contents and invalidates the cache.
WBINVD writes back cache contents and invalidates the cache, and signals
other processors to do the same.

However, the documentation says:

The WBINVD instruction is a privileged instruction. When the processor
is running in protected mode, the CPL of a program or procedure must be
0 to execute this instruction. This instruction is also a serializing
instruction (see "Serializing Instructions" in Chapter 8 of the IA-32
Intel Architecture Software Developer's Manual, Volume 3).

I'm not sure if this is a problem, but if not, maybe all that's required
is WBINVD in the C&S for i386?
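
Since the CPL 0 requirement rules WBINVD out for a user-space VM, an
unprivileged fence may be the more practical thing to try. The sketch
below assumes a C11 compiler with <stdatomic.h> and uses
atomic_thread_fence in place of any cache flush; on x86, compilers
typically implement that fence with MFENCE or a LOCK-prefixed
instruction. The names demo_payload, demo_ready, demo_writer and
demo_publish_and_read are made up for the example. One thread writes plain
data and then raises a flag; the other waits for the flag and then reads
the data, with a fence on each side.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int demo_payload;        /* plain data published by the writer */
static atomic_int demo_ready;   /* 0 until the payload has been written */

static void *demo_writer(void *arg)
{
    (void)arg;
    demo_payload = 42;
    /* Fence: the payload write is ordered before the flag write. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&demo_ready, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the writer, waits for the flag, and returns the payload seen. */
int demo_publish_and_read(void)
{
    pthread_t t;
    int rc;
    int seen;

    demo_payload = 0;
    atomic_store_explicit(&demo_ready, 0, memory_order_relaxed);

    rc = pthread_create(&t, NULL, demo_writer, NULL);
    assert(rc == 0);
    (void)rc;

    /* Spin until the writer raises the flag. */
    while (atomic_load_explicit(&demo_ready, memory_order_relaxed) == 0) {
        /* busy-wait */
    }
    /* Fence: the flag read is ordered before the payload read. */
    atomic_thread_fence(memory_order_seq_cst);
    seen = demo_payload;

    pthread_join(t, NULL);
    return seen;
}
```

Nothing here needs ring 0, and the same source should build on
architectures other than i386, which is the portability the thin-lock path
currently lacks.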

It would also be nice if we didn't have to call WBINVD on a uniprocessor
...
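
One possible way to skip the expensive operation on a uniprocessor is to
ask the system how many CPUs are online at startup. This is only a sketch:
demo_is_multiprocessor and demo_fence_if_smp are hypothetical helpers,
_SC_NPROCESSORS_ONLN is a common extension (available on Linux, among
others) rather than part of POSIX, and the fence is again a C11
atomic_thread_fence standing in for whatever barrier ends up being used.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true unless exactly one CPU is reported online. */
bool demo_is_multiprocessor(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    /* If the query fails (n == -1) assume SMP: an unneeded fence is only
     * slow, while a missing one is a correctness bug. */
    return n != 1;
}

/* Hypothetical helper: hardware fence on SMP, compiler-only barrier
 * otherwise. */
void demo_fence_if_smp(bool smp)
{
    if (smp) {
        atomic_thread_fence(memory_order_seq_cst);
    } else {
        /* Assumption: on a single CPU only compiler reordering matters. */
        atomic_signal_fence(memory_order_seq_cst);
    }
}
```

The result of demo_is_multiprocessor() would be computed once at VM
startup and cached, so the lock path pays for one predictable branch.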

> I could be wrong about all this but this is what memory recalls.

Whether or not you are, thanks for discussing it, it's always helpful.

Cheers,
Chris