Re: [Sablevm-developer] Threading support in SableVM


David Bélanger wrote:

> On Thu, Feb 19, 2004 at 05:10:54PM -0500, Chris Pickett wrote:
>
>>The WBINVD instruction is a privileged instruction. When the processor
>>is running in protected mode, the CPL of a program or procedure must be
>>0 to execute this instruction. This instruction is also a serializing
>>instruction (see "Serializing Instructions" in Chapter 8 of the IA-32
>>Intel Architecture Software Developer's Manual, Volume 3).
>
>>I'm not sure if this is a problem, but if not, maybe all that's required
>>is WBINVD in the C&S for i386?
>
> Yes, it means it can be executed only in kernel mode, that is, the
> code must be located either in the kernel or in a kernel module.
>
> You need to find the instructions for user mode...

I made a mistake.  Although the CMPXCHG instruction doesn't specify any
cache effects on its own, it can be preceded by a LOCK prefix.  And
yes, the SableVM code has this LOCK:

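/* Atomically: if (*pword == old_value) *pword = new_value.
   result is set to 1 on success, 0 on failure (SETE reads ZF);
   current_value receives the value *pword held before the operation. */
__asm__ __volatile__ ("lock\n\t"
                       "cmpxchgl %3, %1\n\t"
                       "sete %0"
                       :"=q" (result), "=m" (*pword), "=a" (current_value)
                       :"r" (new_value), "m" (*pword), "a" (old_value)
                       :"memory");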
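For comparison, here is a minimal sketch of the same compare-and-swap written with C11 <stdatomic.h> instead of inline assembly.  The helper name cas_int is hypothetical (it is not SableVM's), and the sketch assumes a C11 compiler; on IA-32/x86-64, GCC and Clang should compile the call to a LOCK CMPXCHG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper with the same contract as the inline-assembly
   C&S above: if *pword == old_value, store new_value and return true;
   otherwise leave *pword unchanged and return false.  Either way,
   *current_value receives the value *pword held before the call,
   just as EAX does after CMPXCHG. */
static bool cas_int(atomic_int *pword, int old_value, int new_value,
                    int *current_value)
{
    assert(pword != NULL && current_value != NULL);
    int expected = old_value;
    /* On failure, the value actually found in *pword is written
       back into 'expected'. */
    bool ok = atomic_compare_exchange_strong(pword, &expected, new_value);
    *current_value = expected;
    return ok;
}
```

On a word holding 5, cas_int(&w, 5, 7, &cur) succeeds, stores 7 and leaves cur == 5; repeating the same call then fails and reports cur == 7.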

From the IA-32 System Programming Guide:

7.1.4. Effects of a LOCK Operation on Internal Processor Caches

For the Intel486 and Pentium processors, the LOCK# signal is always
asserted on the bus during a LOCK operation, even if the area of memory
being locked is cached in the processor.

For the Pentium 4, Intel Xeon, and P6 family processors, if the area of
memory being locked during a LOCK operation is cached in the processor
that is performing the LOCK operation as write-back memory and is
completely contained in a cache line, the processor may not assert the
LOCK# signal on the bus. Instead, it will modify the memory location
internally and allow it's cache coherency mechanism to insure that the
operation is carried out atomically. This operation is called "cache
locking." The cache coherency mechanism automatically prevents two or
more processors that have cached the same area of memory from
simultaneously modifying data in that area.

==========================
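The guarantee quoted above (whether by bus locking or cache locking, the locked operation is atomic with respect to other processors) is what makes the usual C&S retry loop safe.  A minimal sketch, assuming C11 atomics; fetch_add_via_cas is a hypothetical name used only for illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical example: atomic fetch-and-add built from a
   compare-and-swap retry loop.  If another processor changes *p
   between our load and our C&S, the C&S fails and we retry. */
static int fetch_add_via_cas(atomic_int *p, int delta)
{
    assert(p != NULL);
    int old = atomic_load(p);
    /* On failure, atomic_compare_exchange_weak reloads 'old' with the
       current value of *p, so each retry uses fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta))
        ;
    return old; /* value of *p just before our addition */
}
```

The weak form may fail spuriously, which is harmless here because the loop retries anyway.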

The manual then explains how snooping is used to maintain cache
coherency.  The only relevant user-mode instruction is CLFLUSH, which
flushes a single cache line, but it is intended only as an optimization.
So, AFAICT, the existing C&S is fine for i686, unless for some reason
either:

a) the processor is set to the startup CD=1, NW=1 mode (see Table 10-5
in the programming guide for more info), which does not maintain cache
coherency, or

b) there is some other device on the system bus that does not perform
cache snooping to maintain coherency.

... but in either case, I think other multithreaded applications would crash too.

So it looks like the problem is elsewhere (e.g. the writes to
"xxx.flag" that Etienne mentioned).  Still, it's good to have made sure.
I'll investigate putting assembly locks around the unsynchronized pieces.
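One way to put a lock around the unsynchronized pieces is a small spinlock built from the same C&S.  The sketch below assumes C11 atomics and POSIX threads; guarded_t, spin_lock, spin_unlock, bump_count and run_two_threads are hypothetical names, and the count field merely stands in for a shared field such as xxx.flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical: a shared field plus the lock word that protects it. */
typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
    int count;         /* stands in for a shared field like xxx.flag */
} guarded_t;

/* Acquire: spin until a C&S of the lock word from 0 to 1 succeeds. */
static void spin_lock(atomic_int *lock)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(lock, &expected, 1))
        expected = 0; /* a failed C&S overwrote 'expected'; reset it */
}

/* Release: the caller must currently hold the lock. */
static void spin_unlock(atomic_int *lock)
{
    assert(atomic_load(lock) == 1);
    atomic_store(lock, 0);
}

/* Thread body: 100000 increments, each done while holding the lock. */
static void *bump_count(void *arg)
{
    guarded_t *g = arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&g->locked);
        g->count++; /* plain read-modify-write, safe under the lock */
        spin_unlock(&g->locked);
    }
    return NULL;
}

/* Start two threads on the same guarded_t; return the final count
   (or -1 if a thread could not be created). */
static int run_two_threads(void)
{
    guarded_t g = { 0, 0 };
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, bump_count, &g) != 0)
        return -1;
    if (pthread_create(&t2, NULL, bump_count, &g) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return g.count;
}
```

With the lock in place, run_two_threads() should return 200000 (compile with -pthread); without it, lost updates would typically leave the count short.  A spinlock only makes sense for very short critical sections like this, since waiting threads burn CPU.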

Cheers,
Chris

P.S.  If any of you don't want the CCs on this thread anymore, let me
know;  the SF server has been a bit unreliable lately.