Re: [Sablevm-developer] Threading support in SableVM
From: Chris P. <chr...@ma...> - 2004-02-20 01:11:04
David Bélanger wrote:
> On Thu, Feb 19, 2004 at 05:10:54PM -0500, Chris Pickett wrote:
>
>>The WBINVD instruction is a privileged instruction. When the processor
>>is running in protected mode, the CPL of a program or procedure must be
>>0 to execute this instruction. This instruction is also a serializing
>>instruction (see "Serializing Instructions" in Chapter 8 of the IA-32
>>Intel Architecture Software Developer's Manual, Volume 3).
>
>>I'm not sure if this is a problem, but if not, maybe all that's required
>>is WBINVD in the C&S for i386?
>
> Yes, it means it can be executed only in kernel mode, that is, the
> code must be located either in the kernel or a kernel module.
>
> You need to find the instructions for user mode...
>

I made a mistake. Although the CMPXCHG instruction doesn't specify any
cache effects on its own, it can be preceded by a LOCK prefix. And
yes, the SableVM code has this LOCK:

    __asm__ __volatile__ ("lock\n\t"
                          "cmpxchgl %3, %1\n\t"
                          "sete %0"
                          :"=q" (result), "=m" (*pword), "=a"(current_value)
                          :"r" (new_value), "m" (*pword), "a" (old_value)
                          :"memory");

From the IA-32 System Programming Guide:

7.1.4. Effects of a LOCK Operation on Internal Processor Caches

For the Intel486 and Pentium processors, the LOCK# signal is always
asserted on the bus during a LOCK operation, even if the area of memory
being locked is cached in the processor.

For the Pentium 4, Intel Xeon, and P6 family processors, if the area of
memory being locked during a LOCK operation is cached in the processor
that is performing the LOCK operation as write-back memory and is
completely contained in a cache line, the processor may not assert the
LOCK# signal on the bus. Instead, it will modify the memory location
internally and allow its cache coherency mechanism to insure that the
operation is carried out atomically. This operation is called "cache
locking." The cache coherency mechanism automatically prevents two or
more processors that have cached the same area of memory from
simultaneously modifying data in that area.

==========================

The manual then explains how snooping is used to maintain cache
coherency. The only relevant user mode instruction is CLFLUSH, which
flushes a single cache line, but this is intended as an optimization
only.

So, AFAICT, the existing C&S is fine for i686, unless for some
reason either:

a) the processor is set to the startup CD=1, NW=1 mode (see Table 10-5
in the programming guide for more info), which does not maintain cache
coherency, or

b) there is some other device on the system bus that does not perform
cache snooping to maintain coherency.

... but then ... I think other multithreaded applications would crash.

So that means it looks like the problem is elsewhere (e.g. writes to
"xxx.flag" that Etienne mentioned). Still, it's good to have made sure.
I'll investigate putting assembly locks around the unsynchronized pieces.

Cheers,
Chris

P.S. If any of you don't want the CC's on this thread anymore, let me
know; the SF server has been a bit unreliable semi-lately.
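
For readers following the thread, the compare-and-swap discussed above can also be sketched portably with C11 atomics instead of inline assembly. This is only a minimal sketch, not SableVM's code: the names compare_and_swap, try_claim, and flag are hypothetical, and it assumes a C11 compiler that provides <stdatomic.h>. On x86, GCC and Clang typically compile atomic_compare_exchange_strong to a LOCK-prefixed CMPXCHG, so the cache-locking behaviour quoted from the manual should apply to it as well.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not SableVM's actual API): atomically store
 * new_value into *pword if and only if *pword still holds old_value.
 * Returns true when the swap happened. */
static bool compare_and_swap(_Atomic int32_t *pword,
                             int32_t old_value, int32_t new_value)
{
    /* On failure the library writes the value it observed into
     * 'expected'; this sketch simply discards it. */
    int32_t expected = old_value;
    return atomic_compare_exchange_strong(pword, &expected, new_value);
}

/* Hypothetical shared flag word, standing in for a field that several
 * threads might write without other synchronization. */
static _Atomic int32_t flag = 0;

/* Returns true for exactly one caller: the one that moves flag 0 -> 1. */
static bool try_claim(void)
{
    return compare_and_swap(&flag, 0, 1);
}
```

A thread that calls try_claim() and gets true owns the flag; every later caller gets false until the flag is reset. The default sequentially consistent ordering is used here for simplicity; whether a weaker ordering would be enough depends on the surrounding code.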