Thread: [Sablevm-developer] Threading support in SableVM
From: Grzegorz B. P. <ga...@de...> - 2004-02-10 05:16:02
|
Hi Chris, hi everybody,

I've just re-read the reports about the JGF benchmarks and I am really worried by how it looks, especially since we didn't previously know there were problems. I have problems on SMP machines, but that's not a _regression_. Apparently, though, we have some _regressions_, especially in threading.

Chris, have you looked at these regressions again? Are they present in the current staging version too? Do you have some traces of what exactly is happening? Could you maybe try to debug sablevm's behavior? I think that among the active developers you might be the person most interested in multithreading (and simultaneous execution in general), so this naturally makes you our 'Multithreading Expert'.

One way or another we need to track this issue down and fix it. Not being able to run benchmarks is a *bad* thing in research (and this has an impact on practical usefulness too!). Looking at it in the long term - do you think it would be feasible to create our own test cases/mini-apps that would stretch and exercise SableVM's threads? I *will* be running regular tests of SableVM soon and I could put them in the test set (they should ideally produce some output that I could compare w/ the expected one and register success/failure of each test).

But of course the first thing now is to track this issue down and try to fix it. I guess David could help w/ the Classpath/SableVM glue code, as he has lots of experience there and it is most probably the source of our problems.

I don't imagine we can release a stable SableVM 1.2.0 w/ (slightly) broken threads. We might not support some features, but the existing ones cannot be broken.

Cheers,
Grzegorz B. Prokopski |
From: Chris P. <chr...@ma...> - 2004-02-10 05:58:18
|
Grzegorz B. Prokopski wrote:
> Chris, have you looked at these regressions again? Are they present
> in current staging version too?

Yes (I just checked). But I am not sure it's a regression. I was seeing weird multithreading behaviour with mtrt and my spmt stuff with my changes against 1.0.8 -- and while I don't know for sure, my intuition at the time (after looking at many instruction traces) was that the spmt was slowing threads down long enough for concurrency issues with ordinary threads to appear.

> Do you have some traces about what exactly is happening? Could you
> maybe try to debug sablevm behavior?

I posted a while ago asking for more help with this (see the message with the gdb log in response to David). It fails in _svmf_enter_object_monitor(). I think Etienne needs to look at it because he wrote the locking stuff ... I would like to participate in debugging it but I don't know where to begin. Should we try, for example, removing thin locks altogether and see if that fixes it? Or try making some critical sections bigger?

> I think that from active developers you might be the person most
> interested in multithreading (and simultaneous execution in general),
> so this naturally makes you our 'Multithreading Expert'.

Well, I would say Etienne and Clark are the multithreading experts. David likely knows more than me too ... I haven't actually started touching pthreads yet in my code.

> One way or another we need to track this issue down and fix it. Not being
> able to run benchmarks is a *bad* thing in research (and this has an impact
> on practical usefulness too!). Looking at it in the long term - do you
> think it would be feasible to create our own test cases/mini-apps that
> would stretch and exercise SableVM's threads?

The JGF benchmarks include many micro benchmarks. They really are a lot more useful than SPEC and JOlden, in my opinion, for finding small-yet-critical bugs and optimizing the performance of specific things.
> But of course the first thing now is to track this issue down and try
> to fix it. I guess David could help w/ the Classpath/SableVM glue code, as he
> has lots of experience there and it is most probably the source of
> our problems.

Can somebody check this quickly on 1.0.9 or an earlier version? Maybe all of the Montreal developers should get together to discuss 1.1.0, and plans for the future? We haven't had a multiple-person meeting (at least one that involved me!) in a long time ...

Cheers, Chris |
From: Etienne G. <gag...@uq...> - 2004-02-10 19:55:47
|
Chris Pickett wrote:
> It fails in _svmf_enter_object_monitor(). I think Etienne needs to look
> at it because he wrote the locking stuff ... I would like to participate
> in debugging it but I don't know where to begin. Should we try, for
> example, removing thin locks altogether and see if that fixes it? Or
> try making some critical sections bigger?

Chris, it would be nice if you could provide me with the following so that I can help you: write a small Java application (one class, if possible) that exhibits the _svmf_enter_object_monitor() problem with the *trunk* of sablevm (i.e. the "real" 1.0.9 version, not "staging", not a hybrid spmt version).

Thanks,
Etienne
--
Etienne M. Gagnon, Ph.D. http://www.info.uqam.ca/~egagnon/ SableVM: http://www.sablevm.org/ SableCC: http://www.sablecc.org/ |
From: Etienne G. <gag...@uq...> - 2004-02-17 21:17:50
|
Etienne Gagnon wrote:
>> It fails in _svmf_enter_object_monitor(). I think Etienne needs to
>> look at it because he wrote the locking stuff ... I would like to
>> participate in debugging it but I don't know where to begin. Should
>> we try, for example, removing thin locks altogether and see if that
>> fixes it? Or try making some critical sections bigger?

OK. The bug has been fixed in staging. Thanks a lot, Chris, for providing me (privately) with a small test case, and for reporting the bug in the first place! :-)

Have fun!
Etienne
--
Etienne M. Gagnon, Ph.D. http://www.info.uqam.ca/~egagnon/ SableVM: http://www.sablevm.org/ SableCC: http://www.sablecc.org/ |
From: Chris P. <chr...@ma...> - 2004-02-18 06:25:23
Attachments:
ThreadStarter.java
|
Etienne Gagnon wrote:
>>> It fails in _svmf_enter_object_monitor(). I think Etienne needs to
>>> look at it because he wrote the locking stuff ... I would like to
>>> participate in debugging it but I don't know where to begin.
>
> OK. The bug has been fixed in staging. Thanks a lot, Chris, for
> providing me (privately) with a small test case, and reporting the
> bug in the first place! :-)

No problem :) Thanks for the fix.

In case anyone is wondering: in trying to acquire a lock, we initially read a lockword as having no owner. Then a thin lock was acquired by another thread and we had to inflate it, but by then the initial lockword value was stale, and when using the lockword to look up the owner we got a NULL pointer to pass to pthread_mutex_lock(). The solution was simply to retry locking if the owner is NULL. (Etienne, please correct any mistakes ...)

However, I now have another test case (see attached), although I've only seen this bug manifest itself on a multiprocessor. It might affect uniprocessors in more complex scenarios. I get the following about 5% of the time:

Exception in thread "Thread-1" java.lang.NullPointerException
   at IncrementRunnable.run (ThreadStarter.java:29)
   at java.lang.Thread.run (Thread.java:670)
   at java.lang.VMThread.callRun (VMThread.java:116)
   at java.lang.Thread.callRun (Thread.java:343)
   at java.lang.VirtualMachine.runThread (VirtualMachine.java:117)

I have an instruction trace (it took 26 tries to get it!). All I can see is that the NullPointerException appears to be thrown right after the main() method has exited ... anyway, I'll do some more digging.
Cheers, Chris

============ relevant instruction trace =======================
[verbose instructions: executing @0x41353ec4 ALOAD_0]
[verbose instructions: executing @0x41353ec8 DUP]
[verbose instructions: executing @0x41353ecc GETFIELD_INT]
[verbose instructions: executing @0x41353ed8 ICONST_1]
[verbose instructions: executing @0x41353edc IADD]
[verbose instructions: executing @0x41353ee0 PUTFIELD_INT]
[verbose instructions: executing @0x41353eec IINC]
[verbose instructions: executing @0x41353ef8 ILOAD_1]
[verbose instructions: executing @0x41353efc LDC_INTEGER]
[verbose instructions: executing @0x41353f04 IF_ICMPLT_CHECK]
[verbose instructions: executing @0x41353f10 RETURN]
[verbose methods: exiting method IncrementRunnable.run()V]
[                 returning to ThreadStarter.main([Ljava/lang/String;)V]
[verbose instructions: executing @0x41353844 REPLACE]
[verbose instructions: executing @0x41353850 GOTO]
[verbose instructions: executing @0x41353734 RETURN]
[verbose methods: exiting method ThreadStarter.main([Ljava/lang/String;)V]
[                 returning to java/lang/VirtualMachine.invokeMain(Ljava/lang/Class;[Ljava/lang/String;)V]
[verbose instructions: executing @0x8050eb0 INTERNAL_CALL_END]
[verbose methods: exiting method java/lang/VirtualMachine.invokeMain(Ljava/lang/Class;[Ljava/lang/String;)V]
[                 returning to java/lang/VirtualMachine.main([Ljava/lang/String;)V]
[verbose instructions: executing @0x412efff4 REPLACE]
[verbose instructions: executing @0x412f0000 GOTO]
[verbose instructions: executing @0x412eff50 RETURN]
[verbose methods: exiting method java/lang/VirtualMachine.main([Ljava/lang/String;)V]
[verbose instructions: executing @0x8050eb0 INTERNAL_CALL_END]
[verbose methods: entering method java/lang/NullPointerException.<init>(Ljava/lang/String;)V]
[verbose instructions: executing @0x41274204 ALOAD_0]
[verbose instructions: executing @0x41274208 ALOAD_1]
[verbose instructions: executing @0x4127420c INVOKESPECIAL]
[verbose methods: entering method java/lang/RuntimeException.<init>(Ljava/lang/String;)V]
[verbose instructions: executing @0x41273df4 ALOAD_0]
[verbose instructions: executing @0x41273df8 ALOAD_1]
[verbose instructions: executing @0x41273dfc INVOKESPECIAL]
[verbose methods: entering method java/lang/Exception.<init>(Ljava/lang/String;)V]
[verbose instructions: executing @0x41273e38 ALOAD_0]
[verbose instructions: executing @0x41273e3c ALOAD_1]
[verbose instructions: executing @0x41273e40 INVOKESPECIAL]
[verbose methods: entering method java/lang/Throwable.<init>(Ljava/lang/String;)V]
[verbose instructions: executing @0x41273e7c ALOAD_0]
[verbose instructions: executing @0x41273e80 INVOKESPECIAL]
[verbose methods: entering method java/lang/Object.<init>()V]
[verbose instructions: executing @0x41251644 RETURN]
[verbose methods: exiting method java/lang/Object.<init>()V]
[                 returning to java/lang/Throwable.<init>(Ljava/lang/String;)V]
[verbose instructions: executing @0x41273e94 ALOAD_0]
[verbose instructions: executing @0x41273e98 ALOAD_0]
[verbose instructions: executing @0x41273e9c PUTFIELD_REFERENCE]
=============================================================== |
From: Etienne G. <gag...@uq...> - 2004-02-18 14:56:28
|
Chris Pickett wrote:
> (Etienne pls. correct any mistakes ...)

Just a clarification: the algorithm was designed to handle situations where the lock owner has changed; the bug was that it didn't correctly handle the case where there was no owner to begin with.

> However, I now have another test case (see attached), although I've only
> seen this bug manifest itself on a multiprocessor. It might affect
> uniprocessors in more complex scenarios.

Multi-processor: part of the critical sablevm locking code does NOT take "cache issues" into account. For instance, "xxx.flag" is read and written assuming that any change is visible to other threads without synchronization, which is probably NOT the case on a multiprocessor.

So, unless you can actually get your "bug" to manifest itself on a uniprocessor, I wouldn't worry much about it. Now, if you really want to get things running on a multi-processor, you should start investigating cache issues. :-) [WARNING: Not easy. In fact, the Java Memory Model is broken on multi-processors...]

Etienne
--
Etienne M. Gagnon, Ph.D. http://www.info.uqam.ca/~egagnon/ SableVM: http://www.sablevm.org/ SableCC: http://www.sablecc.org/ |
From: Chris P. <chr...@ma...> - 2004-02-19 18:13:26
|
Etienne Gagnon wrote:
> Chris Pickett wrote:
>> However, I now have another test case (see attached), although I've only
>> seen this bug manifest itself on a multiprocessor. It might affect
>> uniprocessors in more complex scenarios.
>
> Multi-processor: part of the critical sablevm locking code does NOT
> take "cache issues" into account. For instance, "xxx.flag" is read and
> written assuming that any change is visible to other threads without
> synchronization, which is probably NOT the case on a multiprocessor.

You're right, it's not the case ... see below.

> So, unless you can actually get your "bug" to manifest itself on a
> uniprocessor, I wouldn't worry much about it.
>
> Now, if you really want to get things running on a multi-processor,
> you should start investigating cache issues. :-) [WARNING: Not easy.
> In fact, the Java Memory Model is broken on multi-processors...]

I started looking at the POSIX 1003.1c (pthreads) spec (it's available online) and also at the comp.programming.threads FAQ (a 1 Mb html file that kills Mozilla on my machine; better to download it) ... and discovered a few interesting things:

1) The only way to ensure cache coherency in a portable manner is to use the pthreads synchronization functions (e.g. lock and unlock). So I think that means there is no need for us to consider the Linux kernel cache flush architecture, nor any processor-specific cache flush instructions.

2) (a bit on "volatile", lifted verbatim): You do NOT need volatile for threaded programming. You do need it when you share data between "main code" and signal handlers, or when sharing hardware registers with a device. In certain restricted situations, it MIGHT help when sharing unsynchronized data between threads (but don't count on it -- the semantics of "volatile" are too fuzzy). If you need volatile to share data, protected by POSIX synchronization objects, between threads, then your implementation is busted.

3) No unsynchronized operation on shared data, even if it takes only one assembly instruction, is truly safe, with the exception of "one-shot flags", where the data changes only in one direction, it changes only once, and the actual changed-to value doesn't matter. Although it doesn't follow exactly the same semantics as a "one-shot flag", the preparation sequence REPLACE operation is safe by similar logic.

Cheers, Chris |
From: Archie C. <ar...@de...> - 2004-02-19 20:42:24
|
Chris Pickett wrote:
> I started looking at the POSIX 1003.1c (pthreads) spec (it's
> available online) and also at the comp.programming.threads FAQ (1 Mb
> html file kills Mozilla on my machine, better to download) ... and
> discovered a few interesting things:
>
> 1) The only way to ensure cache coherency in a portable manner is to use
> the pthreads synchronization functions (e.g. lock and unlock). So I
> think that means there is no need for us to consider the Linux kernel
> cache flush architecture, nor any processor-specific cache flush
> instructions.

On a related note: Java semantics imply a read barrier at MONITORENTER and a write barrier at MONITOREXIT. With fat locks, you get this automatically because they are implemented using pthread mutexes. But with thin locks where there is no contention, technically SableVM is at fault because it doesn't explicitly impose the read/write barriers (does it?). On i386 it works out anyway, because I think the compare-and-swap sequence enforces a memory barrier. But in general that's not true.

I could be wrong about all this, but this is what memory recalls.

-Archie
__________________________________________________________________________
Archie Cobbs * CTO, Awarix * http://www.awarix.com |
From: Chris P. <chr...@ma...> - 2004-02-19 22:15:19
|
Archie Cobbs wrote:
> Chris Pickett wrote:
>> I started looking at the POSIX 1003.1c (pthreads) spec (it's
>> available online) and also at the comp.programming.threads FAQ (1 Mb
>> html file kills Mozilla on my machine, better to download) ... and
>> discovered a few interesting things:
>>
>> 1) The only way to ensure cache coherency in a portable manner is to use
>> the pthreads synchronization functions (e.g. lock and unlock). So I
>> think that means there is no need for us to consider the Linux kernel
>> cache flush architecture, nor any processor-specific cache flush
>> instructions.
>
> On a related note: Java semantics imply a read barrier at MONITORENTER
> and a write barrier at MONITOREXIT. With fat locks, you get this
> automatically because they are implemented using pthread mutexes.
> But with thin locks where there is no contention, technically SableVM
> is at fault because it doesn't explicitly impose the read/write barriers
> (does it?).

That's what I think :( Etienne wrote about it here:

http://lists.debian.org/debian-ia64/2003/debian-ia64-200302/msg00035.html

(first hit if you google for "thin locks smp"!) and the description of locks in SableVM is here:

http://www.usenix.org/publications/library/proceedings/jvm01/gagnon/gagnon_html/node14.html

After reading the comp.programming.threads FAQ material (just search the document for "cache"), they say that although workable hacks exist, if you want any portability or guarantees you need to use POSIX only, and you should only use the hacks if you know /exactly/ what you're doing. But at the same time, it sounds like strictly-POSIX thin locks don't exist ... so it might be easier to try to introduce a cache flush instruction or system cache flush call in places.

There are two solutions I can see:

1) Make the current thin locks optional, OR
2) Introduce explicit cache flushing where necessary.

Personally, I would be happy enough with (1), since my speculative multithreading work only needs to show relative speedup (and indeed, the faster an "unmodified" SableVM is, the less that relative speedup will be ...), but I'm actually just eager to take the path of least resistance :)

> On i386 it works out anyway because I think the compare-and-swap
> sequence enforces a memory barrier. But in general that's not true.

Well, SableVM doesn't work on an Athlon MP 2000+, which is i686. But I'm not sure if it's because of a broken C&S or not. If it IS because of a broken C&S, that's a good thing; however, if the C&S is /already/ imposing an MB, then that's bad because it means the problem is elsewhere. I think.

(more reading ensues)

I looked up the IA-32 instruction set reference (split in 2 parts):

http://developer.intel.com/design/pentium4/manuals/253666.htm
http://developer.intel.com/design/pentium4/manuals/253667.htm

CMPXCHG doesn't mention flushing the processor's cache. INVD ignores cache contents and invalidates the cache. WBINVD writes back cache contents and invalidates the cache, and signals other processors to do the same. However, the documentation says:

  The WBINVD instruction is a privileged instruction. When the processor
  is running in protected mode, the CPL of a program or procedure must be
  0 to execute this instruction. This instruction is also a serializing
  instruction (see "Serializing Instructions" in Chapter 8 of the IA-32
  Intel Architecture Software Developer's Manual, Volume 3).

I'm not sure if this is a problem, but if not, maybe all that's required is WBINVD in the C&S for i386? It would also be nice if we didn't have to call WBINVD on a uniprocessor ...

> I could be wrong about all this but this is what memory recalls.

Whether or not you are, thanks for discussing it; it's always helpful.

Cheers, Chris |
From: David <db...@cs...> - 2004-02-19 22:42:15
|
On Thu, Feb 19, 2004 at 05:10:54PM -0500, Chris Pickett wrote:
> The WBINVD instruction is a privileged instruction. When the processor
> is running in protected mode, the CPL of a program or procedure must be
> 0 to execute this instruction. This instruction is also a serializing
> instruction (see "Serializing Instructions" in Chapter 8 of the IA-32
> Intel Architecture Software Developer's Manual, Volume 3).
>
> I'm not sure if this is a problem, but if not, maybe all that's required
> is WBINVD in the C&S for i386?

Yes, it means it can be executed only in kernel mode; that is, the code must be located either in the kernel or in a kernel module. You need to find the instructions for user mode...

David
---
David Bélanger
Graduate Student
School of Computer Science
McGill University
Office: MC226
Web page: http://www.cs.mcgill.ca/~dbelan2/
Public key: http://www.cs.mcgill.ca/~dbelan2/public_key.txt |
From: Chris P. <chr...@ma...> - 2004-02-20 01:11:04
|
David Bélanger wrote:
> On Thu, Feb 19, 2004 at 05:10:54PM -0500, Chris Pickett wrote:
>> The WBINVD instruction is a privileged instruction. When the processor
>> is running in protected mode, the CPL of a program or procedure must be
>> 0 to execute this instruction.
>>
>> I'm not sure if this is a problem, but if not, maybe all that's required
>> is WBINVD in the C&S for i386?
>
> Yes, it means it can be executed only in kernel mode; that is, the
> code must be located either in the kernel or in a kernel module.
>
> You need to find the instructions for user mode...

I made a mistake. Although the CMPXCHG instruction doesn't specify any cache effects on its own, it can be preceded by a LOCK operation. And yes, the SableVM code has this LOCK:

__asm__ __volatile__ ("lock\n\t"
                      "cmpxchgl %3, %1\n\t"
                      "sete %0"
                      : "=q" (result), "=m" (*pword), "=a" (current_value)
                      : "r" (new_value), "m" (*pword), "a" (old_value)
                      : "memory");

From the IA-32 System Programming Guide:

  7.1.4. Effects of a LOCK Operation on Internal Processor Caches

  For the Intel486 and Pentium processors, the LOCK# signal is always
  asserted on the bus during a LOCK operation, even if the area of memory
  being locked is cached in the processor.

  For the Pentium 4, Intel Xeon, and P6 family processors, if the area of
  memory being locked during a LOCK operation is cached in the processor
  that is performing the LOCK operation as write-back memory and is
  completely contained in a cache line, the processor may not assert the
  LOCK# signal on the bus. Instead, it will modify the memory location
  internally and allow its cache coherency mechanism to ensure that the
  operation is carried out atomically. This operation is called "cache
  locking." The cache coherency mechanism automatically prevents two or
  more processors that have cached the same area of memory from
  simultaneously modifying data in that area.

==========================================================

The manual then explains how snooping is used to maintain cache coherency. The only relevant user-mode instruction is CLFLUSH, which flushes a single cache line, but this is intended as an optimization only. So, AFAICT, the existing C&S is fine for i686, unless for some reason either:

a) the processor is set to the startup CD=1, NW=1 mode (see Table 10-5 in the programming guide for more info), which does not maintain cache coherency, or

b) there is some other device on the system bus that does not perform cache snooping to maintain coherency.

... but then ... I think other multithreaded applications would crash.

So that means it looks like the problem is elsewhere (e.g. the writes to "xxx.flag" that Etienne mentioned). Still, it's good to have made sure. I'll investigate putting assembly locks around the unsynchronized pieces.

Cheers, Chris

P.S. If any of you don't want the CCs on this thread anymore, let me know; the SF server has been a bit unreliable semi-lately. |
From: Etienne G. <gag...@uq...> - 2004-02-20 03:11:05
|
Hi All,

First, I'll be away for a little over 2 weeks, so don't expect any reply from me. See comments below.

Chris Pickett wrote:
> ...
> For the Pentium 4, Intel Xeon, and P6 family processors, if the area of
> memory being locked during a LOCK operation is cached in the processor
> that is performing the LOCK operation as write-back memory and is
> completely contained in a cache line, the processor may not assert the
> LOCK# signal on the bus....
> ...
> So that means it looks like the problem is elsewhere (e.g. writes to
> "xxx.flag" that Etienne mentioned). ...

Hmmm... I'm not convinced. Have you actually read the JVM spec for the JMM (memory model)? Acquiring a thin lock should cause all UNRELATED memory content to be fetched from main memory after the lock, and unlocking should do the reverse. SableVM does not do any of this currently. It's not a problem on uniprocessors, but I expect this to be quite a problem on MPs.

My language here is quite fuzzy; this would be best explained in terms of "read/write" barriers, yet I have yet to find a "reliable" definition of "barriers" that is consistent across MPs.

Etienne
--
Etienne M. Gagnon, Ph.D. http://www.info.uqam.ca/~egagnon/ SableVM: http://www.sablevm.org/ SableCC: http://www.sablecc.org/ |
From: Chris P. <chr...@ma...> - 2004-02-20 04:48:16
|
Etienne Gagnon wrote:
> Hi All,
>
> First, I'll be away for a little over 2 weeks, so don't expect any
> reply from me.

Have fun!

> See comments below.
>
> Hmmm... I'm not convinced. Have you actually read the JVM spec for
> the JMM (memory model)? Acquiring a thin lock should cause all UNRELATED
> memory content to be fetched from main memory after the lock, and
> unlocking should do the reverse. SableVM does not do any of this
> currently.

You mean in Section 8.9, right?

"Locking any lock conceptually flushes all variables from a thread's working memory, and unlocking any lock forces the writing out to main memory of all variables that the thread has assigned."

I'm pretty sure that when the spec says "working memory" it does not mean "processor cache" but "thread-local heap". SableVM doesn't have thread-local heaps (we discussed it the other day), and to me a large part of the JMM appears to be unimportant for SableVM. I think the current problem is related to multithreading in SableVM at the C / pthreads / MP level, but not at the JMM level.

Cheers, Chris |
From: Archie C. <ar...@de...> - 2004-02-20 05:53:35
|
Chris Pickett wrote:
> "Locking any lock conceptually flushes all variables from a thread's
> working memory, and unlocking any lock forces the writing out to main
> memory of all variables that the thread has assigned."
>
> I'm pretty sure that when the spec says "working memory" it does not
> mean "processor cache" but "thread-local heap". SableVM doesn't have
> thread-local heaps (we discussed it the other day), and to me a large
> part of the JMM appears to be unimportant for SableVM.

I think that's mistaken... by "working memory" they just mean a conceptual "working memory" that only the one thread has access to. I.e., the processor cache metaphor is a good one here. Thread-local heaps are a different topic, I think.

-Archie
__________________________________________________________________________
Archie Cobbs * CTO, Awarix * http://www.awarix.com |
From: Chris P. <chr...@ma...> - 2004-02-20 08:12:59
|
Archie Cobbs wrote:
> I think that's mistaken... by "working memory" they just mean
> a conceptual "working memory" that only the one thread has access to.
> I.e., the processor cache metaphor is a good one here.
> Thread-local heaps are a different topic I think.

Okay, I think I finally understand. Thank you all for your patience ... and I apologize for all the long emails (they are coming to an end, as I think a reasonable solution is close). I'll first explain my current conception of things -- I'd be grateful for any comment on whether this sounds right or not.

In SableVM, all threads access the same heap, or "main memory". At the hardware level, when a heap memory location is read into the cache, this is the same operation as bringing the value into the Java thread's "working memory". On a uniprocessor there is only one cache, so in effect all threads' "working memories" are visible to each other (and so it is as if there are no working memories at all). Okay, actually the cache might consist of L1 and L2, but it doesn't matter w.r.t. visibility.

On an SMP machine, the "main memory" is the non-cache heap memory, visible to all processors. Threads may reside on the same processor, and if this is the case, their "working memories" are also visible to each other, and no problems arise (functionally identical to the uniprocessor case). If they are on separate processors, then in order to meet the requirements of the JMM, each time a lock is acquired by a thread, ALL of the lines brought into the cache by the current thread as a result of reading values from the Java heap must be flushed: they are written back to main memory if they were touched by the thread, otherwise they are simply invalidated. When a lock is released, ONLY those lines associated with the thread that have been modified since acquiring the lock need flushing.

So ... assuming that's now correct, I think there are three things we might consider doing (some or all of which may be gibberish):

1) Flush the entire processor cache as part of each MONITORENTER and MONITOREXIT, or when entering or leaving a synchronized method. This would involve calling / executing one of:
   a) WBINVD (not available in user mode),
   b) CLFLUSH on the entire cache (one line at a time),
   c) a kernel whole-cache flush routine,
   d) flooding the cache by reading in a bunch of non-Java-heap data.

2) Keep track of which Java heap addresses are read / written by a thread, and flush only the cache lines that match those addresses as part of MONITORENTER / MONITOREXIT, or when entering or leaving a synchronized method. This would involve calling / executing:
   a) CLFLUSH for each line, or
   b) a line-specific kernel cache flush routine.

3) Use the memory barrier instructions:
   a) MFENCE on each (Java only?) lock/unlock ensures that all loads and stores occurring before the lock/unlock are globally visible before any load or store that follows the MFENCE.
   b) *** While it appears that SFENCE (identical to MFENCE except that only stores are serialized) might be appropriate for the unlock operation, this would mean a load operation depending on a store ordered before the MFENCE might occur out of order, which would be bad. ***

Finally: after I wrote this, I looked again at question #118 of the comp.programming.threads FAQ, and it seems to agree with what I've written, and also makes me think that method (3) is the best.

http://www.lambdacs.com/cpt/FAQ.html (careful, it will probably kill Mozilla; lynx is better)

Cheers, Chris |
From: Archie C. <ar...@de...> - 2004-02-20 15:09:19
|
Chris Pickett wrote:
> On an SMP machine, the "main memory" is the non-cache heap memory,
> visible to all processors. Threads may reside on the same processor,
> and if this is the case, their "working memories" are also visible to
> each other, and no problems arise (functionally identical to the
> uniprocessor case). If they are on separate processors, in order to
> meet the requirements of the JMM, each time a lock is acquired by a
> thread, ALL of the lines brought into the cache by the current thread as
> a result of reading values from the Java heap must be flushed: they are
> written back to main memory if they were touched by the thread,
> otherwise they are simply invalidated. When a lock is released, ONLY
> those lines associated with the thread that have been modified since
> acquiring the lock need flushing.
>
> So ... assuming that's now correct, I think there are three things that
> we might consider doing (some or all of which may be gibberish):
>
> 1) Flush the entire processor cache as part of each MONITORENTER and
> MONITOREXIT, or when entering or leaving a synchronized method. This
> would involve calling / executing one of:
>    a) WBINVD (not available in user mode),
>    b) CLFLUSH on the entire cache (one line at a time),
>    c) a kernel whole-cache flush routine,
>    d) flooding the cache by reading in a bunch of non-Java-heap data.
>
> 2) Keep track of which Java heap addresses are read / written by a
> thread, and flush only the cache lines that match those addresses as
> part of MONITORENTER / MONITOREXIT, or when entering or leaving a
> synchronized method. This would involve calling / executing:
>    a) CLFLUSH for each line, or
>    b) a line-specific kernel cache flush routine.
>
> 3) Use the memory barrier instructions:
>    a) MFENCE on each (Java only?) lock/unlock ensures that all loads and
>    stores occurring before the lock/unlock are globally visible before any
>    load or store that follows the MFENCE.
>    b) *** While it appears that SFENCE (identical to MFENCE except that only
>    stores are serialized) might be appropriate for the unlock operation,
>    this would mean a load operation depending on a store ordered before the
>    MFENCE might occur out of order, which would be bad. ***
>
> Finally: after I wrote this, I looked again at question #118 of the
> comp.programming.threads FAQ, and it seems to agree with what I've
> written, and also makes me think that method (3) is the best.

That is consistent with my understanding as well. I think #3 is best too, and it's probably OK to "punt" and say that a processor-specific instruction sequence will be required for the read and write barriers (and therefore an additional porting task). FYI, the Linux kernel has examples of asm() statements that create memory barriers for all its supported architectures.

-Archie
__________________________________________________________________________
Archie Cobbs * CTO, Awarix * http://www.awarix.com |