Re: [Sablevm-developer] Threading support in SableVM
From: Chris P. <chr...@ma...> - 2004-02-22 22:35:08
Chris Pickett wrote:
> Clark VERBRUGGE wrote:
>
>> That would fix the java heap.  But are not internal sablevm operations
>> more of the problem?  Are all shared data accesses (reads or writes)
>> already protected by mutexes?
>
> I didn't answer your question.  The answer is I think so, but I don't
> know.  I think certain things are declared as volatile so that explicit
> mutexes aren't required, but that this may be broken.  There might be
> problems with both thin locks and the thread status variable, amongst
> other things.  We'll see ... I think the first step is to make the Java
> heap work.  After that we can try to break things again.

Just an update on what I've found out, but first a summary for those who
don't want to read all this:

The problems seem to be in thread startup, and there might be something
wrong with splay_tree.c.  I put barriers in what I think are the right
places, so I don't think the problems are JMM-related.  I could be wrong
about this, especially if the splay_tree problem violates the JMM rules
about GC (see the "miscellany" at the end of the cookbook).

Details of JMM / locking studies follow, and after that some debugging
information (limited, because debugging these intermittent errors is
frustrating).

I don't really have any useful leads on what needs to be fixed anymore,
so I'm going to stop now.  I think careful study and/or unit testing of
splay_tree.c might help -- I've done neither of these things.  One of the
more repeatable errors involves an "impossible control flow" in
interpreter.c -- but I can't get an instruction trace on this so it seems
useless.

As a side note, Etienne, it seems strange to me that under contention,
you don't inflate the actual thin lock being released in your algorithm,
but you inflate all of the other locks that the thread owns.  I think it
means that if two threads were competing for just one lock, it would
never get inflated.
Cheers,
Chris

======================================================================

I implemented _svmf_store_load_barrier() in system.c.  It only works for
i386 at the moment.  Functions to implement StoreStore, LoadLoad, and
LoadStore barriers are necessary for other processors (these barriers are
no-ops on x86-PO, and the Athlon MP known as tofu.cs.mcgill.ca is an
x86-PO).  See the JSR-133 cookbook for details.

The mfence instruction is illegal on this processor, so I'm using a
locked add instead (as per the JSR-133 cookbook) -- it should be portable
across all x86-PO's.  Calling pthread_mutex_lock()/unlock() also imposes
a StoreLoad barrier.

These are the changes I made in my sandbox:

* Implemented _svmf_store_load_barrier() for i386 in system.c

* Commented _svmf_enter_object_monitor() after having gone through
  Bacon's, Onodera's, and Etienne's papers thoroughly.  It should be easy
  to understand _svmf_exit_object_monitor() based on it.

* Inserted StoreLoad barriers either side of setting the contention bit
  in _svmf_enter_object_monitor() -- recommended by Onodera

* Inserted StoreLoad barriers either side of clearing the contention bit
  in _svmf_exit_object_monitor() -- recommended by Onodera

* Inserted StoreLoad barriers either side of freeing a thin lock in
  _svmf_exit_object_monitor() -- recommended by Onodera

* Inserted a StoreLoad barrier AFTER decrementing the recursive count of
  a thin lock in _svmf_exit_object_monitor() -- recommended by JSR-133

This didn't fix the test cases I have.

I then looked at putting StoreLoad barriers after volatile stores.
Although information about volatile fields is available at method
preparation time, it is annoying to try and make volatile-safe versions
of all the (get|put)(field|static) instructions (especially since I don't
believe this to be the cleanest solution).  So I just (temporarily) put a
StoreLoad barrier between *every* instruction in the main dispatch loop.

This still didn't fix things.
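For anyone who hasn't seen the locked-add trick, here is a minimal sketch of what such a StoreLoad barrier can look like with GCC inline assembly.  This is not the code from system.c: the names store_load_barrier() and announce_then_check() are made up for illustration, and the x86-64 and generic fallback branches are my assumption so that the sketch compiles on machines other than i386.

```c
#include <assert.h>

/* Hypothetical stand-in for a StoreLoad barrier; not SableVM's actual code. */
static inline void store_load_barrier(void)
{
#if defined(__i386__)
  /* Locked add of 0 to the top of the stack: leaves memory unchanged,
     but the lock prefix makes it act as a full fence on x86. */
  __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory", "cc");
#elif defined(__x86_64__)
  /* Same idea with the 64-bit stack pointer (assumed, for portability). */
  __asm__ __volatile__("lock; addl $0,0(%%rsp)" : : : "memory", "cc");
#else
  /* GCC builtin full barrier as a generic fallback (assumed). */
  __sync_synchronize();
#endif
}

/* Store-then-load pattern that needs StoreLoad ordering: publish our own
   flag, then look at the other side's flag.  Without the barrier the load
   may be satisfied before the store becomes visible to other processors. */
static int announce_then_check(volatile int *mine, volatile int *other)
{
  assert(mine && other);
  *mine = 1;             /* store: announce ourselves */
  store_load_barrier();  /* keep the load below from passing the store */
  return *other;         /* load: has the other side announced? */
}
```

The store-then-load shape in announce_then_check() is the same shape as setting a contention bit and then re-reading the lock word, which is why the barriers go on either side of those stores.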
I think ALL of the JNI functions are safe because they all call
_svmf_resuming_java() which immediately calls _svmm_compare_and_swap()
(except for the ones in invoke_interface.c, but I don't think that's
where the problem lies).

The JSR-133 cookbook says Thread.start() and Thread.join() need barriers
... but ... we don't appear to have a Thread.join() method, and
Thread.start() calls pthread_create().  No errors looking for
Thread.join() either.

Finally, the last relevant points of the miscellany at the end of the
JSR-133 cookbook address synchronization with garbage collectors, but we
have a stop-the-world collector and I don't see where this could be
unsafe.  I even tried disabling GC altogether.  Note that I got a couple
of core dumps where things died in splay_tree.c.

The problems seem to be in thread startup, and there might be something
wrong with splay_tree.c.

1) sablevm-chris-switch-debug Incrementer

628
sablevm: INTERNAL ERROR (source file "interpreter.c", line 330): impossible control flow
500
Aborted

(gdb) bt
#0  0x4012c781 in kill () from /lib/libc.so.6
#1  0x400b2e5e in pthread_kill () from /lib/libpthread.so.0
#2  0x400b3339 in raise () from /lib/libpthread.so.0
#3  0x4012dbe1 in abort () from /lib/libc.so.6
#4  0x4008ba78 in _svmh_fatal_error (filename=0x400954c1 "interpreter.c", linenumber=330, msg=0x4008e56a "impossible control flow") at fatal.c:29
#5  0x4006f391 in _svmf_interpreter (_env=0x8066df0) at interpreter.c:330
#6  0x400280cc in _svmh_invoke_static_virtualmachine_runthread (env=0x8066df0) at method_invoke.c:5065
#7  0x4001f897 in _svmf_thread_start (_env=0x8066df0) at thread.c:1427
#8  0x400b00ba in pthread_start_thread () from /lib/libpthread.so.0
#9  0x401d2d6a in clone () from /lib/libc.so.6
(gdb) q

2) sablevm-chris-switch-debug Incrementer

(I don't have the stdout/stderr for this):

(gdb) bt
#0  0x40130781 in kill () from /lib/libc.so.6
#1  0x400b6e5e in pthread_kill () from /lib/libpthread.so.0
#2  0x400b7339 in raise () from /lib/libpthread.so.0
#3  0x40131be1 in abort () from /lib/libc.so.6
#4  0x4008e952 in _svmf_tree_splay_gc_map (proot=0x40098b01, node=0x14a) at splay_tree.c:408
#5  0x40070fd1 in _svmf_interpreter (_env=0x8066e00) at instructions_preparation_switch_threaded.c:286
#6  0x40028e2b in _svmh_invoke_static_virtualmachine_runthread (env=0x8066e00) at method_invoke.c:5065
#7  0x4001f897 in _svmf_thread_start (_env=0x8066e00) at thread.c:1427
#8  0x400b40ba in pthread_start_thread () from /lib/libpthread.so.0
#9  0x401d6d6a in clone () from /lib/libc.so.6
(gdb)

3) sablevm-chris-switch-debug ThreadStarter

Exception in thread "Thread-1" java.lang.NullPointerException
   at IncrementRunnable.run (ThreadStarter.java:51)
   at java.lang.Thread.run (Thread.java:670)
   at java.lang.VMThread.callRun (VMThread.java:116)
   at java.lang.Thread.callRun (Thread.java:343)
   at java.lang.VirtualMachine.runThread (VirtualMachine.java:117)