Re: [Sablevm-developer] Threading support in SableVM

Chris Pickett wrote:

> Clark VERBRUGGE wrote:
> 
>> That would fix the java heap.  But are not internal sablevm operations
>> more of the problem?  Are all shared data accesses (reads or writes)
>> already protected by mutexes?
> 
> 
> I didn't answer your question.  The answer is I think so, but I don't 
> know.  I think certain things are declared as volatile so that explicit 
> mutexes aren't required, but that this may be broken.  There might be 
> problems with both thin locks and the thread status variable, amongst 
> other things.  We'll see ... I think the first step is to make the Java 
> heap work.  After that we can try to break things again.
> 

Just an update on what I've found out, but first a summary for those who 
don't want to read all this:

The problems seem to be in thread startup, and there might be something 
wrong with splay_tree.c.  I put barriers in what I think are the right 
places, so I don't think the problems are JMM-related.  I could be wrong 
about this, especially if the splay_tree problem violates the JMM rules 
about GC (see the "miscellany" at the end of the cookbook).

Details of JMM / locking studies follow, and after that some debugging
information (limited, because debugging these intermittent errors is 
frustrating).  I don't really have any useful leads on what needs to be 
fixed anymore, so I'm going to stop now.  I think careful study and/or 
unit testing of splay_tree.c might help -- I've done neither of these 
things.  One of the more repeatable errors involves an "impossible 
control flow" in interpreter.c -- but I can't get an instruction trace 
on this so it seems useless.

As a side note, Etienne, it seems strange to me that under contention 
your algorithm doesn't inflate the actual thin lock being released, 
but does inflate all of the other locks that the thread owns.  I think 
this means that if two threads were competing for just one lock, that 
lock would never get inflated.

Cheers,
Chris

======================================================================

I implemented _svmf_store_load_barrier() in system.c.  It only works 
for i386 at the moment.  Other processors will also need functions 
implementing StoreStore, LoadLoad, and LoadStore barriers (these 
barriers are no-ops on x86-PO, and the Athlon MP known as 
tofu.cs.mcgill.ca is an x86-PO).  See the JSR-133 cookbook for details.
The mfence instruction is illegal on this processor, so I'm using a 
locked add instead (as per the JSR-133 cookbook) -- it should be 
portable across all x86-PO processors.
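
For illustration, here is a minimal sketch of what a locked-add StoreLoad 
barrier can look like, assuming GCC-style inline assembly.  This is not 
the code in system.c: the name example_store_load_barrier() is made up, 
and the fallback to GCC's __sync_synchronize() on non-x86 targets is my 
own assumption.

```c
#include <stdint.h>

/* Hypothetical helper, not the SableVM function.  A locked
 * read-modify-write on x86 acts as a full barrier, so adding 0 to the
 * word at the top of the stack orders earlier stores before later
 * loads without changing any data. */
static inline void example_store_load_barrier(void)
{
#if defined(__i386__)
  __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory", "cc");
#elif defined(__x86_64__)
  __asm__ __volatile__("lock; addl $0,0(%%rsp)" : : : "memory", "cc");
#else
  /* Assumption: on other targets, fall back to GCC's full barrier. */
  __sync_synchronize();
#endif
}
```

The "memory" clobber also stops the compiler from moving loads and 
stores across the call, which matters as much as the hardware ordering.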

Calling pthread_mutex_lock()/unlock() also imposes a StoreLoad barrier.

These are the changes I made in my sandbox:

* Implemented _svmf_store_load_barrier() for i386 in system.c
* Commented _svmf_enter_object_monitor() after having gone through
   Bacon's, Onodera's, and Etienne's papers thoroughly.  It should
   be easy to understand _svmf_exit_object_monitor() based on it.
* Inserted StoreLoad barriers either side of setting the contention
   bit in _svmf_enter_object_monitor() -- recommended by Onodera
* Inserted StoreLoad barriers either side of clearing the contention
   bit in _svmf_exit_object_monitor() -- recommended by Onodera
* Inserted StoreLoad barriers either side of freeing a thin lock
   in _svmf_exit_object_monitor() -- recommended by Onodera
* Inserted a StoreLoad barrier AFTER decrementing the recursive count
   of a thin lock in _svmf_exit_object_monitor()
   -- recommended by JSR-133
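
The "barriers either side" pattern in the list above can be sketched 
roughly as follows.  Everything here is hypothetical: the struct 
layout, the bit value, and the re-read of the lock word afterwards are 
assumptions for illustration, and GCC's __sync_synchronize() stands in 
for a StoreLoad barrier.  It is not the code in 
_svmf_enter_object_monitor().

```c
#include <stdint.h>

#define EXAMPLE_CONTENTION_BIT 0x1u   /* hypothetical bit position */

/* Hypothetical object header: a flags word and a thin-lock word. */
typedef struct
{
  volatile uint32_t flags;
  volatile uint32_t lockword;
} example_object;

/* Set the contention bit with a full barrier on either side, then
 * re-read the lock word.  The second barrier keeps the re-read from
 * being satisfied before the flag store is visible to other threads. */
static uint32_t example_set_contention_and_recheck(example_object *o)
{
  __sync_synchronize();                /* barrier before the store */
  o->flags |= EXAMPLE_CONTENTION_BIT;  /* set the contention bit   */
  __sync_synchronize();                /* barrier after the store  */
  return o->lockword;                  /* load ordered after store */
}
```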

This didn't fix the test cases I have.  I then looked at putting 
StoreLoad barriers after volatile stores.  Although information about 
volatile fields is available at method preparation time, it is annoying 
to make volatile-safe versions of all the 
(get|put)(field|static) instructions (especially since I don't believe 
this to be the cleanest solution).  So I just (temporarily) put a 
StoreLoad barrier between *every* pair of instructions in the main 
dispatch loop.  This still didn't fix things.
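
To make the "volatile-safe versions" idea concrete, here is a rough 
sketch of picking a put variant at preparation time.  The function 
names and the function-pointer dispatch are hypothetical and are not 
how the SableVM interpreter is structured; only the ACC_VOLATILE value 
(0x0040) comes from the JVM class file format.  __sync_synchronize() 
again stands in for a StoreLoad barrier.

```c
#include <stdint.h>

/* Access flag for volatile fields in the JVM class file format. */
#define EXAMPLE_ACC_VOLATILE 0x0040

typedef void (*example_putfield_fn)(int32_t *slot, int32_t value);

/* Ordinary store: no ordering beyond what the hardware gives. */
static void example_putfield_plain(int32_t *slot, int32_t value)
{
  *slot = value;
}

/* Volatile store: the store is followed by a full barrier so that it
 * is visible before any later load by this thread. */
static void example_putfield_volatile(int32_t *slot, int32_t value)
{
  *(volatile int32_t *) slot = value;
  __sync_synchronize();
}

/* Choose the variant once, from the field's access flags. */
static example_putfield_fn example_select_putfield(uint16_t access_flags)
{
  return (access_flags & EXAMPLE_ACC_VOLATILE)
    ? example_putfield_volatile
    : example_putfield_plain;
}
```

Selecting once at preparation time keeps the barrier off the path for 
non-volatile fields, which is the point of having separate variants.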

I think ALL of the JNI functions are safe because they all call 
_svmf_resuming_java() which immediately calls _svmm_compare_and_swap() 
(except for the ones in invoke_interface.c, but I don't think that's 
where the problem lies).
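
The reasoning is that an atomic compare-and-swap doubles as a full 
barrier on x86.  A minimal sketch of that kind of status transition, 
with GCC's __sync_bool_compare_and_swap() standing in for 
_svmm_compare_and_swap() (whose real definition isn't reproduced here) 
and with made-up status constants:

```c
#include <stdint.h>

/* Hypothetical thread status values, for illustration only. */
#define EXAMPLE_STATUS_RUNNING_JAVA 0
#define EXAMPLE_STATUS_NATIVE       1

/* Try to move a thread from "native" back to "running Java".
 * Returns 1 on success and 0 if the status was something else (for
 * example, because another thread changed it first).  GCC documents
 * the __sync builtins as full barriers, so a successful transition
 * also orders the surrounding loads and stores. */
static int example_resume_java(volatile int32_t *status)
{
  return __sync_bool_compare_and_swap(status,
                                      EXAMPLE_STATUS_NATIVE,
                                      EXAMPLE_STATUS_RUNNING_JAVA);
}
```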

The JSR-133 cookbook says Thread.start() and Thread.join() need barriers 
... but ... we don't appear to have a Thread.join() method, and 
Thread.start() calls pthread_create().  No errors looking for 
Thread.join() either.

Finally, the last relevant points of the miscellany at the end of the 
JSR-133 cookbook address synchronization with garbage collectors, but we 
have a stop-the-world collector and I don't see where this could be 
unsafe.  I even tried disabling GC altogether.  Note that I got a 
couple of core dumps where things died in splay_tree.c.

The problems seem to be in thread startup, and there might be something 
wrong with splay_tree.c.

1) sablevm-chris-switch-debug Incrementer

628
sablevm: INTERNAL ERROR (source file "interpreter.c", line 330): impossible control flow
500
Aborted

(gdb) bt
#0  0x4012c781 in kill () from /lib/libc.so.6
#1  0x400b2e5e in pthread_kill () from /lib/libpthread.so.0
#2  0x400b3339 in raise () from /lib/libpthread.so.0
#3  0x4012dbe1 in abort () from /lib/libc.so.6
#4  0x4008ba78 in _svmh_fatal_error (filename=0x400954c1 "interpreter.c", linenumber=330, msg=0x4008e56a "impossible control flow") at fatal.c:29
#5  0x4006f391 in _svmf_interpreter (_env=0x8066df0) at interpreter.c:330
#6  0x400280cc in _svmh_invoke_static_virtualmachine_runthread (env=0x8066df0) at method_invoke.c:5065
#7  0x4001f897 in _svmf_thread_start (_env=0x8066df0) at thread.c:1427
#8  0x400b00ba in pthread_start_thread () from /lib/libpthread.so.0
#9  0x401d2d6a in clone () from /lib/libc.so.6
(gdb) q

2) sablevm-chris-switch-debug Incrementer

(I don't have the stdout/stderr for this):

(gdb) bt
#0  0x40130781 in kill () from /lib/libc.so.6
#1  0x400b6e5e in pthread_kill () from /lib/libpthread.so.0
#2  0x400b7339 in raise () from /lib/libpthread.so.0
#3  0x40131be1 in abort () from /lib/libc.so.6
#4  0x4008e952 in _svmf_tree_splay_gc_map (proot=0x40098b01, node=0x14a) at splay_tree.c:408
#5  0x40070fd1 in _svmf_interpreter (_env=0x8066e00) at instructions_preparation_switch_threaded.c:286
#6  0x40028e2b in _svmh_invoke_static_virtualmachine_runthread (env=0x8066e00) at method_invoke.c:5065
#7  0x4001f897 in _svmf_thread_start (_env=0x8066e00) at thread.c:1427
#8  0x400b40ba in pthread_start_thread () from /lib/libpthread.so.0
#9  0x401d6d6a in clone () from /lib/libc.so.6
(gdb)

3) sablevm-chris-switch-debug ThreadStarter

Exception in thread "Thread-1" java.lang.NullPointerException
    at IncrementRunnable.run (ThreadStarter.java:51)
    at java.lang.Thread.run (Thread.java:670)
    at java.lang.VMThread.callRun (VMThread.java:116)
    at java.lang.Thread.callRun (Thread.java:343)
    at java.lang.VirtualMachine.runThread (VirtualMachine.java:117)