Re: multithread.txt

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Vladimir Tzankov writes:
 > First of all doc/multithread.txt is not intended to be documentation  
 > of the MT implementation. I view it mostly as notes describing  
 > principles and parts of the implementation. Also it is not updated  
 > regularly.
I understand.  It was the (irregular) update that reminded me to send
the note.  I think the principles are particularly important.  
I'm basically looking for what I need in order order to write programs
using MT.  The first thing I'm trying to understand is what exactly I 
need to know.

Starting with the easiest part:

 > > 3. Provide ways for programmers to define their own larger "atomic"
 > > operations, which prevent other processes from observing intermediate
 > > states that violate their assumptions/requirements.
 > 
 > Isn't this the purpose of synchronization primitives (which currently  
 > are absent - but soon will be ready)?
Yes, exactly.  I believe you are adressing this need.
It's the others that I think need more attention.

 > > 2. Define an execution model of lisp in terms of a sequence of states
 > > and ensure that every lisp process can be accurately understood in
 > > terms of that model.
 > > I hope we can think of any parallel execution as a sequence of states
 > > in which the transition between each pair of adjacent states is
 > > defined by the execution of the next step of one of the running
 > > processes.  The main point is that no process should ever be able to
 > > observe a state that is not predicted by that model.
 > > For instance, the lisp statement (setf (gethash x y) z) should be
 > > defined as a single state transition.  It should not be possible for
 > > another process to observe a state of the hash table between the one
 > > before the setf and the one after.  And it is the job of the lisp
 > > implementation to ensure that this is the case.
 > > (Is this correct?
 > > If not then the programmer better be told what is!
 > > The file says:
 > >   If two threads evaluate
 > >    (INCF (GETHASH x global-ht 0)),
 > >   the results are undefined.
 > > I hope this means that the value in the hash table might be 1 or 2,
 > > but NOT that lisp dies or that there are two different entries in the
 > > table for the value of x.
 > > )
 > 
 > I agree. Particularly for GETHASH - of course the value might be 1 or  
 > 2. No crashes or inconsistencies will be "tolerated".

Your answer suggests that you're focused on the example, whereas I'm
really more interested in the general principle.  If you agree with me
that the semantics of lisp programs can be defined in terms of
sequences of primitive operations that cause transitions between
states, and that the semantics of a multithreaded version can be
defined in terms of interleavings of these sequences of operations,
then the problem reduces to
- defining the primitive operations and how they relate to the lisp
  source
- ensuring that the implementation conforms with that semantics (which
  of course is your problem as the implementor, rather than mine as the
  programmer trying to use your implementation)

Just out of curiosity, how do you (or do you plan to) ensure that two
threads each executing (INCF (GETHASH x global-ht 0)) does something
acceptable, e.g., does NOT end up with two entries for x in the table?
I'd expect you'd have to internally lock the hashtable.  Also, I'd 
expect that the code to read a hash table could not run at the same 
time as code to write it, so effectively every hash table read would
have to acquire a lock.  And I worry that this, in combination with
other locks, could lead to deadlocks.

I believe the reason you think it's acceptable for the value of gethash
to be 1 or two is that you interpret (INCF (GETHASH x global-ht 0))
not as a primitive operation, but rather as
 (setf (GETHASH x global-ht 0) (1+ (GETHASH x global-ht 0)))
which is a sequence of three primitive operations:
 (setf temp-var (GETHASH x global-ht 0))
 (incf temp-var) ;; which can be treated as primitive only because
                 ;; temp-var is not shared by any other thread
 (setf (GETHASH x global-ht 0) temp-var)
If two such sequences are interleaved then we never get two entries 
for x in the table, whereas, if the the primitives are made smaller,
then such an outcome might become possible.

 > > 1. Specify the set of objects read and written by a piece of code so
 > > that the programmer can figure out which ones are written by one
 > > process and read by another.
 > Hmm, i am not sure I understand this. Can you give an example?

If the semantics (#2) were completely defined, I suppose that
description would include #1.  What I worry about is that the things I
think of as primitive might be implemented in terms of other things
that I don't know about.  This would mean that in order to write
correct code I would have to do more locking than I realize.
Off hand I don't know what I don't know, so it's hard to point to
examples.  How about this:
Do I have to lock streams, e.g., if two threads serve http requests,
and they both want to write to a common log, and they both do
 (format log ...)
and I don't use any additional locks,
- can the outputs of different calls to format be interleaved 
  (I hope not)
- can this cause lisp to crash (I even more hope not!)

 > The documentation of all lisp is not very extensive in MT part.  
 > Probably the only exception is Franz.
I recall seeing documentation of #3 there (and elsewhere), but not
#1,2.  If you have some references to anything like #1,2 please 
let me know.