From: <don...@is...> - 2017-12-22 13:22:09
Joe...@t-... writes:

> A) The idea is to give a minimum set of guarantees when users are
> programming in the wild, without locks.

You mean without the users adding their own locks, or without the hash
table implementation adding internal locks?  I expect the hash table
implementation to do whatever is necessary so that programs whose
authors add no locks of their own behave as those authors are
justified in expecting in the presence of other threads.

> Basically: no crash, no endless iteration due to sudden cycles in
> internal data structures.  Perhaps attempt to detect that and raise
> an error as Bruno suggested?
>
> B) Other parts would use locks in order to give stronger guarantees.

I think you mean that the clisp hash table implementation would add
locks in order to make things work as expected.  That's what I want.

> For instance, clisp would internally use locks when updating or
> iterating a package's hash table, because

I've been assuming that a package IS a hash table, and that when the
implementation of hash tables is fixed, packages will work.

> Therefore, it does not make sense to raise the minimum set of
> guarantees for the A case.

I don't understand the distinction between A and B.

> Of course, you can try and make A and B collapse using algorithms
> for concurrent hash-tables.  I have some doubts that this can work
> because e.g. the package example shows that often enough you need
> locks (or rather, critical sections) above and surrounding the
> level where you have atomic/concurrent updates to individual
> parts.  The hash tables may support concurrent updates, but they
> are only part of the package object, which (likely) has more
> invariants and postconditions.

I'm missing something; show me an example of what you're describing.
Do you mean the lisp implementation has to lock things internally?
That's fine with me.  Or that a user who iterates in one thread while
altering the table in another has to add locks?  I want that to be
unnecessary for obtaining the guarantee I described.  If you need
something stronger, you might have to add locks: for instance, to
guarantee that an entire iteration sees a single hash table state,
user-level locks would be required.

> I have not analyzed your algorithm in sufficient detail to
> determine whether it could work as a concurrent hash table

I have not described an algorithm, only a requirement.  I expect the
requirement can be satisfied by various implementations, and I hope
some of them are even efficient and not too complicated.

> implementation.  But that's what it wants to be.  My proposal was
> way more limited: just use the present, non-concurrent code, add a
> few checkpoints / atomics / fences to ensure it will terminate.
> But I

I don't know enough about the current code to know what can go wrong
now, or what would have to be added to have which effects.

> don't see how a simple non-concurrent bucket-based algorithm could
> prevent keys from disappearing from an enumeration in the presence
> of parallel deletion of other keys.  That's precisely why CL
> forbids such modifications.

Again I'm failing to see the problem.  As a simple example, suppose we
have a stable ordering of all keys, a hash table that always has the
same number of buckets, a stable function mapping keys to buckets, and
all keys within a bucket stored in a list sorted by that ordering.
Then an iteration would only have to keep track of the current key,
right?  It could lock the table to find the next key and then unlock
it.  My specification says there should be some ordering of the keys
that were in the table at the start of the iteration together with
those added during it.  An order on buckets, plus an order on keys
within each bucket, gives us exactly that: X < Y if X belongs in an
earlier bucket than Y according to the stable key-to-bucket function,
or if that function maps both to the same bucket and X precedes Y in
the stable ordering of all keys.
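Here is a sketch of what I mean, under the stated assumptions -- my
illustration, not CLISP's code.  STABLE-TABLE, BUCKET-INDEX, KEY< and
NEXT-KEY are names I am making up; locking uses bordeaux-threads for
portability; SXHASH stands in for the stable key-to-bucket function
(it is stable within one session, which is all the sketch needs); and
insertion/deletion (not shown) are assumed to keep each bucket's
(key . value) list sorted by KEY<:

  (defconstant +nbuckets+ 16)  ; fixed for the table's whole lifetime

  (defstruct stable-table
    (buckets (make-array +nbuckets+ :initial-element '()))
    (lock (bt:make-lock "stable-table")))

  (defun bucket-index (key)
    ;; The stable function mapping keys to buckets.
    (mod (sxhash key) +nbuckets+))

  (defun key< (a b)
    ;; Stand-in for the assumed stable total order on all keys.
    (string< (prin1-to-string a) (prin1-to-string b)))

  (defun next-key (table current)
    "Return the least key greater than CURRENT (or the least key of
  all when CURRENT is NIL), or NIL when the iteration is done.  The
  lock is held only while searching, never across user code."
    (bt:with-lock-held ((stable-table-lock table))
      (loop with start = (if current (bucket-index current) 0)
            for i from start below +nbuckets+
            do (dolist (entry (aref (stable-table-buckets table) i))
                 (let ((k (car entry)))
                   ;; In CURRENT's own bucket, skip CURRENT and the
                   ;; keys before it; later buckets are taken from
                   ;; the top.
                   (when (or (null current) (> i start) (key< current k))
                     (return-from next-key k)))))))

An iteration then remembers only the current key between steps:

  (loop for k = (next-key table nil) then (next-key table k)
        while k
        do (process k))  ; PROCESS is whatever the iteration body does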
I realize that GC is likely to have an important impact on all of
this; I expect it already does in the current implementation.  For the
moment I leave that as a later issue.  The above is only intended to
illustrate the concept.  Maybe it can be extended to work acceptably
in practice, maybe not.

> One could RPLACA the deleted element with #<UNBOUND> to preserve
> the integrity of the bucket list.  But that would be a burden
> imposed on all tables, not exactly nice.

I can sort of understand that in the context of some particular
algorithm, or type of algorithm, but I don't think it's worthwhile to
respond without knowing exactly which algorithm you have in mind (my
reading of the idea is sketched at the end of this message).  I take
you to be saying that you DO have a solution for that particular
algorithm but are not satisfied with its performance.  Maybe that
performance problem can be fixed, or maybe a different algorithm would
be better.

If you're hoping to make only a small change to the current hashing
algorithms in clisp, I can understand that desire, but since they were
probably not designed with this in mind, it might turn out easier to
replace them.  As I said above, I don't know how they work in any
detail.
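The sketch I promised above, showing how I read the RPLACA idea --
again my illustration, not CLISP's code.  *DELETED* is a made-up
unique marker standing in for #<UNBOUND>, and iterators are assumed to
skip entries whose key is EQ to it:

  (defvar *deleted* '#:deleted
    "Unique marker for a removed key.")

  (defun tombstone-remove (key bucket)
    "Mark KEY as deleted in BUCKET, a list of (key . value) conses,
  without restructuring the list, so that a concurrent iterator still
  holding one of the conses sees a dead slot rather than a spliced
  list."
    (let ((entry (assoc key bucket)))
      (when entry
        (rplaca entry *deleted*)  ; the key slot now reads as deleted
        (rplacd entry nil))       ; drop the value for the GC
      bucket))

I take the "burden" you mention to be that every table then carries
these dead cells, and the checks for them, until something compacts
the bucket, whether or not any iteration is in progress.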