Re: some ideas for multithreading support

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Joe...@t-... writes:

 > Regarding concurrent hash-tables, I'd relax the requirements about iteration:
 > - Iteration must terminate.
 > - The set of keys need not match the serializable database isolation level.
 > - It's OK for some keys to be missing from the enumeration! That's the price
 >   an application pays for going through a table without adequate outer locking.
 > - No key shall be presented twice(?) (except if it's removed + re-added meanwhile)
 > - Iteration must not crash.
 > 
 > IOW, it's OK during iteration to present keys that have been removed
 > concurrently. There's no need to implement snapshots. But the data
 > structures should never be seen in such an inconsistent state that the
 > iteration loops infinitely or crashes.

I think that if you have a thread that keeps adding new keys,
it's ok for the iteration to never terminate.
I don't think it's ok to miss keys that were present at the
beginning of the iteration and never removed.
Nor do I think it's ok to present the same key twice in any case.
I do agree that the iteration should not crash.

Here's what I wrote before:

  For instance, I hope hash table iteration could be described in 
  terms of all of the keys in the table at the beginning plus all 
  those added during the iteration being ordered somehow, and the 
  iteration visiting them in that order (along with the associated
  values at the time the key is visited), but skipping those that 
  were not in the table (either cause they were deleted earlier 
  or not yet added) at the time the iteration reaches their position.  

I hope the informal specification above is clear.

  (BTW, I think that the statement: 
   the iteration visits once every key that was in the table at the 
   beginning of the iteration but never deleted during the iteration, 
   never visits any key that was not in the table at the beginning 
   and not added during the iteration, and visits either once or zero
   times those that were added or deleted during the iteration
   is implied by that but fails to guarantee some desired ordering 
   properties guaranteed by the more operational description.)

To be a bit more explicit, here's an example.
I think I'm asking for something that every programmer will expect.

Suppose an iteration over a table starts in a state with the table
empty, and another thread performs the following operations in the
following order:
 operation     resulting state
 add A => 1    (A=>1)
 add B => 2    (A=>1,B=>2)
 remove A      (B=>2)
 add A => 3    (B=>2,A=>3)  
 remove B      (A=>3)
 add B => 4    (A=>3,B=>4)
and now the iteration returns.  
Suppose the iteration simply collects and returns a list of all the
keys and values in the order visited.
The following results from the iteration are allowable:
 NIL - the iteration could have finished before the first add
 all of the resulting states listed above are allowed when 
   interpreted as iterations results, since the entire iteration
   could have been performed in that state
 (A=>1,B=>4), even though the table was never in such a state,
   because it could visit A right after A was added and B just
   before the end)
 (B=>4) is allowed because A might have been visited during the
   time it was not present, and then B visited at the end.

However (B=>4,A=>1) is NOT allowed, because after B became
related to 4, A was never related to 1.

Note that my model does require that the table can be viewed
as having gone through a sequence of states corresponding to
the operations having been performed in some order.
So if the first two adds had been done by two different threads,
we might have a choice of the order in which they were done.
But this choice could be limited by other evidence, such as
reads that determine what's in the table at different ordered
points in time.

The specification above also rules out clearly ridiculous
outcomes such as (C=>4) or (A=>9).